Hi Maheedhara,
Can you please provide us with the running-config from both leaf switches?
Thanks,
Pratik Pande
Hi,
I am trying to get line rate with two machines (A and B) connected back to back using ConnectX-5 EN 100G NICs.
Machine A (Transmitting pkts)
Run DPDK pktgen: sudo ./app/x86_64-native-linuxapp-gcc/pktgen -l 0-5 -n 3 -w 04:00.0 -- -T -P -m "[1:2-5].0"
Machine B (Receiving pkts)
Run DPDK pktgen: sudo ./app/x86_64-native-linuxapp-gcc/pktgen -l 0-5 -n 3 -w 04:00.0 -- -T -P -m "[1-4:5].0"
Machine A is sending packets @ 52G
Machine B is receiving packets @ 16G
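One way to narrow down where Machine B loses packets is to watch the NIC counters while the test runs; a minimal sketch, assuming the ConnectX-5 port is still visible to the kernel as enp4s0f0 (the mlx5 PMD keeps the kernel netdev alive, so ethtool stays usable under DPDK):
sudo ethtool -S enp4s0f0 | grep -E 'rx_out_of_buffer|rx_packets_phy|rx_discards_phy'
# re-run after the test; a growing rx_out_of_buffer points at the host/cores not
# draining the RX queues fast enough rather than a problem on the wire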
Here is some more information:
Dell R620 with Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (6 cores)
Only one socket is there.
HT is disabled.
isolcpus=1-5
DPDK 18.08
Ubuntu 16.04 LTS
MLNX_OFED_LINUX-4.4-2.0.7.0-ubuntu16.04-x86_64 was installed
My questions:
Q1. Why is machine B not able to receive more than 16G packets?
Q2. The PCIe capability and status speeds are different.
root:~$ sudo lspci -s 04:00.0 -vvv | grep Width
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported, Exit Latency L0s unlimited, L1 unlimited
LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
As you can see, the available speed is 16GT/s, but only 8GT/s is in use. How can I increase this? The card is installed in SLOT2_G2_X16(CPU1).
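A rough sanity check on these numbers (my own reading, worth verifying): at PCIe Gen3 signalling, 8 GT/s x 16 lanes x 128/130 encoding gives roughly 126 Gbit/s of raw link bandwidth, so even the 8GT/s link should not be what caps the receive rate at 16G. The 16GT/s in LnkCap is what the adapter advertises, while the R620's E5-2620 (Sandy Bridge-EP) platform only supports PCIe Gen3, so LnkSta staying at 8GT/s is expected for that slot and cannot be raised on this server.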
Thanks,
A
Hi Michael,
Thank you for posting your question on the Mellanox Community.
Based on the information provided, please follow Mellanox Community Document -> https://community.mellanox.com/docs/DOC-2474.
If, after applying the Community document, the issue is not resolved, please open a Mellanox Support case by sending an email to support@mellanox.com
Thanks and regards,
~Mellanox Technical Support
Hi,
I'm not sure how the 'HowTo Configure PFC on ConnectX-4' article is helpful. It doesn't relate to NetworkManager at all.
To whoever is reading this thread - this question is NOT answered as of this moment.
Hi Tom,
Thank you for posting your question on the Mellanox Community.
Based on the information provided, the following Mellanox Community document explains the 'rx_out_of_buffer' ethtool/xstat statistic.
You can improve the rx_out_of_buffer behavior by tuning the node and also by modifying the ring size on the adapter (view it with ethtool -g <interface>, change it with ethtool -G <interface>).
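As an illustration only (the interface name and ring size below are assumptions, not a tuned recommendation), viewing and enlarging the ring looks like this:
ethtool -g enp4s0f0              # show current and maximum RX/TX ring sizes
sudo ethtool -G enp4s0f0 rx 8192 # raise the RX ring toward the reported maximum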
Also make sure you follow the DPDK performance recommendations from the following link -> https://doc.dpdk.org/guides/nics/mlx5.html#performance-tuning
If you still experience performance issues after these recommendations, please do not hesitate to open a Mellanox Support case by emailing support@mellanox.com
Thanks and regards,
~Mellanox Technical Support
Hi Edward,
Thank you for posting your question on the Mellanox Community.
Based on the information provided, this issue needs more engineering effort.
Can you please open a Mellanox Support case by emailing support@mellanox.com?
Thanks and regards,
~Mellanox Technical Support
Thank you! I totally missed checking the adapters! I did find the correct file needed for the switch PSID. Also, I appreciate the explanation of the opensm config. I believe my flash cards for the CMC will be delivered tomorrow; I will start with the updates (BIOS, I/O, etc.) first, install the Rocks frontend, and then tackle the MOFED install (3.4 works). I think this will get things moving!
I have the following InfiniBand HCAs:
ConnectX-3
Connect-IB
ConnectX-4
1. Are all of the above HCAs SR-IOV configurable for a Docker environment?
2. What documentation should be followed?
3. Can they establish RDMA communication while used inside Docker?
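For context, a rough sketch of the usual SR-IOV enablement flow on a ConnectX-4 under MLNX_OFED (the MST device path, VF count and device names are assumptions to be checked against the official SR-IOV and Docker/RDMA documentation):
# enable SR-IOV in firmware, then power-cycle the host
sudo mst start
sudo mlxconfig -d /dev/mst/mt4115_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=4
# create the virtual functions once the PF driver is up
echo 4 | sudo tee /sys/class/infiniband/mlx5_0/device/sriov_numvfs
# the VFs should then show up as extra RDMA devices that can be handed to containers
ibv_devices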
Hello, Mellanox Academy support team,
I am studying the InfiniBand fabric and I have a problem creating the topology file for an MSB7890 switch with 2 HCA cards connected to it. I don't know how to create the topology file here. My questions are:
1. The course material on topology files is out of date and not clear to me; it focuses on the SX6036/SX6025.
2. There is no ibnl file related to the MSB7890 switch.
3. No SM is defined in the topology file.
Here is what I did in my topology file:
[root@node01 ~]# ibdmchk -t a.topo
-------------------------------------------------
IBDMCHK Cluster Design Mode:
Topology File .. a.topo
SM Node ........
SM Port ........ 4294967295
LMC ............ 0
-I- Parsing topology definition:a.topo
-W- Ignoring 'p13 -> HCA-1 node01 p1' (line:2)
-W- Ignoring 'p17 -> HCA-1 node04 p1' (line:3)
-I- Defined 1/2 systems/nodes
-E- Fail to find SM node:
[root@node01 ~]# cat a.topo
MSB7700 MSB7700
p13 -> HCA-1 node01 p1
p17 -> HCA-1 node04 p1
Here is the link info for my IB network:
[root@node01 ~]# iblinkinfo
CA: node04 HCA-1:
0x98039b0300078390 4 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 17[ ] "SwitchIB Mellanox Technologies" ( )
Switch: 0x248a070300f82490 SwitchIB Mellanox Technologies:
3 1[ ] ==( Down/ Polling)==> [ ] "" ( )
3 2[ ] ==( Down/ Polling)==> [ ] "" ( )
3 3[ ] ==( Down/ Polling)==> [ ] "" ( )
3 4[ ] ==( Down/ Polling)==> [ ] "" ( )
3 5[ ] ==( Down/ Polling)==> [ ] "" ( )
3 6[ ] ==( Down/ Polling)==> [ ] "" ( )
3 7[ ] ==( Down/ Polling)==> [ ] "" ( )
3 8[ ] ==( Down/ Polling)==> [ ] "" ( )
3 9[ ] ==( Down/ Polling)==> [ ] "" ( )
3 10[ ] ==( Down/ Polling)==> [ ] "" ( )
3 11[ ] ==( Down/ Polling)==> [ ] "" ( )
3 12[ ] ==( Down/ Polling)==> [ ] "" ( )
3 13[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 1 1[ ] "node01 HCA-1" ( )
3 14[ ] ==( Down/ Polling)==> [ ] "" ( )
3 15[ ] ==( Down/ Polling)==> [ ] "" ( )
3 16[ ] ==( Down/ Polling)==> [ ] "" ( )
3 17[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 4 1[ ] "node04 HCA-1" ( )
3 18[ ] ==( Down/ Polling)==> [ ] "" ( )
3 19[ ] ==( Down/ Polling)==> [ ] "" ( )
3 20[ ] ==( Down/ Polling)==> [ ] "" ( )
3 21[ ] ==( Down/ Polling)==> [ ] "" ( )
3 22[ ] ==( Down/ Polling)==> [ ] "" ( )
3 23[ ] ==( Down/ Polling)==> [ ] "" ( )
3 24[ ] ==( Down/ Polling)==> [ ] "" ( )
3 25[ ] ==( Down/ Polling)==> [ ] "" ( )
3 26[ ] ==( Down/ Polling)==> [ ] "" ( )
3 27[ ] ==( Down/ Polling)==> [ ] "" ( )
3 28[ ] ==( Down/ Polling)==> [ ] "" ( )
3 29[ ] ==( Down/ Polling)==> [ ] "" ( )
3 30[ ] ==( Down/ Polling)==> [ ] "" ( )
3 31[ ] ==( Down/ Polling)==> [ ] "" ( )
3 32[ ] ==( Down/ Polling)==> [ ] "" ( )
3 33[ ] ==( Down/ Polling)==> [ ] "" ( )
3 34[ ] ==( Down/ Polling)==> [ ] "" ( )
3 35[ ] ==( Down/ Polling)==> [ ] "" ( )
3 36[ ] ==( Down/ Polling)==> [ ] "" ( )
3 37[ ] ==( Down/ Polling)==> [ ] "" ( )
CA: node01 HCA-1:
0x98039b0300078348 1 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 13[ ] "SwitchIB Mellanox Technologies" ( )
Thank you for your time.
Can I get some expert advice on this if possible?
Thanks
Hi Corbin,
If we plug MC3208011-SX into MAM1Q00A-QSA and configure "speed 1000" at that interface, will it work fine?
==> Yes, it shouldn't be an issue.
Just make sure you configure the interface speed manually to 1G
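For example, on MLNX-OS that would look roughly like this (a sketch assuming port 1/1; adjust to your port number):
switch (config) # interface ethernet 1/1
switch (config interface ethernet 1/1) # speed 1000
switch (config interface ethernet 1/1) # no shutdown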
Thanks,
Pratik Pande
Hi Alkx,
Thanks a lot for your guidance. I have just read this reply. In the past few days I used the method of RPM patching and rebuilding with forced installation for debugging. It works and I can get the symbols, but it is very complex. The method you provided here seems much more flexible; I will try it next time.
Dust is invisible to the naked eye and attaches very easily to fiber connectors. In routine maintenance, fiber optic connectors get contaminated with oil, powder and other contaminants. These contaminants may cause problems such as unclean fiber tips, aging connectors, degraded cable quality and obstructed network links. Therefore, it is necessary to clean fiber connectors regularly and take dust-proof measures.
Hi Alkx,
Many thanks.
I will read the specification later when time allows. Yes, I did ask the question on that blog. When I clicked to submit my question, nothing updated there, so I assumed the submission failed due to a network or server problem. To my understanding, the NIC generates the ACK once it has received the whole frame/segment without any error, to accelerate processing. Indeed, almost all chips on embedded platforms do so, and maybe the NIC on the server has the same behavior.
I am just confused by the gap in CQ-polling cycles between RC and UC mode. By the way, where can I get the latency and bandwidth report for Mellanox NICs? Is it public? The blog says latency is usually less than 1 microsecond with small packets, but my test result is more than 1 us, regardless of whether numactl is used to pin the CPU core and memory allocation.
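In case it is useful, a minimal sketch of how such latency numbers are usually reproduced with the perftest tools (device name mlx5_0, NUMA node 0 and the placeholder <server> are assumptions for illustration):
# server side
numactl --cpunodebind=0 --membind=0 ib_write_lat -d mlx5_0
# client side
numactl --cpunodebind=0 --membind=0 ib_write_lat -d mlx5_0 <server>
Running the same pair with -c UC (and ib_send_lat for the send path) should show the RC-versus-UC gap directly on your own hardware.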
Hi All,
We have a performance issue on 8 x Dell R740 servers with ESXi 6.5 and vCenter 6.5.
Each server needs to host 4 VMs (Red Hat 7.4) with 1Gb, 10Gb and 40Gb network access; to accommodate that we created 10Gb and 40Gb distributed switches and uplinked 2 ports each from a 40Gb and a 10Gb switch to the servers.
The 10Gb distributed switch works fine, but our 40Gb distributed switch is problematic.
We are using iPerf to validate the bandwidth between the cluster and an Isilon storage array.
Troubleshooting steps:
- iPerf from one Isilon node to another Isilon node: consistently over 35Gb
- From a VM to an Isilon node: iPerf is over 35Gb
- From an Isilon node to a VM: iPerf ranges between 13Gb and 20Gb
- Between Red Hat VMs on separate ESXi hosts: iPerf ranges between 13Gb and 20Gb
- Between VMs on the same ESXi host: iPerf is over 35Gb
- We checked the firmware of the 40Gb card (MLNX 40Gb 2P ConnectX3Pro Adpt) and it is at the latest version (2.42.5000).
Has anyone encountered a similar issue, or what steps need to be taken for ESXi 6.5 and these Mellanox cards (MLNX 40Gb 2P ConnectX3Pro Adpt)?
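For what it is worth, a sketch of checks commonly tried for this receive-side pattern (the guest interface name ens192 and the stream count are assumptions, not a verified fix): use several parallel iPerf streams to rule out a single-TCP-stream ceiling, and look at the ring and LRO settings inside the receiving RHEL guest:
iperf3 -c <peer> -P 4 -t 30                         # parallel streams toward the slow receiver
ethtool -g ens192                                   # RX/TX ring sizes of the guest vNIC
ethtool -k ens192 | grep -i large-receive-offload   # whether LRO is enabled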
Here are the FW details of the card. I think these are the latest.
flint -d /dev/mst/mt4103_pciconf0 q
Image type: FS2
FW Version: 2.42.5000
FW Release Date: 5.9.2017
Product Version: 02.42.50.00
Rom Info: version_id=8025 type=CLP
type=UEFI version=14.11.45 cpu=AMD64
type=PXE version=3.4.752
Device ID: 4103
Description: Node Port1 Port2 Sys image
GUIDs: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
MACs: 70106fa0f9b0 70106fa0f9b1
VSD:
PSID: HP_2240110004
I saw at http://www.mellanox.com/page/products_dyn?product_family=26
"Linux Inbox Drivers
Mellanox Adapters' Linux VPI Drivers for Ethernet and InfiniBand are also available Inbox in all the major distributions, RHEL, SLES, Ubuntu and more. Inbox drivers enable Mellanox High performance for Cloud, HPC, Storage, Financial Services and more with the Out of box experience of Enterprise grade Linux distributions."
I've used this for several generations of Mellanox cards over the last decade with a wide variety of Linux distributions. Just configure with /etc/network/interfaces; ifconfig ib0 works, datagram/connected mode works, IPoIB works.
From what I can tell, with 18.04 it doesn't work. With the default (Ubuntu-supplied/inbox) drivers:
# cat /sys/class/net/ib0/mode
datagram
# echo connected > /sys/class/net/ib0/mode
-bash: echo: write error: Invalid argument
I found docs saying that with the MLNX_OFED drivers you just set:
# cat ib_ipoib.conf
options ib_ipoib ipoib_enhanced=0
I get:
[ 57.573664] ib_ipoib: unknown parameter 'ipoib_enhanced' ignored
Is there any documentation for the "inbox drivers"? Does anyone know how to get connected mode working? I'm guessing the 18.04 drivers are too old to have the ability to disable enhanced mode, but too new to have connected mode working by default.
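For completeness, the sequence those docs describe only applies to the MLNX_OFED ib_ipoib module; with MLNX_OFED installed, the flow would be roughly (a sketch, not verified on 18.04):
echo 'options ib_ipoib ipoib_enhanced=0' | sudo tee /etc/modprobe.d/ib_ipoib.conf
sudo /etc/init.d/openibd restart      # reload the MLNX_OFED stack so the option takes effect
echo connected | sudo tee /sys/class/net/ib0/mode
The inbox module in the 18.04 kernel ignores ipoib_enhanced (as the dmesg line above shows), so this route appears to require installing MLNX_OFED rather than the inbox driver.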