Channel: Mellanox Interconnect Community: Message List

ConnectX-3 RoCE fails send WRs when running multiple QPs

Hi all, I have spent the last 3 weeks building an event-driven I/O engine for RNICs.

I have worked out most of the kinks, but there is an error I cannot understand.

For the test application, I have a single client connecting to two loopback servers (3 processes total).

I programmed it so that QP connections are established using information exchanged over an out-of-band TCP/IP connection.

The first connection goes smoothly. Both sides exchange all the right information, both QPs transition to RTS, and they exchange data over the IB transport. No problem.

The QPs in the second connection also transition to RTS without error. However, soon after they post their first send/recv WRs, the sender gets a failed send WC (status 12: RETRY_EXC_ERR, vendor_err 129).

Retrying the send results in yet another erroneous WC with status 5, vendor_err 249.
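
For reference, the numeric statuses above can be decoded against `enum ibv_wc_status` from libibverbs' `verbs.h`. The sketch below mirrors that enum in Python purely for illustration; the helper names `wc_status_name` and `describe_wc` are hypothetical, and the vendor_err values are vendor-specific and left undecoded.

```python
# Names of enum ibv_wc_status values 0..21, indexed by numeric value.
# Newer rdma-core releases append further values, which are omitted here.
IBV_WC_STATUS = [
    "IBV_WC_SUCCESS",             # 0
    "IBV_WC_LOC_LEN_ERR",         # 1
    "IBV_WC_LOC_QP_OP_ERR",       # 2
    "IBV_WC_LOC_EEC_OP_ERR",      # 3
    "IBV_WC_LOC_PROT_ERR",        # 4
    "IBV_WC_WR_FLUSH_ERR",        # 5
    "IBV_WC_MW_BIND_ERR",         # 6
    "IBV_WC_BAD_RESP_ERR",        # 7
    "IBV_WC_LOC_ACCESS_ERR",      # 8
    "IBV_WC_REM_INV_REQ_ERR",     # 9
    "IBV_WC_REM_ACCESS_ERR",      # 10
    "IBV_WC_REM_OP_ERR",          # 11
    "IBV_WC_RETRY_EXC_ERR",       # 12
    "IBV_WC_RNR_RETRY_EXC_ERR",   # 13
    "IBV_WC_LOC_RDD_VIOL_ERR",    # 14
    "IBV_WC_REM_INV_RD_REQ_ERR",  # 15
    "IBV_WC_REM_ABORT_ERR",       # 16
    "IBV_WC_INV_EECN_ERR",        # 17
    "IBV_WC_INV_EEC_STATE_ERR",   # 18
    "IBV_WC_FATAL_ERR",           # 19
    "IBV_WC_RESP_TIMEOUT_ERR",    # 20
    "IBV_WC_GENERAL_ERR",         # 21
]


def wc_status_name(status: int) -> str:
    """Return the enum name for a numeric WC status, or a placeholder."""
    if 0 <= status < len(IBV_WC_STATUS):
        return IBV_WC_STATUS[status]
    return f"UNKNOWN({status})"


def describe_wc(status: int, vendor_err: int) -> str:
    """Format one completion the way it is quoted in this post."""
    return f"status {status} ({wc_status_name(status)}), vendor_err {vendor_err}"


print(describe_wc(12, 129))
print(describe_wc(5, 249))
```

Status 5 is `IBV_WC_WR_FLUSH_ERR`, the status reported for work requests flushed after a QP has already entered the error state. That suggests the second failure is a consequence of the first rather than a separate fault: a send retried on the same QP would be expected to fail this way until the QP is reset and reconnected.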

As far as I know, the libibverbs APIs are thread-safe, so I doubt a race is the issue. Also, the non-event-loop part of the client is single-threaded, so all connections are made sequentially.

The first connection is working fine, so I see no reason why the second should not.

(The exchange over the first connection works perfectly even as the second one fails.)

 

The server I'm running has a dual-port MCX312A-XCBT. Port 1 (1-based indexing) doubles as both my ssh port and RoCE port.

I'm running mlx4_core 1.1 and mlx4_en 2.1.6 (the driver reports a date of Aug 2013).

 

On a side note, another surprise I discovered while benchmarking this ConnectX-3 EN card is that RoCE READ/WRITE performance degrades substantially (from 1 GB/s down to 20 MB/s) with multi-megabyte payloads.

If you could also shed some light on this issue, I would really appreciate it.

 

While I do realize that the drivers are somewhat dated, I currently do not have direct access to the servers, and I would have to contact an overseas branch for updates,

so I would like to avoid any driver updates or changes to the hardware configuration if at all possible, unless they are conclusively deemed relevant.

 

Thank you for your time.

