RDMA bandwidth testing works.
server:
# ib_send_bw -d mlx4_0 -i 1 -F -b
------------------------------------------------------------------
Send Bidirectional BW Test
Connection type : RC
Inline data is used up to 400 bytes message
local address: LID 0x02, QPN 0x01c6, PSN 0xef4fed
remote address: LID 0x04, QPN 0x004c, PSN 0x98d931
Mtu : 2048
------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec]
------------------------------------------------------------------
client:
# ib_send_bw -d mlx4_0 -i 1 -F 10.18.3.31 -b
------------------------------------------------------------------
Send Bidirectional BW Test
Connection type : RC
Inline data is used up to 400 bytes message
local address: LID 0x04, QPN 0x004c, PSN 0x98d931
remote address: LID 0x02, QPN 0x01c6, PSN 0xef4fed
Mtu : 2048
------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec]
Conflicting CPU frequency values detected: 1200.000000 != 2101.000000
Test integrity may be harmed !
Conflicting CPU frequency values detected: 1200.000000 != 2101.000000
Test integrity may be harmed !
Warning: measured timestamp frequency 2099.8 differs from nominal 1200 MHz
65536 1000 11457.31 11456.84
------------------------------------------------------------------
root@nas-2-1:~#
The server side printed no result rows. Does that look like a problem?
On the client I mounted the export:
# mount -o rdma,port=20049 10.18.3.58:/export/1 /mnt/ib
I can cd around and cat or touch small files:
# cat .gnuplot-wxt
raise=1
persist=0
ctrl=0
rendering=2
hinting=100
#
But reading anything large fails:
root@c2-31:/mnt/ib# md5sum testfile
md5sum: testfile: Input/output error
I've not tried NFSv3 yet. Does NFSv4 have a known problem with RDMA? The same pair of machines has NFSv4 working over GigE.
I tried your command and got:
# dmesg -c
# cat testfile
cat: testfile: Input/output error
# rpcdebug -m rpc -s all
# dmesg
[242031.688205] RPC: looking up Generic cred
[242031.688213] RPC: looking up Generic cred
[242031.688216] RPC: looking up Generic cred
[242031.688222] RPC: new task initialized, procpid 12386
[242031.688224] RPC: allocated task ffff880845c33000
[242031.688236] RPC: 110 __rpc_execute flags=0x81
[242031.688241] RPC: 110 call_start nfs4 proc OPEN (async)
[242031.688243] RPC: 110 call_reserve (status 0)
[242031.688246] RPC: 110 reserved req ffff880853255600 xid dfed959b
[242031.688249] RPC: 110 call_reserveresult (status 0)
[242031.688251] RPC: 110 call_refresh (status 0)
[242031.688253] RPC: 110 refreshing UNIX cred ffff881050ef20c0
[242031.688255] RPC: 110 call_refreshresult (status 0)
[242031.688257] RPC: 110 call_allocate (status 0)
[242031.688263] RPC: xprt_rdma_allocate: size 6344 too large for buffer[1024]: prog 100003 vers 4 proc 1
[242031.688269] RPC: xprt_rdma_allocate: size 6344, request 0xffff88082b81c000
[242031.688271] RPC: 110 call_bind (status 0)
[242031.688273] RPC: 110 call_connect xprt ffff880846752000 is connected
[242031.688275] RPC: 110 call_transmit (status 0)
[242031.688277] RPC: 110 xprt_prepare_transmit
[242031.688279] RPC: 110 xprt_cwnd_limited cong = 0 cwnd = 4096
[242031.688281] RPC: 110 rpc_xdr_encode (status 0)
[242031.688283] RPC: 110 marshaling UNIX cred ffff881050ef20c0
[242031.688286] RPC: 110 using AUTH_UNIX cred ffff881050ef20c0 to wrap rpc data
[242031.688290] RPC: 110 xprt_transmit(220)
[242031.688293] RPC: rpcrdma_inline_pullup: pad 0 destp 0xffff88082b81d83c len 220 hdrlen 220
[242031.688297] RPC: rpcrdma_register_frmr_external: Using frmr ffff880845ccda90 to map 1 segments
[242031.688301] RPC: rpcrdma_create_chunks: reply chunk elem 3124@0x82b81e3f4:0xe006730f (last)
[242031.688305] RPC: rpcrdma_marshal_req: reply chunk: hdrlen 48 rpclen 220 padlen 0 headerp 0xffff88082b81d100 base 0xffff88082b81d760 lkey 0x8000
[242031.688309] RPC: 110 xmit complete
[242031.688311] RPC: 110 sleep_on(queue "xprt_pending" time 4355352568)
[242031.688313] RPC: 110 added to queue ffff880846752258 "xprt_pending"
[242031.688315] RPC: 110 setting alarm for 60000 ms
[242031.688318] RPC: wake_up_first(ffff880846752190 "xprt_sending")
[242031.688333] RPC: rpcrdma_event_process: event rep ffff880845ccda90 status 0 opcode 8 length 4294936584
[242031.688340] RPC: rpcrdma_event_process: event rep (null) status 0 opcode 0 length 4294936584
[242031.688473] RPC: rpcrdma_event_process: event rep ffff88085436a000 status 0 opcode 80 length 48
[242031.688479] RPC: rpcrdma_reply_handler: reply 0xffff88085436a000 completes request 0xffff88082b81c000
[242031.688479] RPC request 0xffff880853255600 xid 0x9b95eddf
[242031.688484] RPC: rpcrdma_count_chunks: chunk 308@0x82b81e3f4:0xe006730f
[242031.688486] RPC: rpcrdma_reply_handler: xprt_complete_rqst(0xffff880846752000, 0xffff880853255600, 308)
[242031.688489] RPC: 110 xid dfed959b complete (308 bytes received)
[242031.688492] RPC: 110 __rpc_wake_up_task (now 4355352568)
[242031.688493] RPC: 110 disabling timer
[242031.688496] RPC: 110 removed from queue ffff880846752258 "xprt_pending"
[242031.688500] RPC: __rpc_wake_up_task done
[242031.688544] RPC: 110 __rpc_execute flags=0x881
[242031.688547] RPC: 110 call_status (status 308)
[242031.688549] RPC: 110 call_decode (status 308)
[242031.688551] RPC: 110 validating UNIX cred ffff881050ef20c0
[242031.688554] RPC: 110 using AUTH_UNIX cred ffff881050ef20c0 to unwrap rpc data
[242031.688560] RPC: 110 call_decode result 0
[242031.688563] RPC: wake_up_first(ffff881052515e98 "NFSv4.0 transport Slot table")
[242031.688566] RPC: 110 return 0, status 0
[242031.688568] RPC: 110 release task
[242031.688570] RPC: wake_up_first(ffff880846752190 "xprt_sending")
[242031.688573] RPC: xprt_rdma_free: called on 0xffff88085436a000
[242031.688580] RPC: rpcrdma_event_process: event rep ffff880845ccda90 status 0 opcode 7 length 4294936584
[242031.688586] RPC: 110 release request ffff880853255600
[242031.688588] RPC: wake_up_first(ffff880846752320 "xprt_backlog")
[242031.688590] RPC: rpc_release_client(ffff881052517200)
[242031.688604] RPC: 110 freeing task
[242031.688657] RPC: new task initialized, procpid 12386
[242031.688660] RPC: allocated task ffff88085145fb38
[242031.688672] RPC: 111 __rpc_execute flags=0x1
[242031.688678] RPC: 111 call_start nfs4 proc READ (async)
[242031.688681] RPC: 111 call_reserve (status 0)
[242031.688686] RPC: 111 reserved req ffff880853255600 xid e0ed959b
[242031.688689] RPC: 111 call_reserveresult (status 0)
[242031.688692] RPC: 111 call_refresh (status 0)
[242031.688695] RPC: 111 refreshing UNIX cred ffff881050ef20c0
[242031.688697] RPC: 111 call_refreshresult (status 0)
[242031.688699] RPC: 111 call_allocate (status 0)
[242031.688705] RPC: xprt_rdma_allocate: size 684, request 0xffff88082bea8000
[242031.688707] RPC: 111 call_bind (status 0)
[242031.688709] RPC: 111 call_connect xprt ffff880846752000 is connected
[242031.688711] RPC: 111 call_transmit (status 0)
[242031.688713] RPC: 111 xprt_prepare_transmit
[242031.688715] RPC: 111 xprt_cwnd_limited cong = 0 cwnd = 4096
[242031.688717] RPC: 111 rpc_xdr_encode (status 0)
[242031.688719] RPC: 111 marshaling UNIX cred ffff881050ef20c0
[242031.688721] RPC: 111 using AUTH_UNIX cred ffff881050ef20c0 to wrap rpc data
[242031.688724] RPC: 111 xprt_transmit(152)
[242031.688727] RPC: rpcrdma_inline_pullup: pad 0 destp 0xffff88082bea97f8 len 152 hdrlen 152
[242031.688731] RPC: rpcrdma_register_frmr_external: Using frmr ffff880845ccd040 to map 1 segments
[242031.688734] RPC: rpcrdma_create_chunks: write chunk elem 4096@0x8460d0000:0xe006310f (more)
[242031.688737] RPC: rpcrdma_register_frmr_external: Using frmr ffff880845ccda68 to map 1 segments
[242031.688739] RPC: rpcrdma_event_process: event rep ffff880845ccd040 status 0 opcode 8 length 4294936584
[242031.688744] RPC: rpcrdma_create_chunks: write chunk elem 152@0x82bea9974:0xe006720d (last)
[242031.688747] RPC: rpcrdma_marshal_req: write chunk: hdrlen 68 rpclen 152 padlen 0 headerp 0xffff88082bea9100 base 0xffff88082bea9760 lkey 0x8000
[242031.688749] RPC: rpcrdma_event_process: event rep ffff880845ccda68 status 0 opcode 8 length 4294936584
[242031.688753] RPC: 111 xmit complete
[242031.688755] RPC: 111 sleep_on(queue "xprt_pending" time 4355352568)
[242031.688757] RPC: 111 added to queue ffff880846752258 "xprt_pending"
[242031.688759] RPC: 111 setting alarm for 60000 ms
[242031.688762] RPC: wake_up_first(ffff880846752190 "xprt_sending")
[242031.688876] RPC: rpcrdma_event_process: event rep ffff88085436a000 status 0 opcode 80 length 112
[242031.688881] RPC: rpcrdma_reply_handler: reply 0xffff88085436a000 completes request 0xffff88082bea8000
[242031.688881] RPC request 0xffff880853255600 xid 0x9b95ede0
[242031.688884] RPC: rpcrdma_count_chunks: chunk 4096@0x8460d0000:0xe006310f
[242031.688887] RPC: rpcrdma_inline_fixup: srcp 0xffff88085436a094 len 60 hdrlen 60
[242031.688890] RPC: rpcrdma_reply_handler: xprt_complete_rqst(0xffff880846752000, 0xffff880853255600, 4156)
[242031.688892] RPC: 111 xid e0ed959b complete (4156 bytes received)
[242031.688894] RPC: 111 __rpc_wake_up_task (now 4355352568)
[242031.688896] RPC: 111 disabling timer
[242031.688898] RPC: 111 removed from queue ffff880846752258 "xprt_pending"
[242031.688900] RPC: __rpc_wake_up_task done
[242031.688905] RPC: 111 __rpc_execute flags=0x801
[242031.688907] RPC: 111 call_status (status 4156)
[242031.688908] RPC: 111 call_decode (status 4156)
[242031.688910] RPC: 111 validating UNIX cred ffff881050ef20c0
[242031.688913] RPC: 111 using AUTH_UNIX cred ffff881050ef20c0 to unwrap rpc data
[242031.688915] RPC: 111 call_decode result 0
[242031.688918] RPC: wake_up_first(ffff881052515e98 "NFSv4.0 transport Slot table")
[242031.688921] RPC: 111 return 0, status 0
[242031.688922] RPC: 111 release task
[242031.688924] RPC: wake_up_first(ffff880846752190 "xprt_sending")
[242031.688927] RPC: xprt_rdma_free: called on 0xffff88085436a000
[242031.688934] RPC: rpcrdma_event_process: event rep ffff880845ccd040 status 0 opcode 7 length 4294936584
[242031.688937] RPC: rpcrdma_event_process: event rep ffff880845ccda68 status 0 opcode 7 length 4294936584
[242031.688941] RPC: 111 release request ffff880853255600
[242031.688943] RPC: wake_up_first(ffff880846752320 "xprt_backlog")
[242031.688945] RPC: rpc_release_client(ffff881052517200)
[242031.689159] RPC: new task initialized, procpid 12386
[242031.689162] RPC: allocated task ffff880845c32c00
[242031.689173] RPC: 112 __rpc_execute flags=0x81
[242031.689177] RPC: 112 call_start nfs4 proc CLOSE (async)
[242031.689179] RPC: 112 call_reserve (status 0)
[242031.689182] RPC: 112 reserved req ffff880853255600 xid e1ed959b
[242031.689184] RPC: 112 call_reserveresult (status 0)
[242031.689185] RPC: 112 call_refresh (status 0)
[242031.689188] RPC: 112 refreshing UNIX cred ffff881050ef20c0
[242031.689190] RPC: 112 call_refreshresult (status 0)
[242031.689192] RPC: 112 call_allocate (status 0)
[242031.689196] RPC: xprt_rdma_allocate: size 3244 too large for buffer[1024]: prog 100003 vers 4 proc 1
[242031.689201] RPC: xprt_rdma_allocate: size 3244, request 0xffff88082b81c000
[242031.689203] RPC: 112 call_bind (status 0)
[242031.689205] RPC: 112 call_connect xprt ffff880846752000 is connected
[242031.689207] RPC: 112 call_transmit (status 0)
[242031.689208] RPC: 112 xprt_prepare_transmit
[242031.689210] RPC: 112 xprt_cwnd_limited cong = 0 cwnd = 4096
[242031.689213] RPC: 112 rpc_xdr_encode (status 0)
[242031.689214] RPC: 112 marshaling UNIX cred ffff881050ef20c0
[242031.689217] RPC: 112 using AUTH_UNIX cred ffff881050ef20c0 to wrap rpc data
[242031.689220] RPC: 112 xprt_transmit(160)
[242031.689223] RPC: rpcrdma_inline_pullup: pad 0 destp 0xffff88082b81d800 len 160 hdrlen 160
[242031.689226] RPC: rpcrdma_register_frmr_external: Using frmr ffff880845ccc5f0 to map 1 segments
[242031.689229] RPC: rpcrdma_create_chunks: reply chunk elem 2760@0x82b81d944:0xe005ef0f (last)
[242031.689233] RPC: rpcrdma_marshal_req: reply chunk: hdrlen 48 rpclen 160 padlen 0 headerp 0xffff88082b81d100 base 0xffff88082b81d760 lkey 0x8000
[242031.689235] RPC: 112 xmit complete
[242031.689238] RPC: 112 sleep_on(queue "xprt_pending" time 4355352568)
[242031.689240] RPC: 112 added to queue ffff880846752258 "xprt_pending"
[242031.689241] RPC: 112 setting alarm for 60000 ms
[242031.689244] RPC: wake_up_first(ffff880846752190 "xprt_sending")
[242031.689262] RPC: rpcrdma_event_process: event rep ffff880845ccc5f0 status 0 opcode 8 length 4294936584
[242031.689417] RPC: rpcrdma_event_process: event rep ffff88085436a000 status 0 opcode 80 length 48
[242031.689424] RPC: rpcrdma_reply_handler: reply 0xffff88085436a000 completes request 0xffff88082b81c000
[242031.689424] RPC request 0xffff880853255600 xid 0x9b95ede1
[242031.689428] RPC: rpcrdma_count_chunks: chunk 132@0x82b81d944:0xe005ef0f
[242031.689431] RPC: rpcrdma_reply_handler: xprt_complete_rqst(0xffff880846752000, 0xffff880853255600, 132)
[242031.689433] RPC: 112 xid e1ed959b complete (132 bytes received)
[242031.689436] RPC: 112 __rpc_wake_up_task (now 4355352568)
[242031.689438] RPC: 112 disabling timer
[242031.689440] RPC: 112 removed from queue ffff880846752258 "xprt_pending"
[242031.689445] RPC: __rpc_wake_up_task done
[242031.689452] RPC: 112 __rpc_execute flags=0x881
[242031.689454] RPC: 112 call_status (status 132)
[242031.689456] RPC: 112 call_decode (status 132)
[242031.689459] RPC: 112 validating UNIX cred ffff881050ef20c0
[242031.689461] RPC: 112 using AUTH_UNIX cred ffff881050ef20c0 to unwrap rpc data
[242031.689467] RPC: 112 call_decode result 0
[242031.689470] RPC: wake_up_first(ffff881052515e98 "NFSv4.0 transport Slot table")
[242031.689474] RPC: 112 return 0, status 0
[242031.689476] RPC: 112 release task
[242031.689478] RPC: wake_up_first(ffff880846752190 "xprt_sending")
[242031.689481] RPC: xprt_rdma_free: called on 0xffff88085436a000
[242031.689489] RPC: rpcrdma_event_process: event rep ffff880845ccc5f0 status 0 opcode 7 length 4294936584
[242031.689493] RPC: 112 release request ffff880853255600
[242031.689495] RPC: wake_up_first(ffff880846752320 "xprt_backlog")
[242031.689497] RPC: rpc_release_client(ffff881052517200)
[242031.689523] RPC: 112 freeing task
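A trace like the one above is easier to scan once it is reduced to one row per RPC task. The following is a minimal sketch in Python, assuming only the line formats seen in this dmesg output. The names summarize and swap_xid are hypothetical helpers, not part of the kernel or nfs-utils. It also records each xid in reversed byte order, since the rpcrdma_reply_handler lines appear to print the same value byte-swapped (dfed959b vs 0x9b95eddf).

```python
import re

# Assumed line formats, copied from the rpcdebug output in this post.
CALL_START = re.compile(r"RPC:\s+(\d+) call_start (\S+) proc (\S+)")
COMPLETE = re.compile(
    r"RPC:\s+(\d+) xid ([0-9a-f]+) complete \((\d+) bytes received\)"
)
RETURN = re.compile(r"RPC:\s+(\d+) return (-?\d+), status (-?\d+)")


def swap_xid(xid_hex):
    """Reverse the byte order of a hex xid, e.g. dfed959b -> 9b95eddf."""
    return bytes.fromhex(xid_hex.zfill(8))[::-1].hex()


def summarize(log_text):
    """Return one dict per RPC task id, in order of first appearance."""
    tasks = {}

    def entry(task_id):
        # Create the row on first sight of a task id, then reuse it.
        return tasks.setdefault(task_id, {
            "task": int(task_id), "proc": None, "xid": None,
            "xid_swapped": None, "bytes": None, "status": None,
        })

    for line in log_text.splitlines():
        m = CALL_START.search(line)
        if m:
            entry(m.group(1))["proc"] = m.group(3)
            continue
        m = COMPLETE.search(line)
        if m:
            row = entry(m.group(1))
            row["xid"] = m.group(2)
            row["xid_swapped"] = swap_xid(m.group(2))
            row["bytes"] = int(m.group(3))
            continue
        m = RETURN.search(line)
        if m:
            entry(m.group(1))["status"] = int(m.group(3))
    return list(tasks.values())


# A few lines taken from the trace above, enough to exercise each pattern.
SAMPLE = """\
[242031.688241] RPC: 110 call_start nfs4 proc OPEN (async)
[242031.688489] RPC: 110 xid dfed959b complete (308 bytes received)
[242031.688566] RPC: 110 return 0, status 0
[242031.688678] RPC: 111 call_start nfs4 proc READ (async)
[242031.688892] RPC: 111 xid e0ed959b complete (4156 bytes received)
[242031.688921] RPC: 111 return 0, status 0
[242031.689177] RPC: 112 call_start nfs4 proc CLOSE (async)
[242031.689433] RPC: 112 xid e1ed959b complete (132 bytes received)
[242031.689474] RPC: 112 return 0, status 0
"""

for row in summarize(SAMPLE):
    print(row)
```

In this capture all three tasks (OPEN, READ, CLOSE) come back with status 0, and the READ reply is 4156 bytes.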
Here's the mount:
# mount | grep rdma
10.18.3.58:/export/1 on /mnt/ib type nfs (rw,rdma,port=20049,vers=4,addr=10.18.3.58,clientaddr=10.18.3.31)