Channel: Mellanox Interconnect Community: Message List

Cannot direct ping between two ConnectX-3 cards

Hi,

 

I have two ConnectX-3 FDR InfiniBand + 40GigE cards in separate machines, connected directly to each other (machine A port 1 to machine B port 2). Both machines run Ubuntu 12.04. I am trying to run an MPI application over the InfiniBand cable instead of a normal TCP/IP network cable. I have managed to install the driver and set up IP addresses for the ports by following this article. However, I cannot ping between the two machines using the IP addresses I set. The "ping" command returns the following:

 

root@gpu0:/# ping 172.31.128.53
PING 172.31.128.53 (172.31.128.53) 56(84) bytes of data.
From 172.31.128.51 icmp_seq=1 Destination Host Unreachable
From 172.31.128.51 icmp_seq=2 Destination Host Unreachable
From 172.31.128.51 icmp_seq=3 Destination Host Unreachable

 

With ibping ( ibping -G 0xf4521403007f6082 ), the command just hangs. How can I debug this problem? Here is some information about the setup of both machines.

 

ibstat

root@gpu0:/# ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 2
    Firmware version: 2.30.3110
    Hardware version: 1
    Node GUID: 0xf4521403007f6060
    System image GUID: 0xf4521403007f6063
    Port 1:
        State: Active
        Physical state: LinkUp
        Rate: 56
        Base lid: 1
        LMC: 0
        SM lid: 1
        Capability mask: 0x0251486a
        Port GUID: 0xf4521403007f6061
        Link layer: InfiniBand
    Port 2:
        State: Down
        Physical state: Disabled
        Rate: 10
        Base lid: 1
        LMC: 0
        SM lid: 1
        Capability mask: 0x0251486a
        Port GUID: 0xf4521403007f6062
        Link layer: InfiniBand


root@gpu1:/# ibstat
CA 'mlx4_0'
    CA type: MT4099
    Number of ports: 2
    Firmware version: 2.30.3110
    Hardware version: 1
    Node GUID: 0xf4521403007f6080
    System image GUID: 0xf4521403007f6083
    Port 1:
        State: Down
        Physical state: Disabled
        Rate: 10
        Base lid: 0
        LMC: 0
        SM lid: 0
        Capability mask: 0x02514868
        Port GUID: 0xf4521403007f6081
        Link layer: InfiniBand
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 56
        Base lid: 2
        LMC: 0
        SM lid: 1
        Capability mask: 0x02514868
        Port GUID: 0xf4521403007f6082
        Link layer: InfiniBand

ifconfig

root@gpu0:/# ifconfig

ib0       Link encap:UNSPEC  HWaddr A0-00-01-00-FE-80-00-00-00-00-00-00-00-00-00-00 
          inet6 addr: fe80::f652:1403:7f:6061/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:49 errors:0 dropped:0 overruns:0 frame:0
          TX packets:233 errors:0 dropped:33 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:10933 (10.9 KB)  TX bytes:33254 (33.2 KB)

 

ib1       Link encap:UNSPEC  HWaddr A0-00-01-10-FE-80-00-00-00-00-00-00-00-00-00-00 
          inet addr:172.31.128.51  Bcast:172.31.128.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:2044  Metric:1
          RX packets:104 errors:0 dropped:0 overruns:0 frame:0
          TX packets:114 errors:0 dropped:5 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:14695 (14.6 KB)  TX bytes:16489 (16.4 KB)


root@gpu1:/# ifconfig
ib0       Link encap:UNSPEC  HWaddr A0-00-01-00-FE-80-00-00-00-00-00-00-00-00-00-00 
          inet addr:172.31.128.53  Bcast:172.31.128.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:4092  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

 

ib1       Link encap:UNSPEC  HWaddr A0-00-01-10-FE-80-00-00-00-00-00-00-00-00-00-00 
          inet6 addr: fe80::f652:1403:7f:6082/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
          RX packets:262 errors:0 dropped:0 overruns:0 frame:0
          TX packets:159 errors:0 dropped:31 overruns:0 carrier:0
          collisions:0 txqueuelen:1024
          RX bytes:31927 (31.9 KB)  TX bytes:27854 (27.8 KB)
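
One thing that may be worth checking in the ifconfig output above is whether the interface that carries the IPv4 address also shows the RUNNING flag. The following is only a minimal sketch of that check, assuming the older net-tools ifconfig layout shown here (an "inet addr:" field and a flags line). The function name is made up for illustration, and the sample text is copied from the gpu0 output with the counter lines left out.

```python
import re


def interfaces_with_ip_but_not_running(ifconfig_text):
    """Return interfaces that have an IPv4 address but no RUNNING flag.

    Assumes net-tools style output, where each interface block starts
    with the interface name in the first column.
    """
    suspects = []
    # A new block begins at a newline followed by a non-space character.
    blocks = re.split(r"\n(?=\S)", ifconfig_text.strip())
    for block in blocks:
        name = block.split()[0]
        has_ipv4 = "inet addr:" in block
        running = re.search(r"\bRUNNING\b", block) is not None
        if has_ipv4 and not running:
            suspects.append(name)
    return suspects


# Sample copied from the gpu0 output (packet counters omitted).
gpu0_sample = """\
ib0       Link encap:UNSPEC  HWaddr A0-00-01-00-FE-80-00-00-00-00-00-00-00-00-00-00
          inet6 addr: fe80::f652:1403:7f:6061/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1

ib1       Link encap:UNSPEC  HWaddr A0-00-01-10-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:172.31.128.51  Bcast:172.31.128.255  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:2044  Metric:1
"""

print(interfaces_with_ip_but_not_running(gpu0_sample))
```

For the gpu0 sample this reports ib1, the interface that holds 172.31.128.51 but is not marked RUNNING. Whether that is the actual cause of the failed ping is not something the sketch can confirm.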

route

root@gpu0:/# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
default         10.1.32.254     0.0.0.0         UG    0      0        0 eth0
10.1.32.0       *               255.255.255.0   U     1      0        0 eth0
link-local      *               255.255.0.0     U     1000   0        0 eth0
172.31.128.0    *               255.255.255.0   U     0      0        0 ib1
192.168.122.0   *               255.255.255.0   U     0      0        0 virbr0


root@gpu1:/# route
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
172.31.128.0    *               255.255.255.0   U     0      0        0 ib0
192.168.122.0   *               255.255.255.0   U     0      0        0 virbr0

 

Note: Besides this problem, I also have trouble starting opensm: it always hangs. Instead of starting opensm, I start opensmd. I am not sure whether they are the same thing.


Amirul

