Hi,
Still nothing, but I hope this information is helpful.
I noticed that OpenSM must be started on the hypervisor host (S1 in my case); otherwise the virtual function's ports are physically linked up but stay in state DOWN.
When I start OpenSM (option: PORTS="ALL"), all the ports become active (both have cables connected).
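For anyone who wants to check the same thing, here is a minimal sketch that prints the logical and physical state of every port from sysfs. It assumes the usual /sys/class/infiniband/<device>/ports/<n>/state and phys_state files; the function name ib_port_states is made up for this example, and device names will differ per system.

```shell
# Print "state" and "phys_state" for every InfiniBand port under a sysfs root.
# The root is a parameter (default: /sys/class/infiniband) so the function can
# also be pointed at a copy of the tree.
ib_port_states() (
    root=${1:-/sys/class/infiniband}
    for port in "$root"/*/ports/*; do
        # Skip the unexpanded glob or anything without a readable state file.
        [ -r "$port/state" ] || continue
        dev=${port%/ports/*}
        dev=${dev##*/}
        state=$(cat "$port/state")
        phys=$(cat "$port/phys_state" 2>/dev/null)
        printf '%s port %s: state=%s phys_state=%s\n' \
            "$dev" "${port##*/}" "$state" "${phys:-unknown}"
    done
)
```

Running ib_port_states in the VM before and after starting OpenSM on the hypervisor should show the state field change while phys_state stays at LinkUp.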
I also noticed a few more things:
So far, only running ibnetdiscover in the virtual machine produces kernel messages on the hypervisor host:
mlx4_core 0000:04:00.0: slave 2 is trying to execute a Subnet MGMT MAD, class 0x1, method 0x81 for attr 0x11. Rejecting
mlx4_core 0000:04:00.0: vhcr command MAD_IFC (0x24) slave:2 in_param 0x26aaf000 in_mod=0xffff0001, op_mod=0xc failed with error:0, status -1
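The attr value in such messages is an SMP attribute ID, so it can be mapped to a name to see which query was rejected. Below is a small sketch that does this for a few well-known IDs from the InfiniBand specification; the function name smp_attr_name is hypothetical and the table is deliberately incomplete.

```shell
# Extract the "attr 0x.." field from a log line and map it to the SMP
# attribute name. Only a few standard attribute IDs are listed; everything
# else is reported as "unknown".
smp_attr_name() (
    id=$(printf '%s\n' "$1" | sed -n 's/.*attr \(0x[0-9a-fA-F]*\).*/\1/p')
    case $id in
        0x10) name=NodeDescription ;;
        0x11) name=NodeInfo ;;
        0x12) name=SwitchInfo ;;
        0x14) name=GUIDInfo ;;
        0x15) name=PortInfo ;;
        0x20) name=SMInfo ;;
        *)    name=unknown ;;
    esac
    printf '%s %s\n' "${id:-none}" "$name"
)
```

For the rejected MAD above this reports NodeInfo, which is the first attribute a fabric discovery tool would normally ask for.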
The sminfo command picks up the correct SM LID, i.e. the LID of the OpenSM master (Lid 1), but the query itself fails:
# sminfo --debug -v
ibwarn: [2843] smp_query_status_via: attr 0x20 mod 0x0 route Lid 1
ibwarn: [2843] _do_madrpc: send failed; Function not implemented
ibwarn: [2843] mad_rpc: _do_madrpc failed; dport (Lid 1)
sminfo: iberror: [pid 2843] main: failed: query
In the virtual machine I can see this message in the log:
ibnetdiscover[2755]: segfault at e4 ip 00000031d420a8b6 sp 00007fffc2eee6b8 error 4 in libibmad.so.5.3.1[31d4200000+12000]
and on the hypervisor host:
<mlx4_ib> _mlx4_ib_mcg_port_cleanup: _mlx4_ib_mcg_port_cleanup-1102: ff12401bffff000000000000ffffffff (port 2): WARNING: group refcount 1!!! (pointer ffff88083f4fa000)
One more thing:
In the virtual machine I started OpenSM with the GUID pointing to the local port of the VF and got these messages:
Jul 24 14:27:09 830432 [FA2C0700] 0x80 -> Entering DISCOVERING state
Using default GUID 0x14050000000002
Jul 24 14:27:09 994036 [FA2C0700] 0x02 -> osm_vendor_bind: Mgmt class 0x81 binding to port GUID 0x14050000000002
Jul 24 14:27:10 398748 [FA2C0700] 0x02 -> osm_vendor_bind: Mgmt class 0x03 binding to port GUID 0x14050000000002
Jul 24 14:27:10 398958 [FA2C0700] 0x02 -> osm_vendor_bind: Mgmt class 0x04 binding to port GUID 0x14050000000002
Jul 24 14:27:10 399371 [FA2C0700] 0x02 -> osm_vendor_bind: Mgmt class 0x21 binding to port GUID 0x14050000000002
Jul 24 14:27:10 399960 [FA2C0700] 0x02 -> osm_opensm_bind: Setting IS_SM on port 0x0014050000000002
Jul 24 14:27:10 400439 [FA2C0700] 0x01 -> osm_vendor_set_sm: ERR 5431: setting IS_SM capmask: cannot open file '/dev/infiniband/issm0': Invalid argument
Jul 24 14:27:10 401700 [F66B8700] 0x01 -> osm_vendor_send: ERR 5430: Send p_madw = 0x7fcfe40008c0 of size 256 TID 0x1234 failed -5 (Invalid argument)
Jul 24 14:27:10 401700 [F66B8700] 0x01 -> sm_mad_ctrl_send_err_cb: ERR 3113: MAD completed in error (IB_ERROR): SubnGet(NodeInfo), attr_mod 0x0, TID 0x1234
Jul 24 14:27:10 401700 [F66B8700] 0x01 -> vl15_send_mad: ERR 3E03: MAD send failed (IB_UNKNOWN_ERROR)
Jul 24 14:27:10 401983 [F5CB7700] 0x01 -> state_mgr_is_sm_port_down: ERR 3308: SM port GUID unknown
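To pull just the errors out of a longer OpenSM log, something like the following sketch may help. It assumes the log line layout shown above ("... -> function: ERR code: message"); osm_errors is only a name chosen for this example.

```shell
# Print "CODE function: message" for every ERR line in an OpenSM log.
# Reads the files given as arguments, or standard input if there are none.
osm_errors() {
    sed -n 's/.* -> \([A-Za-z0-9_]*\): ERR \([0-9A-Fa-f]\{4\}\): \(.*\)$/\2 \1: \3/p' "$@"
}
```

Called as osm_errors on the OpenSM log file, it leaves out the informational lines (state changes, osm_vendor_bind) and keeps the five ERR entries from the excerpt above.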
A plain cat on /dev/infiniband/issm0 works on the hypervisor host (at least it blocks and waits), whereas in the VM I get exactly the error from the OpenSM log:
# cat /dev/infiniband/issm0
cat: /dev/infiniband/issm0: Invalid argument
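The host/VM comparison can be repeated without leaving a hanging cat behind by wrapping it in a timeout. This is only a sketch: it assumes the coreutils timeout command is available, and probe_dev is a hypothetical helper name.

```shell
# Try to read a device node for a limited time and classify the outcome.
# $1 = path, $2 = timeout in seconds (default 2).
probe_dev() (
    dev=$1
    rc=0
    timeout "${2:-2}" cat "$dev" >/dev/null 2>&1 || rc=$?
    case $rc in
        0)   echo "readable" ;;
        124) echo "blocked (timed out)" ;;  # exit status timeout uses on expiry
        *)   echo "failed (exit $rc)" ;;
    esac
)
```

With the behaviour described above, probe_dev /dev/infiniband/issm0 would be expected to report a timeout on the hypervisor and a failure in the VM.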
The device file has the same permissions and SELinux context on the host and in the VM:
VM:
#ls -aZ /dev/infiniband/issm0
crw-rw----. root root system_u:object_r:device_t:s0 /dev/infiniband/issm0
Host:
#ls -lZ /dev/infiniband/issm0
crw-rw----. root root system_u:object_r:device_t:s0 /dev/infiniband/issm0