Re: Client application single value KV Put high latency using multiple threads (pthread)


ping.wong@...
 

Hi Johann,

The control interface is on 10 Gbps Ethernet and the data plane interface is on 100 Gbps Ethernet.
 
Per your recommendation, I tried ofi+tcp;ofi_rxm. Both servers came up with it, but the client application failed (the relevant lines are marked with asterisks).
 
Server1 - connected ofi+tcp;ofi_rxm
 
DEBUG 01:02:25.452378 mgmt_system.go:183: processing 1 join requests
DEBUG 01:02:25.458189 mgmt_system.go:255: updated system member: rank 0, uri ofi+tcp;ofi_rxm://11.11.200.46:31416, Joined->Joined
daos_io_server:0 DAOS I/O server (v1.1.2.1) process 215563 started on rank 0 with 4 target, 2 helper XS, firstcore 0, host test46.autocache.com.
 
Server2 - connected ofi+tcp;ofi_rxm
 
DEBUG 01:02:09.275423 raft.go:204: no known peers, aborting election:
DEBUG 01:02:09.911677 instance_drpc.go:66: DAOS I/O Server instance 0 drpc ready: uri:"ofi+tcp;ofi_rxm://11.11.200.48:31416" nctxs:7 drpcListenerSock:"/tmp/daos_sockets/daos_io_server_28178.sock" ntgts:4
DEBUG 01:02:09.914435 system.go:155: DAOS system join request: sys:"daos_server" uuid:"e32fcef5-c6c4-491f-a25b-f21ae4d3a75f" rank:1 uri:"ofi+tcp;ofi_rxm://11.11.200.48:31416" nctxs:7 addr:"0.0.0.0:10001" srvFaultDomain:"/test48.sdmsl.net"
DEBUG 01:02:09.915330 rpc.go:213: request hosts: [test46:10001 test48:10001 test62:10001]
daos_io_server:0 DAOS I/O server (v1.1.2.1) process 28178 started on rank 1 with 4 target, 2 helper XS, firstcore 1, host test48.sdmsl.net.
 
Client Failed
=================
DAOS Flat KV test..
=================
[==========] Running 1 test(s).
setup: creating pool, SCM size=4 GB, NVMe size=16 GB
setup: created pool a9177073-f014-477b-9ad1-5fe36d334f07
setup: connecting to pool
daos_pool_connect failed, rc: -1020                                *******************************
[  FAILED  ] GROUP SETUP
[  ERROR   ] DAOS KV API tests
state not set, likely due to group-setup issue
[==========] 0 test(s) run.
[  PASSED  ] 0 test(s).
daos_fini() failed with -1001               
 
This is part of the client log (with errors):
 
02/03-01:04:24.76 test62 DAOS[29842/29842] mgmt DBUG src/mgmt/cli_mgmt.c:192 fill_sys_info() GetAttachInfo Provider: ofi+tcp;ofi_rxm, Interface: enp24s0f0, Domain: enp24s0f0,CRT_CTX_SHARE_ADDR: 0, CRT_TIMEOUT: 0
                                                                 ...
 
02/03-01:04:32.78 test62 DAOS[29842/29842] external ERR  # NA -- Error -- /home/ssgroot/git/daos/build/external/dev/mercury/src/na/na_ofi.c:3431
 # na_ofi_addr_lookup(): Unrecognized provider type found from: sockets://11.11.200.48:31416
02/03-01:04:32.78 test62 DAOS[29842/29842] external ERR  # HG -- Error -- /home/ssgroot/git/daos/build/external/dev/mercury/src/mercury_core.c:1220
 # hg_core_addr_lookup(): Could not lookup address ofi+sockets://11.11.200.48:31416 (NA_INVALID_ARG)
02/03-01:04:32.78 test62 DAOS[29842/29842] external ERR  # HG -- Error -- /home/ssgroot/git/daos/build/external/dev/mercury/src/mercury_core.c:3850
 # HG_Core_addr_lookup2(): Could not lookup address
02/03-01:04:32.78 test62 DAOS[29842/29842] external ERR  # HG -- Error -- /home/ssgroot/git/daos/build/external/dev/mercury/src/mercury.c:1490
 # HG_Addr_lookup2(): Could not lookup ofi+sockets://11.11.200.48:31416 (HG_INVALID_ARG) ************************************************************************************************
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc  ERR  src/cart/crt_rpc.c:1038 crt_req_hg_addr_lookup() HG_Addr_lookup2() failed. uri=ofi+sockets://11.11.200.48:31416, hg_ret=11 **********************************
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc  ERR  src/cart/crt_rpc.c:1133 crt_req_send_internal() crt_req_hg_addr_lookup() failed, rc -1020, opc: 0x1010003.
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc  ERR  src/cart/crt_rpc.c:1234 crt_req_send(0x1f7ea90) [opc=0x1010003 (DAOS) rpcid=0x636fb8e100000000 rank:tag=1:0] crt_req_send_internal() failed, DER_HG(-1020): 'Transport layer mercury error'
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc  DBUG src/cart/crt_rpc.c:1580 timeout_bp_node_exit(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] exiting the timeout binheap.
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc  DBUG src/cart/crt_context.c:629 crt_req_timeout_untrack(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] decref to 4.
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc  DBUG src/cart/crt_context.c:1017 crt_context_req_untrack(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] decref to 3.
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc  ERR  src/cart/crt_context.c:309 crt_rpc_complete(0x1f7ea90) [opc=0x1010003 (DAOS) rpcid=0x636fb8e100000000 rank:tag=1:0] failed, DER_HG(-1020): 'Transport layer mercury error'
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc  DBUG src/cart/crt_context.c:316 crt_rpc_complete(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] Invoking RPC callback (rank 1 tag 0) rc: DER_HG(-1020): 'Transport layer mercury error'
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc  DBUG src/cart/crt_context.c:321 crt_rpc_complete(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] decref to 2.
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc  DBUG src/cart/crt_rpc.c:1260 crt_req_send(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] decref to 1.
02/03-01:04:32.78 test62 DAOS[29842/29842] mgmt DBUG src/mgmt/cli_mgmt.c:808 dc_mgmt_get_pool_svc_ranks() a9177073: daos_rpc_send_wait() failed, DER_HG(-1020): 'Transport layer mercury error'
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc  DBUG src/cart/crt_rpc.c:537 crt_req_decref(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] decref to 0.
02/03-01:04:32.78 test62 DAOS[29842/29842] hg   DBUG src/cart/crt_hg.c:971 crt_hg_req_destroy(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] destroying
 
02/03-01:04:32.78 test62 DAOS[29842/29842] crt  ERR  src/cart/crt_init.c:537 crt_finalize() cannot finalize, current ctx_num(1).    ***********************************
02/03-01:04:32.78 test62 DAOS[29842/29842] crt  ERR  src/cart/crt_init.c:596 crt_finalize() crt_finalize failed, rc: -1001.
02/03-01:04:32.78 test62 DAOS[29842/29842] client ERR  src/client/api/event.c:147 daos_eq_lib_fini() failed to shutdown crt: DER_NO_PERM(-1001): 'Operation not permitted'
02/03-01:04:32.78 test62 DAOS[29842/29842] client ERR  src/client/api/init.c:267 daos_fini() failed to finalize eq: DER_NO_PERM(-1001): 'Operation not permitted' ******************
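
What stands out in the logs above is that the servers join with ofi+tcp;ofi_rxm URIs, while the failing lookup on the client is for an ofi+sockets URI on the same address and port. A quick way to spot this kind of mismatch across log files is sketched below. This is only a minimal sketch, not a DAOS tool: the function names and the URI pattern are my own assumptions, based solely on how the URIs appear in these logs.

```python
import re

# Assumed URI shape, taken from the log lines in this post, e.g.
# "ofi+tcp;ofi_rxm://11.11.200.48:31416" or "ofi+sockets://11.11.200.48:31416".
URI_RE = re.compile(r"(ofi\+[A-Za-z0-9_;]+)://(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def providers_by_endpoint(lines):
    """Map each "ip:port" endpoint to the set of provider strings seen for it."""
    seen = {}
    for line in lines:
        for provider, ip, port in URI_RE.findall(line):
            seen.setdefault(f"{ip}:{port}", set()).add(provider)
    return seen


def mismatched_endpoints(lines):
    """Return only the endpoints that appear with more than one provider."""
    return {
        endpoint: sorted(providers)
        for endpoint, providers in providers_by_endpoint(lines).items()
        if len(providers) > 1
    }


if __name__ == "__main__":
    # Two fragments copied from the server and client logs in this post.
    sample = [
        'drpc ready: uri:"ofi+tcp;ofi_rxm://11.11.200.48:31416" nctxs:7',
        "HG_Addr_lookup2(): Could not lookup ofi+sockets://11.11.200.48:31416 (HG_INVALID_ARG)",
    ]
    for endpoint, providers in mismatched_endpoints(sample).items():
        print(endpoint, providers)
```

In practice the server and client log files would be read line by line and passed in together. An endpoint reported with two providers only shows that the two sides disagree; it does not say which setting is responsible.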

I cannot find any documentation in the Deployment Guide about the ofi+tcp;ofi_rxm settings for either the server side or the client side. Perhaps I missed a setting in one of the .yml files.


Thanks
Ping


 
