Hi Ping,
Sorry, I should have provided more details in my previous email. After switching to ofi+tcp;ofi_rxm in the config file, you will have to reformat and restart the agent, since we don’t support live provider changes yet. It would be great if you could send me the output of “daos pool autotest” with both ofi+sockets and ofi+tcp;ofi_rxm so that I can compare it with the results I have on my side at 40Gbps.
Cheers,
Johann
From: <daos@daos.groups.io> on behalf of "ping.wong via groups.io" <ping.wong@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday 3 February 2021 at 08:13
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Client application single value KV Put high latency using multiple threads (pthread)
[Edited Message Follows]
Hi Johann,
I have the control interface on 10Gbps Ethernet and the data plane interface is on 100Gbps Ethernet.
Per your recommendation, I tried ofi+tcp;ofi_rxm; however, the client application failed (marked with ******).
Server1 - connected ofi+tcp;ofi_rxm
DEBUG 01:02:25.452378 mgmt_system.go:183: processing 1 join requests
DEBUG 01:02:25.458189 mgmt_system.go:255: updated system member: rank 0, uri ofi+tcp;ofi_rxm://11.11.200.46:31416, Joined->Joined
daos_io_server:0 DAOS I/O server (v1.1.2.1) process 215563 started on rank 0 with 4 target, 2 helper XS, firstcore 0, host test46.autocache.com.
Server2 - connected ofi+tcp;ofi_rxm
DEBUG 01:02:09.275423 raft.go:204: no known peers, aborting election:
DEBUG 01:02:09.911677 instance_drpc.go:66: DAOS I/O Server instance 0 drpc ready: uri:"ofi+tcp;ofi_rxm://11.11.200.48:31416" nctxs:7 drpcListenerSock:"/tmp/daos_sockets/daos_io_server_28178.sock" ntgts:4
DEBUG 01:02:09.914435 system.go:155: DAOS system join request: sys:"daos_server" uuid:"e32fcef5-c6c4-491f-a25b-f21ae4d3a75f" rank:1 uri:"ofi+tcp;ofi_rxm://11.11.200.48:31416" nctxs:7 addr:"0.0.0.0:10001" srvFaultDomain:"/test48.sdmsl.net"
DEBUG 01:02:09.915330 rpc.go:213: request hosts: [test46:10001 test48:10001 test62:10001]
daos_io_server:0 DAOS I/O server (v1.1.2.1) process 28178 started on rank 1 with 4 target, 2 helper XS, firstcore 1, host test48.sdmsl.net.
[==========] Running 1 test(s).
setup: creating pool, SCM size=4 GB, NVMe size=16 GB
setup: created pool a9177073-f014-477b-9ad1-5fe36d334f07
setup: connecting to pool
daos_pool_connect failed, rc: -1020 *******************************
[ ERROR ] DAOS KV API tests
state not set, likely due to group-setup issue
[==========] 0 test(s) run.
daos_fini() failed with -1001
This is part of the client log (with errors):
02/03-01:04:24.76 test62 DAOS[29842/29842] mgmt DBUG src/mgmt/cli_mgmt.c:192 fill_sys_info() GetAttachInfo Provider: ofi+tcp;ofi_rxm, Interface: enp24s0f0, Domain: enp24s0f0,CRT_CTX_SHARE_ADDR: 0, CRT_TIMEOUT: 0
02/03-01:04:32.78 test62 DAOS[29842/29842] external ERR # NA -- Error -- /home/ssgroot/git/daos/build/external/dev/mercury/src/na/na_ofi.c:3431
# na_ofi_addr_lookup(): Unrecognized provider type found from: sockets://11.11.200.48:31416
02/03-01:04:32.78 test62 DAOS[29842/29842] external ERR # HG -- Error -- /home/ssgroot/git/daos/build/external/dev/mercury/src/mercury_core.c:1220
# hg_core_addr_lookup(): Could not lookup address ofi+sockets://11.11.200.48:31416 (NA_INVALID_ARG)
02/03-01:04:32.78 test62 DAOS[29842/29842] external ERR # HG -- Error -- /home/ssgroot/git/daos/build/external/dev/mercury/src/mercury_core.c:3850
# HG_Core_addr_lookup2(): Could not lookup address
02/03-01:04:32.78 test62 DAOS[29842/29842] external ERR # HG -- Error -- /home/ssgroot/git/daos/build/external/dev/mercury/src/mercury.c:1490
# HG_Addr_lookup2(): Could not lookup ofi+sockets://11.11.200.48:31416 (HG_INVALID_ARG) ************************************************************************************************
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc ERR src/cart/crt_rpc.c:1038 crt_req_hg_addr_lookup() HG_Addr_lookup2() failed. uri=ofi+sockets://11.11.200.48:31416, hg_ret=11 **********************************
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc ERR src/cart/crt_rpc.c:1133 crt_req_send_internal() crt_req_hg_addr_lookup() failed, rc -1020, opc: 0x1010003.
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc ERR src/cart/crt_rpc.c:1234 crt_req_send(0x1f7ea90) [opc=0x1010003 (DAOS) rpcid=0x636fb8e100000000 rank:tag=1:0] crt_req_send_internal() failed, DER_HG(-1020): 'Transport layer mercury error'
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc DBUG src/cart/crt_rpc.c:1580 timeout_bp_node_exit(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] exiting the timeout binheap.
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc DBUG src/cart/crt_context.c:629 crt_req_timeout_untrack(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] decref to 4.
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc DBUG src/cart/crt_context.c:1017 crt_context_req_untrack(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] decref to 3.
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc ERR src/cart/crt_context.c:309 crt_rpc_complete(0x1f7ea90) [opc=0x1010003 (DAOS) rpcid=0x636fb8e100000000 rank:tag=1:0] failed, DER_HG(-1020): 'Transport layer mercury error'
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc DBUG src/cart/crt_context.c:316 crt_rpc_complete(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] Invoking RPC callback (rank 1 tag 0) rc: DER_HG(-1020): 'Transport layer mercury error'
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc DBUG src/cart/crt_context.c:321 crt_rpc_complete(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] decref to 2.
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc DBUG src/cart/crt_rpc.c:1260 crt_req_send(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] decref to 1.
02/03-01:04:32.78 test62 DAOS[29842/29842] mgmt DBUG src/mgmt/cli_mgmt.c:808 dc_mgmt_get_pool_svc_ranks() a9177073: daos_rpc_send_wait() failed, DER_HG(-1020): 'Transport layer mercury error'
02/03-01:04:32.78 test62 DAOS[29842/29842] rpc DBUG src/cart/crt_rpc.c:537 crt_req_decref(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] decref to 0.
02/03-01:04:32.78 test62 DAOS[29842/29842] hg DBUG src/cart/crt_hg.c:971 crt_hg_req_destroy(0x1f7ea90) [opc=0x1010003 rpcid=0x636fb8e100000000 rank:tag=1:0] destroying
02/03-01:04:32.78 test62 DAOS[29842/29842] crt ERR src/cart/crt_init.c:537 crt_finalize() cannot finalize, current ctx_num(1). ***********************************
02/03-01:04:32.78 test62 DAOS[29842/29842] crt ERR src/cart/crt_init.c:596 crt_finalize() crt_finalize failed, rc: -1001.
02/03-01:04:32.78 test62 DAOS[29842/29842] client ERR src/client/api/event.c:147 daos_eq_lib_fini() failed to shutdown crt: DER_NO_PERM(-1001): 'Operation not permitted'
02/03-01:04:32.78 test62 DAOS[29842/29842] client ERR src/client/api/init.c:267 daos_fini() failed to finalize eq: DER_NO_PERM(-1001): 'Operation not permitted' ******************
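In the log above, the client reports provider ofi+tcp;ofi_rxm in its GetAttachInfo line, yet the failing address lookups are for ofi+sockets URIs. This may mean the rank URIs were recorded before the provider change, though that is an assumption and not something the log confirms. The Python sketch below is one way to flag such a mismatch in a saved client log. The function name and regular expressions are hypothetical helpers written for this thread, not part of DAOS, and they assume the log line formats shown above.

```python
import re

# Provider reported by the client's GetAttachInfo debug line
# (format assumed from the log excerpt above).
PROVIDER_RE = re.compile(r"GetAttachInfo Provider: ([^,\s]+)")

# URIs such as ofi+sockets://11.11.200.48:31416; captures the scheme part.
URI_RE = re.compile(r"\b(ofi\+[\w;]+)://[\d.]+:\d+")


def find_provider_mismatches(log_text):
    """Return (provider, sorted URI schemes that differ from the provider).

    Returns (None, []) when no GetAttachInfo line is present.
    """
    match = PROVIDER_RE.search(log_text)
    if match is None:
        return None, []
    provider = match.group(1)
    mismatched = {s for s in URI_RE.findall(log_text) if s != provider}
    return provider, sorted(mismatched)


# Two lines taken from the client log above (trailing markers removed).
SAMPLE = (
    "02/03-01:04:24.76 test62 DAOS[29842/29842] mgmt DBUG src/mgmt/cli_mgmt.c:192 "
    "fill_sys_info() GetAttachInfo Provider: ofi+tcp;ofi_rxm, Interface: enp24s0f0, "
    "Domain: enp24s0f0,CRT_CTX_SHARE_ADDR: 0, CRT_TIMEOUT: 0\n"
    "02/03-01:04:32.78 test62 DAOS[29842/29842] rpc ERR src/cart/crt_rpc.c:1038 "
    "crt_req_hg_addr_lookup() HG_Addr_lookup2() failed. "
    "uri=ofi+sockets://11.11.200.48:31416, hg_ret=11\n"
)

provider, mismatched = find_provider_mismatches(SAMPLE)
print("configured provider:", provider)
print("URI schemes that differ:", mismatched)
```

Any scheme listed as differing points at an address that the client's configured provider cannot resolve, which matches the NA_INVALID_ARG error from na_ofi_addr_lookup() above.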
I cannot find any documentation in the Deployment Guide about ofi+tcp;ofi_rxm settings for either the server side or the client side. Perhaps I missed a setting in one of the .yml files.
Thanks
Ping