Re: pool creation failed in recent master commits
Cain, Kenneth C
Can you try setting the server RPC timeout with the crt_timeout setting in daos_server.yml, rather than the CRT_TIMEOUT variable in the env_vars section? See daos/utils/config/daos_server.yml. Also, please look near the beginning of the daos_io_server log for the dump_envariables() output and check the CRT_TIMEOUT value printed there. I think a change was made on the DAOS server to configure RPC timeouts through this new crt_timeout interface. I suspect your configuration sets the CRT_TIMEOUT environment variable in the env_vars section of daos_server.yml and it is not taking effect, resulting in pool create timeouts in all cases.
The master commit 68ddb557753cf4bbf657347d28baa7bed15d09ef (Aug 10) and later should be useful for large pool creates if they do happen to fail due to timeouts.
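A quick way to check which of the two mechanisms a given daos_server.yml uses is sketched below. It is only a plain-text scan, not a YAML parser, and the line shapes it looks for (an env_vars list entry written as "- CRT_TIMEOUT=<n>" and a key written as "crt_timeout: <n>") are assumptions based on the setting names mentioned above; the helper names are hypothetical. Check daos/utils/config/daos_server.yml for the exact layout.

```python
import re

# Assumed line shapes (not taken from the DAOS sources):
#   env_vars list entry:  "- CRT_TIMEOUT=120"
#   config key:           "crt_timeout: 120"
ENV_VAR_RE = re.compile(r"^\s*-\s*CRT_TIMEOUT=(\d+)\s*$")
SETTING_RE = re.compile(r"^\s*crt_timeout:\s*(\d+)\s*$")


def find_timeout_settings(config_text):
    """Return (env_var_values, crt_timeout_values) from uncommented lines."""
    env_values, setting_values = [], []
    for line in config_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # ignore commented-out lines
        match = ENV_VAR_RE.match(line)
        if match:
            env_values.append(int(match.group(1)))
            continue
        match = SETTING_RE.match(line)
        if match:
            setting_values.append(int(match.group(1)))
    return env_values, setting_values


def diagnose(config_text):
    """Summarize how the RPC timeout is configured in the given file text."""
    env_values, setting_values = find_timeout_settings(config_text)
    if env_values and not setting_values:
        return "CRT_TIMEOUT is set only through env_vars; try crt_timeout instead"
    if setting_values:
        return f"crt_timeout is set to {setting_values[-1]}"
    return "no RPC timeout configured in this file"
```

Usage would be along the lines of `print(diagnose(open("daos_server.yml").read()))`, with the path adjusted to wherever the server configuration lives.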
From: email@example.com <firstname.lastname@example.org> On Behalf Of Zhang, Jiafu
Sent: Monday, September 28, 2020 8:29 PM
Subject: Re: [daos] pool creation failed in recent master commits
The most recent working commit I can track down is 681b827527a0587d8496d3adbbd77a175370766c (Feb 28).
From: Zhang, Jiafu
I just recalled that I re-opened the ticket on Aug 10. The issue has existed for a long time. Please see the detailed info in the ticket.
What was the previous commit that you know of that works in your setup?
I have been unable to create a pool with recent master commits, going back to 6726e272e2a0e821c0676838c39a2b133a7e0612 (Sep 9). The error in the terminal is:
Pool-create command FAILED: pool create failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
ERROR: dmg: pool create failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded.
After enabling debug logging, I didn't see much more useful info, apart from the timeout errors below.
09/28-17:33:42.25 DAOS[285589/285602] swim ERR src/cart/swim/swim.c:659 swim_progress() The progress callback was not called for too long: 11515 ms after expected.
09/28-17:33:42.25 DAOS[285589/285602] rdb WARN src/rdb/rdb_raft.c:1980 rdb_timerd() 64616f73: not scheduled for 12.683030 second
09/28-17:33:42.29 DAOS[285589/285602] mgmt ERR src/mgmt/srv_pool.c:515 ds_mgmt_create_pool() creating pool on ranks cf7aa844 failed: rc DER_TIMEDOUT(-1011)
09/28-17:33:42.29 DAOS[285589/285602] mgmt ERR src/mgmt/srv_drpc.c:496 ds_mgmt_drpc_pool_create() failed to create pool: DER_TIMEDOUT(-1011)
09/28-17:33:42.29 DAOS[285589/285603] daos INFO src/iosrv/drpc_progress.c:409 process_session_activity() Session 664 connection has been terminated
09/28-17:33:42.29 DAOS[285589/285603] daos INFO src/common/drpc.c:717 drpc_close() Closing dRPC socket fd=664
09/28-17:33:43.80 DAOS[285589/285602] daos INFO src/iosrv/drpc_progress.c:295 drpc_handler_ult() dRPC handler ULT for module=2 method=207
09/28-17:33:43.80 DAOS[285589/285602] mgmt INFO src/mgmt/srv_drpc.c:468 ds_mgmt_drpc_pool_create() Received request to create pool
09/28-17:34:43.80 DAOS[285589/285602] rpc ERR src/cart/crt_context.c:790 crt_context_timeout_check(0x7f61017447d0) [opc=0x1010007 rpcid=0x32444975000000ba rank:tag=1:0] ctx_id 0, (status: 0x38) timed out, tgt rank 1, tag 0
09/28-17:34:43.80 DAOS[285589/285602] rpc ERR src/cart/crt_context.c:748 crt_req_timeout_hdlr(0x7f61017447d0) [opc=0x1010007 rpcid=0x32444975000000ba rank:tag=1:0] aborting to group daos_server, rank 1, tgt_uri (null)
09/28-17:34:43.80 DAOS[285589/285602] hg ERR src/cart/crt_hg.c:1031 crt_hg_req_send_cb(0x7f61017447d0) [opc=0x1010007 rpcid=0x32444975000000ba rank:tag=1:0] RPC failed; rc: -1011
09/28-17:34:43.80 DAOS[285589/285602] corpc ERR src/cart/crt_corpc.c:646 crt_corpc_reply_hdlr() RPC(opc: 0x1010007) error, rc: -1011.
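For anyone digging through a longer log of this shape, the sketch below pulls out the timeout-related ERR lines and measures how long the server waited between receiving the pool create request and reporting the RPC as timed out. The field layout in the regular expression is inferred only from the lines pasted above, not from the DAOS logging code, and the function names are hypothetical. The three sample lines are copied from the log excerpt.

```python
import re
from datetime import datetime

# Layout inferred from the pasted lines:
#   MM/DD-HH:MM:SS.cc DAOS[pid/tid] facility LEVEL file:line func(args) message
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{2})\s+"
    r"DAOS\[(?P<pid>\d+)/(?P<tid>\d+)\]\s+"
    r"(?P<facility>\S+)\s+(?P<level>[A-Z]+)\s+"
    r"(?P<src>\S+:\d+)\s+"
    r"(?P<func>\w+)\([^)]*\)\s+"
    r"(?P<msg>.*)$"
)

SAMPLE = [
    "09/28-17:33:42.29 DAOS[285589/285602] mgmt ERR src/mgmt/srv_pool.c:515 "
    "ds_mgmt_create_pool() creating pool on ranks cf7aa844 failed: "
    "rc DER_TIMEDOUT(-1011)",
    "09/28-17:33:43.80 DAOS[285589/285602] mgmt INFO src/mgmt/srv_drpc.c:468 "
    "ds_mgmt_drpc_pool_create() Received request to create pool",
    "09/28-17:34:43.80 DAOS[285589/285602] rpc ERR src/cart/crt_context.c:790 "
    "crt_context_timeout_check(0x7f61017447d0) [opc=0x1010007 "
    "rpcid=0x32444975000000ba rank:tag=1:0] ctx_id 0, (status: 0x38) "
    "timed out, tgt rank 1, tag 0",
]


def parse(line):
    """Split one log line into named fields, or return None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # The log carries no year, so strptime falls back to its default year.
    record["time"] = datetime.strptime(record["ts"], "%m/%d-%H:%M:%S.%f")
    return record


def timeout_errors(lines):
    """Return parsed ERR records that mention -1011 or 'timed out'."""
    records = (parse(line) for line in lines)
    return [
        r for r in records
        if r and r["level"] == "ERR"
        and ("-1011" in r["msg"] or "timed out" in r["msg"])
    ]


def seconds_between(earlier_line, later_line):
    """Elapsed seconds between two log lines of the same run."""
    return (parse(later_line)["time"] - parse(earlier_line)["time"]).total_seconds()


if __name__ == "__main__":
    for rec in timeout_errors(SAMPLE):
        print(rec["ts"], rec["facility"], rec["func"])
    print(seconds_between(SAMPLE[1], SAMPLE[2]), "seconds from request to timeout")
```

On the sample lines, the request is received at 17:33:43.80 and the RPC is reported as timed out at 17:34:43.80, so the measured gap is 60 seconds. Comparing that figure with the CRT_TIMEOUT value printed by dump_envariables() may help confirm which timeout is actually in effect.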