pool creation failed in recent master commits


Zhang, Jiafu
 

Hi Guys,

 

I have been unable to create a pool with recent master commits, going back as far as 6726e272e2a0e821c0676838c39a2b133a7e0612 (Sep 9). The error in the terminal is:

 

Pool-create command FAILED: pool create failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded

ERROR: dmg: pool create failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded.

 

After enabling debug logging, I didn’t see anything more useful, only the timeout errors below.

 

09/28-17:33:42.25  DAOS[285589/285602] swim ERR  src/cart/swim/swim.c:659 swim_progress() The progress callback was not called for too long: 11515 ms after expected.

09/28-17:33:42.25  DAOS[285589/285602] rdb  WARN src/rdb/rdb_raft.c:1980 rdb_timerd() 64616f73[0]: not scheduled for 12.683030 second

09/28-17:33:42.29  DAOS[285589/285602] mgmt ERR  src/mgmt/srv_pool.c:515 ds_mgmt_create_pool() creating pool on ranks cf7aa844 failed: rc DER_TIMEDOUT(-1011)

09/28-17:33:42.29  DAOS[285589/285602] mgmt ERR  src/mgmt/srv_drpc.c:496 ds_mgmt_drpc_pool_create() failed to create pool: DER_TIMEDOUT(-1011)

09/28-17:33:42.29  DAOS[285589/285603] daos INFO src/iosrv/drpc_progress.c:409 process_session_activity() Session 664 connection has been terminated

09/28-17:33:42.29  DAOS[285589/285603] daos INFO src/common/drpc.c:717 drpc_close() Closing dRPC socket fd=664

09/28-17:33:43.80  DAOS[285589/285602] daos INFO src/iosrv/drpc_progress.c:295 drpc_handler_ult() dRPC handler ULT for module=2 method=207

09/28-17:33:43.80  DAOS[285589/285602] mgmt INFO src/mgmt/srv_drpc.c:468 ds_mgmt_drpc_pool_create() Received request to create pool

09/28-17:34:43.80  DAOS[285589/285602] rpc  ERR  src/cart/crt_context.c:790 crt_context_timeout_check(0x7f61017447d0) [opc=0x1010007 rpcid=0x32444975000000ba rank:tag=1:0] ctx_id 0, (status: 0x38) timed out, tgt rank 1, tag 0

09/28-17:34:43.80  DAOS[285589/285602] rpc  ERR  src/cart/crt_context.c:748 crt_req_timeout_hdlr(0x7f61017447d0) [opc=0x1010007 rpcid=0x32444975000000ba rank:tag=1:0] aborting to group daos_server, rank 1, tgt_uri (null)

09/28-17:34:43.80  DAOS[285589/285602] hg   ERR  src/cart/crt_hg.c:1031 crt_hg_req_send_cb(0x7f61017447d0) [opc=0x1010007 rpcid=0x32444975000000ba rank:tag=1:0] RPC failed; rc: -1011

09/28-17:34:43.80  DAOS[285589/285602] corpc ERR  src/cart/crt_corpc.c:646 crt_corpc_reply_hdlr() RPC(opc: 0x1010007) error, rc: -1011.

 

Any ideas?

 

Thanks.
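
One thing the log above does show is the interval between the pool create request and the first RPC error. The Python sketch below is not part of the original post. It parses two entries copied from that log and prints the gap in seconds. The regular expression and the helper names (parse, seconds_until_first_error) are illustrative assumptions about the line layout, and the year 2020 is supplied by hand because the log timestamps carry none.

```python
import re
from datetime import datetime

# Two entries copied verbatim from the log in this message.
LOG = """\
09/28-17:33:43.80  DAOS[285589/285602] mgmt INFO src/mgmt/srv_drpc.c:468 ds_mgmt_drpc_pool_create() Received request to create pool
09/28-17:34:43.80  DAOS[285589/285602] rpc  ERR  src/cart/crt_context.c:790 crt_context_timeout_check(0x7f61017447d0) [opc=0x1010007 rpcid=0x32444975000000ba rank:tag=1:0] ctx_id 0, (status: 0x38) timed out, tgt rank 1, tag 0
"""

# Assumed layout: timestamp, process tag, facility, level, remainder.
LINE_RE = re.compile(
    r"^(\d\d/\d\d-\d\d:\d\d:\d\d\.\d+)\s+\S+\s+(\w+)\s+(\w+)\s+(.*)$"
)

ASSUMED_YEAR = "2020"  # assumption: the log lines themselves have no year


def parse(line):
    """Split one log line into (time, facility, level, rest), or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    stamp, facility, level, rest = m.groups()
    when = datetime.strptime(
        ASSUMED_YEAR + "/" + stamp, "%Y/%m/%d-%H:%M:%S.%f"
    )
    return when, facility, level, rest


def seconds_until_first_error(text, marker="Received request to create pool"):
    """Seconds from the first line containing `marker` to the next ERR line."""
    start = None
    for line in text.splitlines():
        entry = parse(line)
        if entry is None:
            continue
        when, _facility, level, rest = entry
        if start is None:
            if marker in rest:
                start = when
        elif level == "ERR":
            return (when - start).total_seconds()
    return None


elapsed = seconds_until_first_error(LOG)
print(elapsed)
```

For these two entries the gap is 60 seconds, which is consistent with an RPC timeout expiring rather than an immediate failure.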


Oganezov, Alexander A
 

Hi Jiafu,

 

What is the last commit you know of that works in your setup?

 

Thanks,

~~Alex.

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Zhang, Jiafu
Sent: Monday, September 28, 2020 3:05 AM
To: daos@daos.groups.io
Subject: [daos] pool creation failed in recent master commits

 



Zhang, Jiafu
 

I just recalled that I re-opened the ticket on Aug 10, so the issue has existed for a long time. Please see the ticket for details.

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Oganezov, Alexander A
Sent: Tuesday, September 29, 2020 5:33 AM
To: daos@daos.groups.io
Subject: Re: [daos] pool creation failed in recent master commits

 



Zhang, Jiafu
 

The most recent working commit I can track down is 681b827527a0587d8496d3adbbd77a175370766c (Feb 28).

 

From: Zhang, Jiafu
Sent: Tuesday, September 29, 2020 8:25 AM
To: daos@daos.groups.io
Subject: RE: [daos] pool creation failed in recent master commits

 



Cain, Kenneth C
 

Hello Jiafu,

 

Can you try setting the server RPC timeout with the crt_timeout setting in the daos_server.yml file, rather than with the CRT_TIMEOUT variable in the env_vars section? See daos/utils/config/daos_server.yml. Also, please look near the beginning of the daos_io_server log for the dump_envariables() output and check the CRT_TIMEOUT value printed there. I think the DAOS server was changed to configure RPC timeouts through this new crt_timeout interface. I suspect your configuration sets the CRT_TIMEOUT environment variable in the env_vars section of daos_server.yml, where it does not take effect, so pool create times out in all cases.

 

Master commit 68ddb557753cf4bbf657347d28baa7bed15d09ef (Aug 10) and later should also help with large pool creates if they do happen to fail due to timeouts.

 

Thanks,

 

Ken
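
The log check suggested above can be sketched in Python as below. This is a minimal sketch and not part of the original reply. It returns every line near the start of a log that mentions CRT_TIMEOUT. The exact format that dump_envariables() prints is not shown in this thread, so the sample lines are made-up placeholders rather than real daos_io_server output, and the helper name find_setting and the head limit are assumptions.

```python
from itertools import islice


def find_setting(lines, name="CRT_TIMEOUT", head=2000):
    """Return (line_number, text) pairs from the first `head` lines
    of `lines` that mention `name`."""
    hits = []
    for number, line in enumerate(islice(lines, head), start=1):
        if name in line:
            hits.append((number, line.rstrip("\n")))
    return hits


# Placeholder lines only: NOT real daos_io_server output.
sample = [
    "placeholder startup line\n",
    "placeholder env dump: CRT_TIMEOUT = <value printed by the server>\n",
    "placeholder later line\n",
]

hits = find_setting(sample)
for number, text in hits:
    print(number, text)

# With a real log, pass an open file object instead of a list, e.g.
#   with open(path_to_server_log) as f:   # path is site-specific
#       print(find_setting(f))
```

If the search returns nothing, or shows a value different from the one configured, that would support the suspicion that the env_vars setting is not taking effect.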

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Zhang, Jiafu
Sent: Monday, September 28, 2020 8:29 PM
To: daos@daos.groups.io
Subject: Re: [daos] pool creation failed in recent master commits

 



Zhang, Jiafu
 

The issue is gone after adopting Kenneth’s suggestion to set “crt_timeout: 1200” in the global section of daos_server.yml instead of under “servers/env_vars”.

 

@Cain, Kenneth C, thanks!
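
For anyone hitting the same problem, the change can be illustrated with the rough Python sketch below, which is not taken from the thread. It holds a "before" and an "after" configuration as strings and checks where the timeout is set. The YAML layout is an assumption built only from the key names mentioned here (crt_timeout, servers, env_vars), the KEY=VALUE list form for env_vars is assumed, and the "before" value is illustrative. Compare against daos/utils/config/daos_server.yml for the authoritative format.

```python
import re

# Assumed shape of the configuration that did not work:
# the timeout is passed as an environment variable per server.
BEFORE = """\
servers:
-
  env_vars:
  - CRT_TIMEOUT=1200
"""

# Assumed shape of the working configuration:
# the timeout is a top-level (global) key.
AFTER = """\
crt_timeout: 1200
servers:
-
  env_vars: []
"""


def global_crt_timeout(text):
    """Return the top-level crt_timeout value as an int, or None."""
    m = re.search(
        r"^crt_timeout:[ \t]*(\d+)[ \t]*(?:#.*)?$", text, flags=re.MULTILINE
    )
    return int(m.group(1)) if m else None


def env_var_crt_timeout_lines(text):
    """Return indented list items that set CRT_TIMEOUT as an env var."""
    return [
        line.strip()
        for line in text.splitlines()
        if re.match(r"^\s+-\s*['\"]?CRT_TIMEOUT=", line)
    ]


for label, text in (("before", BEFORE), ("after", AFTER)):
    print(label, global_crt_timeout(text), env_var_crt_timeout_lines(text))
```

The check is line-based on purpose so that it needs no YAML library. It only recognises a crt_timeout key that starts in column 0, which is how a global setting would appear under the assumed layout.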

 

 

 

From: Zhang, Jiafu
Sent: Tuesday, September 29, 2020 8:29 AM
To: 'daos@daos.groups.io' <daos@daos.groups.io>
Subject: RE: [daos] pool creation failed in recent master commits

 
