FW: [External] [gpfsug-discuss] IO500 SC20 Call for Submission
Lombardi, Johann
FYI. If you need any help, feel free to ping us 😊
Call for IO500 Submission

Deadline: 30 October 2020 AoE
Stabilization period: 1st October -- 9th October 2020 AoE

The IO500 is now accepting and encouraging submissions for the upcoming 7th IO500 list, to be revealed at the IO500 Virtual BoF during SC20. Once again, we are also accepting submissions to the 10 Node I/O Challenge to encourage small-scale results. The new ranked lists will be announced at our Virtual SC20 BoF. We hope to see you, and your results, there.

New for the upcoming submission procedure is the introduction of a stabilization period that aims to harden the benchmark. The final benchmark is released at the end of this period. During stabilization we encourage the community to test the proper execution of the benchmark and provide us with feedback. We will apply bug fixes to the code base and expect that results obtained during this period will be valid as full submissions. We also continue with a separate list for the Student Cluster Competition, since the IO500 is used during that competition.

Also new this year, we have partnered with Anthony Kougkas’ team at Illinois Institute of Technology to evaluate the submission metadata describing the storage system on which the test was run, to improve the quality and usefulness of the data the IO500 collects. You may be contacted by one of his students to clarify one or more of the metadata items from your submission(s). We would appreciate, but do not require, your cooperation in helping improve submission metadata quality. Results from their work will be fed back to improve our submission process for future lists.

The IO500 benchmark suite is designed to be easy to run, and the community has multiple active support channels to help with any questions. Please submit results from your system, and we look forward to seeing many of you at SC20! Please note that submissions of all sizes are welcome, including multiple submissions from different storage systems/tiers at a single site. The website has customizable sorting, so it is possible, for example, to submit on a small system and still get a very good per-client score. Additionally, the list is about much more than just the raw rank; all submissions help the community by collecting and publishing a wider corpus of data. More details below.

Following the success of the Top500 in collecting and analyzing historical trends in supercomputer technology and evolution, the IO500 was created in 2017, published its first list at SC17, and has grown continuously since then. The need for such an initiative had long been known within High-Performance Computing; however, defining appropriate benchmarks had long been challenging. Despite this challenge, the community, after long and spirited discussion, finally reached consensus on a suite of benchmarks and a metric for resolving the scores into a single ranking.

The multi-fold goals of the benchmark suite are as follows:
1. Maximizing simplicity in running the benchmark suite
2. Encouraging complexity in tuning for performance
3. Allowing submitters to highlight their “hero run” performance numbers
4. Forcing submitters to simultaneously report performance for challenging I/O patterns

Specifically, the benchmark suite includes a hero run of both IOR and mdtest, configured however possible to maximize performance and establish an upper bound. It also includes an IOR and mdtest run with highly prescribed parameters in an attempt to determine a lower bound on performance. Finally, it includes a namespace search, as this has been determined to be a highly sought-after feature in HPC storage systems that has historically not been well measured. Submitters are encouraged to share their tuning insights for publication.

The goals of the community are also multi-fold:
1. Gather historical data for the sake of analysis and to aid predictions of storage futures
2. Collect tuning information to share valuable performance optimizations across the community
3. Encourage vendors and designers to optimize for workloads beyond “hero runs”
4. Establish bounded expectations for users, procurers, and administrators

10 Node I/O Challenge

The 10 Node Challenge is conducted using the regular IO500 benchmark, however, with the rule that exactly 10 client nodes must be used to run the benchmark. You may use any shared storage with, e.g., any number of servers. When submitting to the IO500 list, you can opt in to “Participate in the 10 compute node challenge only”, in which case we will not include the results in the ranked list. Other 10-node submissions will be included in both the full list and the ranked list. We will announce the result in a separate derived list and in the full list, but not on the ranked IO500 list at https://io500.org/

Birds-of-a-Feather

Once again, we encourage you to submit [1], to join our community, and to attend our virtual BoF “The IO500 and the Virtual Institute of I/O” at SC20, where we will announce the new IO500 list, the 10 Node Challenge list, and the Student Cluster Competition list. We look forward to answering any questions or concerns you might have.

[1] http://www.vi4io.org/io500/submission
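As a side note on the single ranking metric mentioned above: to my understanding, the overall IO500 score is the geometric mean of a bandwidth score (itself the geometric mean of the IOR phase results, in GiB/s) and a metadata score (the geometric mean of the mdtest/find phase results, in kIOPS). A minimal sketch with made-up numbers (none of these values come from a real submission):

```python
import math

def geometric_mean(values):
    """Geometric mean of a list of positive numbers."""
    product = math.prod(values)
    return product ** (1.0 / len(values))

# Hypothetical per-phase results, purely for illustration:
bandwidth_gibs = [10.0, 2.0, 8.0, 1.6]  # IOR easy/hard write+read, GiB/s
metadata_kiops = [50.0, 20.0, 80.0, 10.0, 40.0, 25.0, 30.0, 15.0]  # mdtest/find phases, kIOPS

bw_score = geometric_mean(bandwidth_gibs)
md_score = geometric_mean(metadata_kiops)
total = math.sqrt(bw_score * md_score)  # overall score: geomean of the two sub-scores
```

This geometric-mean construction is what forces submitters to do well on both the "hero" and the prescribed hard patterns: a near-zero result in any single phase drags the whole score down.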
Thanks,
The IO500 Committee <committee@...>
--------------------------------------------------------------------- This e-mail and any attachments may contain confidential material for
Re: Webex meeting changed: DAOS User Group 2020 (DUG'20)
Lombardi, Johann
Really sorry for the spam, but I used the wrong timezone when creating the invite ☹ The last one that was sent to the mailing list (attached for reference) is the correct one.
Cheers, Johann
From:
<daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
Hi there,
Please find below the Webex meeting invite for the DUG’20. The agenda will be posted there: https://wiki.hpdd.intel.com/display/DC/DUG20
Cheers, Johann
Webex meeting changed: DAOS User Group 2020 (DUG'20)
Johann Lombardi <messenger@...>
Re: Webex meeting changed: DAOS User Group 2020 (DUG'20)
Lombardi, Johann
Hi there,
Please find below the Webex meeting invite for the DUG’20. The agenda will be posted there: https://wiki.hpdd.intel.com/display/DC/DUG20
Cheers, Johann
From:
<daos@daos.groups.io> on behalf of "Johann Lombardi via groups.io" <messenger@...>
Webex meeting changed: DAOS User Group 2020 (DUG'20)
Johann Lombardi <messenger@...>
Webex meeting invitation: DAOS User Group 2020 (DUG'20)
Johann Lombardi <messenger@...>
Re: DAOS with NVMe-over-Fabrics
anton.brekhov@...
On Thu, Sep 17, 2020 at 12:56 AM, Lombardi, Johann wrote:
adrfam:IPv4 traddr:10.9.1.118 trsvcid:4420 subnqn:test

I've tried to change daos_nvme.conf both at runtime of the daos server and before starting it, in order to connect the disk through RDMA. Either way, I cannot see it in the DAOS system, although `nvme discover` does see the exported disk. When does SPDK take this disk into the system? Or should I write this in other files?

My daos_nvme.conf:

TransportID "trtype:PCIe traddr:0000:b1:00.0" Nvme_apache512_0
TransportID "trtype:PCIe traddr:0000:b2:00.0" Nvme_apache512_1
TransportID "trtype:PCIe traddr:0000:b3:00.0" Nvme_apache512_2
TransportID "trtype:PCIe traddr:0000:b4:00.0" Nvme_apache512_3
TransportID "trtype:rdma adrfam:IPv4 traddr:10.0.1.2 trsvcid:4420 subnqn:nvme-subsystem-name" Nvme_apache512_4
RetryCount 4
TimeoutUsec 0
ActionOnTimeout None
AdminPollRate 100000
HotplugEnable No
HotplugPollRate 0

[root@apache512 ~]# nvme discover -t rdma -a 10.0.1.2 -s 4420

Discovery Log Number of Records 1, Generation counter 2
=====Discovery Log Entry 0======
trtype:  rdma
adrfam:  ipv4
subtype: nvme subsystem
treq:    not specified, sq flow control disable supported
portid:  1
trsvcid: 4420
subnqn:  nvme-subsystem-name
traddr:  10.0.1.2
rdma_prtype: not specified
rdma_qptype: connected
rdma_cms:    rdma-cm
rdma_pkey: 0x0000
Re: Error attempting to mount via DFUSE
Pittman, Ashley M
This specific issue is different and appears to be at the dfuse/fuse kernel module level. I note you’re using a newer fuse driver than I am (7.31 vs 7.23) and also a newer libfuse3, as evidenced by the “Unknown flags 0x3000000” error. Purely from the output you’ve included, we are not explicitly disabling SPLICE_READ; although we should not be using it, the error from fuse seems to indicate that its use is attempted.
We do have Ubuntu 20.04 in our CI so I know we test on this, I’ll see if I can find any CI results or if not try Ubuntu 20.04 on a test machine here.
Ashley.
From:
<daos@daos.groups.io> on behalf of Gert Pauwels <gert.pauwels@...>
I'm experiencing (most likely) the same issue since a week or more on the master branch.
Re: Error attempting to mount via DFUSE
Pittman, Ashley M
This case is different: the mount point here has not been used to create a container, so dfuse is attempting to use the pool and container from the command line. The error reported is a DAOS error rather than a dfuse one, but I suspect this is a server error, either because the wrong svc value was specified or because one of the servers isn’t running or contactable.
Ashley,
From:
<daos@daos.groups.io> on behalf of Peter <magpiesaresoawesome@...>
Thanks for the response, I tried again with:
Re: Error attempting to mount via DFUSE
Pittman, Ashley M
Hi,
This is the same issue as Gert hit last week: specifically, you’re providing the pool/container UUIDs twice to dfuse, once via the path and once on the command line. The fix in this case would be to not provide the --pool or --container options on the command line.
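For illustration, the two non-redundant invocations would look roughly like the following (placeholder UUIDs and paths, not taken from the original report):

```
# Mountpoint is the path used at container creation: omit --pool/--cont
dfuse --mountpoint=/path/used/at/cont-create --svc=0

# Fresh mountpoint: pass the pool and container explicitly
dfuse --pool=<pool-uuid> --cont=<cont-uuid> --mountpoint=/mnt/dfuse --svc=0
```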
It would however make sense for dfuse to support this usage where the UUIDs match, so I’ve filed DAOS-5778 to allow this.
Ashley.
From:
<daos@daos.groups.io> on behalf of Peter <magpiesaresoawesome@...>
Hello!
Peter
Re: Error attempting to mount via DFUSE
I'm experiencing (most likely) the same issue since a week or more on the master branch. When --pool and --cont are not specified and --mountpoint points to the path specified when creating the container, you get the same output as above. Calling

dfuse --mountpoint=/mnt/mycontainer --svc=0 --foreground

gives the same output as

dfuse --pool=fe6475b3-74dc-464f-b8b0-cac50778a9f9 --cont=0912dece-00e1-4e8f-8d7f-ac63603f52ac --mountpoint=/mnt/1 --svc=0 --foreground

Gert,
Re: Error attempting to mount via DFUSE
Peter
Thanks for the response, I tried again with:
Re: Error attempting to mount via DFUSE
Hello!
Could you try using a mountpoint different from the container path? AFAIK, the path given to `daos cont create` is used as an alias for the container.

Yunjae
Error attempting to mount via DFUSE
Peter
Hello!
Peter
Re: pool creation failed in recent master commits
Zhang, Jiafu
The issue is gone after adopting Kenneth’s suggestion to set “crt_timeout: 1200” in the global area of daos_server.yml instead of under “servers/env_vars”.
@Cain, Kenneth C, thanks!
Re: pool creation failed in recent master commits
Cain, Kenneth C
Hello Jaifu,
Can you try setting the server RPC timeout via the daos_server.yml crt_timeout setting (and not via the env_vars section with the CRT_TIMEOUT variable)? See daos/utils/config/daos_server.yml. Also take a look near the beginning of the daos_io_server log at the dump_envariables() output (looking for the CRT_TIMEOUT value printed). I think a change has been made on the DAOS server to configure RPC timeouts using this new crt_timeout interface. I suspect your configuration tries to set the CRT_TIMEOUT environment variable via the env_vars section of daos_server.yml and it is not taking effect, resulting in pool create timeouts in all cases.
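A sketch of the relevant placement in daos_server.yml (other settings elided; the values shown are examples, not recommendations):

```yaml
# crt_timeout belongs at the global (top) level of daos_server.yml:
crt_timeout: 1200
servers:
  - targets: 8                  # example value
    env_vars:
      # - CRT_TIMEOUT=1200     # setting it here no longer takes effect
```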
The master commit 68ddb557753cf4bbf657347d28baa7bed15d09ef (Aug 10) and later should be useful for large pool creates if they do happen to fail due to timeouts.
Thanks,
Ken
Re: pool creation failed in recent master commits
Zhang, Jiafu
The most recent working commit I can track is 681b827527a0587d8496d3adbbd77a175370766c (Feb 28).
Re: pool creation failed in recent master commits
Zhang, Jiafu
I just recalled that I re-opened the ticket on Aug 10. The issue has existed for a long time. Please see detailed info in the ticket.
Re: pool creation failed in recent master commits
Oganezov, Alexander A
Hi Jiafu,
What was the previous commit that you know of that works in your setup?
Thanks, ~~Alex.
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of
Zhang, Jiafu
Sent: Monday, September 28, 2020 3:05 AM To: daos@daos.groups.io Subject: [daos] pool creation failed in recent master commits
Hi Guys,
I have failed to create a pool with recent master commits, going back to 6726e272e2a0e821c0676838c39a2b133a7e0612 (Sep 9). The error in the terminal is:

Pool-create command FAILED: pool create failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded
ERROR: dmg: pool create failed: rpc error: code = DeadlineExceeded desc = context deadline exceeded

After enabling debug, I didn’t see much more valuable info beyond the errors below about timeouts.

09/28-17:33:42.25 DAOS[285589/285602] swim ERR src/cart/swim/swim.c:659 swim_progress() The progress callback was not called for too long: 11515 ms after expected.
09/28-17:33:42.25 DAOS[285589/285602] rdb WARN src/rdb/rdb_raft.c:1980 rdb_timerd() 64616f73[0]: not scheduled for 12.683030 second
09/28-17:33:42.29 DAOS[285589/285602] mgmt ERR src/mgmt/srv_pool.c:515 ds_mgmt_create_pool() creating pool on ranks cf7aa844 failed: rc DER_TIMEDOUT(-1011)
09/28-17:33:42.29 DAOS[285589/285602] mgmt ERR src/mgmt/srv_drpc.c:496 ds_mgmt_drpc_pool_create() failed to create pool: DER_TIMEDOUT(-1011)
09/28-17:33:42.29 DAOS[285589/285603] daos INFO src/iosrv/drpc_progress.c:409 process_session_activity() Session 664 connection has been terminated
09/28-17:33:42.29 DAOS[285589/285603] daos INFO src/common/drpc.c:717 drpc_close() Closing dRPC socket fd=664
09/28-17:33:43.80 DAOS[285589/285602] daos INFO src/iosrv/drpc_progress.c:295 drpc_handler_ult() dRPC handler ULT for module=2 method=207
09/28-17:33:43.80 DAOS[285589/285602] mgmt INFO src/mgmt/srv_drpc.c:468 ds_mgmt_drpc_pool_create() Received request to create pool
09/28-17:34:43.80 DAOS[285589/285602] rpc ERR src/cart/crt_context.c:790 crt_context_timeout_check(0x7f61017447d0) [opc=0x1010007 rpcid=0x32444975000000ba rank:tag=1:0] ctx_id 0, (status: 0x38) timed out, tgt rank 1, tag 0
09/28-17:34:43.80 DAOS[285589/285602] rpc ERR src/cart/crt_context.c:748 crt_req_timeout_hdlr(0x7f61017447d0) [opc=0x1010007 rpcid=0x32444975000000ba rank:tag=1:0] aborting to group daos_server, rank 1, tgt_uri (null)
09/28-17:34:43.80 DAOS[285589/285602] hg ERR src/cart/crt_hg.c:1031 crt_hg_req_send_cb(0x7f61017447d0) [opc=0x1010007 rpcid=0x32444975000000ba rank:tag=1:0] RPC failed; rc: -1011
09/28-17:34:43.80 DAOS[285589/285602] corpc ERR src/cart/crt_corpc.c:646 crt_corpc_reply_hdlr() RPC(opc: 0x1010007) error, rc: -1011.
Any idea?
Thanks.
Re: Any method to check object location: SCM or NVMe?
Thanks for the quick reply, Patrick.
I was also wondering how small the I/O size should be for it to go to SCM rather than NVMe. I'll test performance following your advice. It helped me a lot.