Re: Qustion about Questions about data placement
Lombardi, Johann
You can monitor the output of pool query, which reports the space usage on PMEM and SSD separately. That being said, we don’t have a metric reporting the total amount of data migrated by aggregation for each pool. We should add one, since it would help differentiate the bandwidth used by regular I/O from the bandwidth used by aggregation when analyzing performance issues.
Cheers, Johann
From: <daos@daos.groups.io> on behalf of
段世博 <duanshibo.d@...>
Is there a way to know how much data has been migrated from PMEM to SSD
|
Re: Qustion about Questions about data placement
段世博
Is there a way to know how much data has been migrated from PMEM to SSD?
|
|
Re: Qustion about Questions about data placement
Lombardi, Johann
Extents smaller than 4KiB that cannot be aggregated with other contiguous extents remain in PMEM and are not migrated to SSDs. As for overwrites, extents that are eventually not readable any longer (i.e., completely overwritten or truncated and no snapshots) are deleted in the background. This is true for both extents on SSDs and PMEM.
Cheers, Johann
From: <daos@daos.groups.io> on behalf of
段世博 <duanshibo.d@...>
Thank you very much for your answer!
|
Re: Qustion about Questions about data placement
段世博
Thank you very much for your answer!
I have another question: is all the data written to PMEM eventually written to SSD? For example, under a Zipfian workload, if data still in PMEM is overwritten by new writes, will the old data still be written to SSD?
|
Re: Qustion about Questions about data placement
Lombardi, Johann
By default, any contiguous extents strictly smaller than 4KiB are written to SCM, and the ones bigger than or equal to 4KiB are written to SSDs. The 4KiB threshold is configurable at the pool level via the “policy” property, starting with DAOS v2.0.
$ dmg pool get-prop test | grep placement
Tier placement policy (policy) type=io_size
$ dmg pool set-prop test policy:type=io_size/th1=16384
pool set-prop succeeded
$ dmg pool get-prop test | grep placement
Tier placement policy (policy) type=io_size/th1= 16384
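As a rough illustration of the rule above (a simplified model, not DAOS source code), the sketch below treats the io_size policy as a single size comparison at write time. The function name place_extent and the tier labels are made up for this example; only the 4 KiB default and the 16384 threshold come from the message. It does not model later aggregation, which can merge small extents and move them to SSD.

```python
# Simplified model of the io_size tier placement policy (assumption: a single
# threshold "th1" decides between SCM and NVMe SSD at write time).
SCM = "SCM"
NVME = "NVMe SSD"


def place_extent(size_bytes, threshold=4096):
    """Return the tier a contiguous extent of size_bytes is written to."""
    if size_bytes < 0:
        raise ValueError("extent size must be non-negative")
    # Strictly smaller than the threshold goes to SCM, everything else to SSD.
    return SCM if size_bytes < threshold else NVME


if __name__ == "__main__":
    for size in (512, 4095, 4096, 16384):
        print(size, place_extent(size), place_extent(size, threshold=16384))
```

With the default threshold, a 4096-byte extent goes to SSD; after raising th1 to 16384, the same extent would stay on SCM under this model.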
HTH
Cheers, Johann
|
Re: DAOS 2.3.103
#rocky
#ubuntu
#chat
#docker
#installation
Macdonald, Mjmac
In this case, the core problem is that there is an empty storage tier in the configuration, and the config parser doesn’t handle this correctly. Created https://daosio.atlassian.net/browse/DAOS-12826 to address the defect. Once the empty storage tier is removed, the command should fail with a more sensible error when the system does not have any PMem modules installed.
mjmac
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of
Lombardi, Johann
Sent: Monday, 6 March, 2023 05:26
To: daos@daos.groups.io; Nabarro, Tom <tom.nabarro@...>
Subject: Re: [daos] DAOS 2.3.103 #chat #docker #installation #2.3.103 #rocky #ubuntu
Hi there,
scm prepare is only required when using Optane PMEM. Since you use DRAM, you don’t need to run scm prepare. That being said, it would be great for scm prepare to fail nicely in this case, @Nabarro, Tom?
Cheers, Johann
From: <daos@daos.groups.io> on behalf of "salma.salem@..."
<salma.salem@...>
I'm trying to set up daos 2.3.103 using the EL8 Dockerfile but I ended up with this error when trying to run scm prepare
|
Re: DAOS 2.3.103
#rocky
#ubuntu
#chat
#docker
#installation
Lombardi, Johann
Hi there,
scm prepare is only required when using Optane PMEM. Since you use DRAM, you don’t need to run scm prepare. That being said, it would be great for scm prepare to fail nicely in this case, @Nabarro, Tom?
Cheers, Johann
From: <daos@daos.groups.io> on behalf of "salma.salem@..." <salma.salem@...>
I'm trying to set up daos 2.3.103 using the EL8 Dockerfile but I ended up with this error when trying to run scm prepare
|
DAOS 2.3.103
#rocky
#ubuntu
#chat
#docker
#installation
salma.salem@...
I'm trying to set up DAOS 2.3.103 using the EL8 Dockerfile, but I ended up with this error when trying to run scm prepare:
This is what my configuration file looks like:
I also tried this with the Ubuntu file and got stuck at the same point, but my focus now is on getting the Rocky version working. Has anyone encountered this error before?
|
Qustion about Questions about data placement
段世博
How does DAOS decide whether to write data to SSD or PMEM?
|
|
Re: Fail to create pool
Oganezov, Alexander A
Tianzy,
Good to hear it is working now. In general, however, a reboot should not be needed after applying those sysctl settings, and the dual_iface_server test also worked before the reboot. I wonder if something stale was still running, such as an old daos agent on the client nodes, or if there was a stale cache somewhere.
~~Alex.
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of
landen.tian@...
Sent: Saturday, March 4, 2023 8:13 AM
To: daos@daos.groups.io
Subject: Re: [daos] Fail to create pool
Alex, |
|
Re: Fail to create pool
landen.tian@...
Alex,
For some reason, my cluster rebooted. After that, I tried what you suggested, and my issue is fixed! I guess the system needs a reboot after applying https://docs.daos.io/v2.0/admin/predeployment_check/#setup-for-multiple-network-links.
Thanks, Tianzy
|
Re: Fail to create pool
Oganezov, Alexander A
Can you set log_mask: debug in the DAOS server YAML file and provide logs from all the engines involved? On the client side, please provide OFI logs by running export FI_LOG_LEVEL=warn before running self_test -u.
It would be best if you file an issue on https://daosio.atlassian.net/ and attach all logs there.
Thanks, ~~Alex.
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of
landen.tian@...
Sent: Friday, March 3, 2023 8:44 PM
To: daos@daos.groups.io
Subject: Re: [daos] Fail to create pool
1. https://docs.daos.io/v2.0/admin/predeployment_check/#setup-for-multiple-network-links
|
Re: Fail to create pool
landen.tian@...
1. https://docs.daos.io/v2.0/admin/predeployment_check/#setup-for-multiple-network-links
Lombardi had already pointed this out, and I had done it.
2. The test:
Any further suggestions?
|
Re: Fail to create pool
Oganezov, Alexander A
Hi Tian,
I noticed that you are running 2 engines per node. Did you follow this guide in order to set sysctl settings properly? https://docs.daos.io/v2.0/admin/predeployment_check/#setup-for-multiple-network-links
Also, can you run the following test on your server nodes?
./install/lib/daos/TESTING/tests/dual_iface_server -p 'ofi+verbs;ofi_rxm' -i 'ib0,ib1' -d 'mlx5_0,mlx5_1'
This is a basic sanity check to ensure that you can run a dual-interface setup for servers. If successful, you should see something like this in the output:
SRV [rank=1 pid=1678185] Starting server rank=1
SRV [rank=0 pid=1678189] Starting server rank=0
SRV [rank=1 pid=1678185] my_rank=1 uri=ofi+verbs;ofi_rxm://192.168.101.21:32337
SRV [rank=0 pid=1678189] my_rank=0 uri=ofi+verbs;ofi_rxm://192.168.100.21:31337
SRV [rank=1 pid=1678185] Other servers uri is 'ofi+verbs;ofi_rxm://192.168.100.21:31337'
SRV [rank=1 pid=1678185] Ping successful to rank=0 tag=0
SRV [rank=0 pid=1678189] Other servers uri is 'ofi+verbs;ofi_rxm://192.168.101.21:32337'
SRV [rank=0 pid=1678189] Ping successful to rank=1 tag=0
If this test does not work, then it's likely that something is still missing in the sysctl network settings, preventing libfabric from communicating between servers.
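If you want to check the result automatically, something along these lines could work. This is only a sketch: it assumes the test prints lines in exactly the format of the sample output in this message (the format may differ between versions), and the helper names successful_pings and dual_iface_ok are made up for illustration.

```python
import re

# Sample output copied from this message.
SAMPLE_OUTPUT = """\
SRV [rank=1 pid=1678185] Starting server rank=1
SRV [rank=0 pid=1678189] Starting server rank=0
SRV [rank=1 pid=1678185] my_rank=1 uri=ofi+verbs;ofi_rxm://192.168.101.21:32337
SRV [rank=0 pid=1678189] my_rank=0 uri=ofi+verbs;ofi_rxm://192.168.100.21:31337
SRV [rank=1 pid=1678185] Other servers uri is 'ofi+verbs;ofi_rxm://192.168.100.21:31337'
SRV [rank=1 pid=1678185] Ping successful to rank=0 tag=0
SRV [rank=0 pid=1678189] Other servers uri is 'ofi+verbs;ofi_rxm://192.168.101.21:32337'
SRV [rank=0 pid=1678189] Ping successful to rank=1 tag=0
"""

# Assumed line format, based only on the sample above.
PING_RE = re.compile(
    r"^SRV \[rank=(\d+) pid=\d+\] Ping successful to rank=(\d+) tag=\d+$"
)


def successful_pings(output):
    """Return the set of (source_rank, destination_rank) pairs that pinged."""
    pairs = set()
    for line in output.splitlines():
        match = PING_RE.match(line.strip())
        if match:
            pairs.add((int(match.group(1)), int(match.group(2))))
    return pairs


def dual_iface_ok(output, ranks=(0, 1)):
    """True if every rank reported a successful ping to every other rank."""
    pings = successful_pings(output)
    return all((a, b) in pings for a in ranks for b in ranks if a != b)


if __name__ == "__main__":
    print(sorted(successful_pings(SAMPLE_OUTPUT)))
    print(dual_iface_ok(SAMPLE_OUTPUT))
```

A missing pair in the result would point at the direction in which the two interfaces cannot reach each other.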
Thanks, ~~Alex.
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of
landen.tian@...
Sent: Thursday, March 2, 2023 5:21 PM
To: daos@daos.groups.io
Subject: Re: [daos] Fail to create pool
[Edited Message Follows] It makes progress either, but it still failed.
|
|
Re: Fail to create pool
landen.tian@...
CART didn't work |
|
Re: Fail to create pool
It made some progress, but it still failed.
client01:
On storage01:
03/03-08:17:03.29 storage01 DAOS[13113/0/614] daos INFO src/engine/drpc_progress.c:278 drpc_handler_ult() dRPC handler ULT for module=2 method=207
03/03-08:17:03.29 storage01 DAOS[13113/0/614] mgmt INFO src/mgmt/srv_drpc.c:434 ds_mgmt_drpc_pool_create() Received request to create pool on 6 ranks.
03/03-08:17:04.25 storage01 DAOS[13113/0/615] telem INFO src/gurt/telemetry.c:211 new_shmem() creating new shared memory segment, key=0x4f11c3a9, size=172592
03/03-08:17:04.25 storage01 DAOS[13113/0/615] pool INFO src/pool/srv_metrics.c:152 ds_pool_metrics_start() f59eddb9: created metrics for pool
03/03-08:17:04.32 storage01 DAOS[13113/0/7] server ERR src/engine/sched.c:1964 sched_watchdog_post() WATCHDOG: Thread 0x43f230 took 59 ms. symbol:/mnt/nfs/daos/install/bin/daos_engine() [0x43f230]
03/03-08:17:05.97 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c4576a00) [opc=0x102000b (DAOS) rpcid=0x95f87550002353e rank:tag=2:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (2:0)
03/03-08:17:05.97 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c4576a00) [opc=0x102000b (DAOS) rpcid=0x95f87550002353e rank:tag=2:0] aborting to group daos_server, rank 2, tgt_uri (null)
03/03-08:17:05.97 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c45dc900) [opc=0x102000b (DAOS) rpcid=0x95f875500023541 rank:tag=5:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (5:0)
03/03-08:17:05.97 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c45dc900) [opc=0x102000b (DAOS) rpcid=0x95f875500023541 rank:tag=5:0] aborting to group daos_server, rank 5, tgt_uri (null)
03/03-08:17:05.97 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c4576a00) [opc=0x102000b (DAOS) rpcid=0x95f87550002353e rank:tag=2:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/03-08:17:05.97 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c4576a00) [opc=0x102000b (DAOS) rpcid=0x95f87550002353e rank:tag=2:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/03-08:17:05.97 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c45dc900) [opc=0x102000b (DAOS) rpcid=0x95f875500023541 rank:tag=5:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/03-08:17:05.97 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c45dc900) [opc=0x102000b (DAOS) rpcid=0x95f875500023541 rank:tag=5:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/03-08:17:05.97 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:381 crt_rpc_complete(0x7f57c459c040) [opc=0x102000b (DAOS) rpcid=0x95f87550002353c rank:tag=0:0] failed, DER_TIMEDOUT(-1011): 'Time out'
03/03-08:18:03.29 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c45dc6f0) [opc=0x1020007 (DAOS) rpcid=0x95f87550002357d rank:tag=2:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (2:0)
03/03-08:18:03.29 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c45dc6f0) [opc=0x1020007 (DAOS) rpcid=0x95f87550002357d rank:tag=2:0] aborting to group daos_server, rank 2, tgt_uri (null)
03/03-08:18:03.29 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c45dc6f0) [opc=0x1020007 (DAOS) rpcid=0x95f87550002357d rank:tag=2:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/03-08:18:03.29 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c45dc6f0) [opc=0x1020007 (DAOS) rpcid=0x95f87550002357d rank:tag=2:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/03-08:18:03.29 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:381 crt_rpc_complete(0x7f57c45b2460) [opc=0x1020007 (DAOS) rpcid=0x95f87550002357b rank:tag=0:0] failed, DER_TIMEDOUT(-1011): 'Time out'
03/03-08:18:03.29 storage01 DAOS[13113/0/614] mgmt ERR src/mgmt/srv_pool.c:97 ds_mgmt_tgt_pool_create_ranks() f59eddb9: dss_rpc_send MGMT_TGT_CREATE: rc=DER_TIMEDOUT(-1011): 'Time out'
03/03-08:18:03.29 storage01 DAOS[13113/0/616] server INFO src/engine/server_iv.c:876 ds_iv_ns_stop() f59eddb9 ns stopped
03/03-08:18:03.29 storage01 DAOS[13113/0/617] container INFO src/container/srv_target.c:2408 ds_cont_tgt_ec_eph_query_ult() f59eddb9 stop tgt ec aggregation
03/03-08:18:03.29 storage01 DAOS[13113/0/616] pool INFO src/pool/srv_target.c:650 ds_pool_tgt_ec_eph_query_abort() f59eddb9: EC query ULT stopped
03/03-08:18:03.30 storage01 DAOS[13113/0/616] pool INFO src/pool/srv_target.c:668 pool_fetch_hdls_ult_abort() f59eddb9: fetch hdls ULT aborted
03/03-08:18:03.30 storage01 DAOS[13113/0/616] rebuild INFO src/rebuild/srv.c:1605 ds_rebuild_abort() f59eddb9 rebuild aborted
03/03-08:18:03.30 storage01 DAOS[13113/0/616] server INFO src/object/srv_obj_migrate.c:2747 ds_migrate_stop() f59eddb9 migrate stopped
03/03-08:18:03.41 storage01 DAOS[13113/0/616] telem INFO src/gurt/telemetry.c:235 destroy_shmem() Destroying shared memory segment (shmid=98344)
03/03-08:18:03.41 storage01 DAOS[13113/0/616] telem INFO src/gurt/telemetry.c:2457 d_tm_del_ephemeral_dir() Removed ephemeral directory [pool/f59eddb9-d930-448c-bc98-c6316c0455ca]
03/03-08:18:03.41 storage01 DAOS[13113/0/616] pool INFO src/pool/srv_metrics.c:176 ds_pool_metrics_stop() f59eddb9: destroyed ds_pool metrics
03/03-08:18:03.41 storage01 DAOS[13113/0/616] pool INFO src/pool/srv_target.c:808 ds_pool_stop() f59eddb9: pool service is aborted
03/03-08:18:08.97 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f875500023587 rank:tag=2:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (2:0)
03/03-08:18:08.97 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f875500023587 rank:tag=2:0] aborting to group daos_server, rank 2, tgt_uri (null)
03/03-08:18:08.97 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c46113d0) [opc=0x102000b (DAOS) rpcid=0x95f87550002358a rank:tag=5:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (5:0)
03/03-08:18:08.97 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c46113d0) [opc=0x102000b (DAOS) rpcid=0x95f87550002358a rank:tag=5:0] aborting to group daos_server, rank 5, tgt_uri (null)
03/03-08:18:08.97 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f875500023587 rank:tag=2:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/03-08:18:08.97 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f875500023587 rank:tag=2:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/03-08:18:08.97 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c46113d0) [opc=0x102000b (DAOS) rpcid=0x95f87550002358a rank:tag=5:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/03-08:18:08.98 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c46113d0) [opc=0x102000b (DAOS) rpcid=0x95f87550002358a rank:tag=5:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/03-08:18:08.98 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:381 crt_rpc_complete(0x7f57c459c040) [opc=0x102000b (DAOS) rpcid=0x95f875500023585 rank:tag=0:0] failed, DER_TIMEDOUT(-1011): 'Time out'
03/03-08:19:03.29 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c4576a00) [opc=0x1020008 (DAOS) rpcid=0x95f8755000235c4 rank:tag=2:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (2:0)
03/03-08:19:03.29 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c4576a00) [opc=0x1020008 (DAOS) rpcid=0x95f8755000235c4 rank:tag=2:0] aborting to group daos_server, rank 2, tgt_uri (null)
03/03-08:19:03.29 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c4576a00) [opc=0x1020008 (DAOS) rpcid=0x95f8755000235c4 rank:tag=2:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/03-08:19:03.29 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c4576a00) [opc=0x1020008 (DAOS) rpcid=0x95f8755000235c4 rank:tag=2:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/03-08:19:03.29 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:381 crt_rpc_complete(0x7f57c45b2460) [opc=0x1020008 (DAOS) rpcid=0x95f8755000235c2 rank:tag=0:0] failed, DER_TIMEDOUT(-1011): 'Time out'
03/03-08:19:03.30 storage01 DAOS[13113/0/614] mgmt ERR src/mgmt/srv_pool.c:122 ds_mgmt_tgt_pool_create_ranks() f59eddb9: failed to clean up failed pool: DER_TIMEDOUT(-1011): 'Time out'
03/03-08:19:03.30 storage01 DAOS[13113/0/614] mgmt ERR src/mgmt/srv_pool.c:193 ds_mgmt_create_pool() creating pool f59eddb9 on ranks failed: rc DER_TIMEDOUT(-1011): 'Time out'
03/03-08:19:03.30 storage01 DAOS[13113/0/614] mgmt ERR src/mgmt/srv_drpc.c:485 ds_mgmt_drpc_pool_create() failed to create pool: DER_TIMEDOUT(-1011): 'Time out'
It seems there are some network errors. If I create a pool using ranks on the same server, it works:
[root@client01 ~]# dmg pool create -r 0,3 --scm-size=1T --nvme-size=15T --nsvc=1 Pool1
Creating DAOS pool with manual per-engine storage allocation: 1.0 TB SCM, 15 TB NVMe (6.67% ratio)
Pool created with 6.25%,93.75% storage tier ratio
-------------------------------------------------
UUID : 31cf7760-fc1a-45b8-ab9b-0b539726bfe3
Service Ranks : 0
Storage Ranks : [0,3]
Total Size : 32 TB
Storage tier 0 (SCM) : 2.0 TB (1.0 TB / rank)
Storage tier 1 (NVMe): 30 TB (15 TB / rank)
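The two percentages in the output above look different but appear to describe the same allocation. The sketch below is one reading of the numbers, not taken from the dmg source: 6.67% would be SCM relative to NVMe, while 6.25%/93.75% would be each tier's share of the total. The function name pool_summary and its keys are hypothetical.

```python
def pool_summary(scm_tb, nvme_tb, ranks):
    """Recompute the ratios and total size from per-rank tier sizes (in TB)."""
    per_rank_tb = scm_tb + nvme_tb
    return {
        # SCM size expressed as a percentage of NVMe size.
        "scm_to_nvme_pct": round(100 * scm_tb / nvme_tb, 2),
        # Share of each tier in the per-rank total.
        "scm_share_pct": round(100 * scm_tb / per_rank_tb, 2),
        "nvme_share_pct": round(100 * nvme_tb / per_rank_tb, 2),
        # Pool size across all storage ranks.
        "total_tb": per_rank_tb * ranks,
    }


if __name__ == "__main__":
    # Values from the command above: 1T SCM and 15T NVMe on ranks 0 and 3.
    print(pool_summary(1.0, 15.0, ranks=2))
```

Under this reading, the values match the output: 6.67, 6.25, 93.75 and a total of 32 TB over two ranks.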
The problem is probably in CART communication between servers.
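To see which target ranks are timing out, the engine log can be summarized with a small script. The sketch below is only an illustration: it relies on the crt_context_timeout_check line format seen in the log above, uses four lines copied from that log as input, and the function name timed_out_ranks is made up.

```python
import re
from collections import Counter

# Lines copied from the storage01 log above (one INFO line, three ERR lines).
LOG_LINES = [
    "03/03-08:17:03.29 storage01 DAOS[13113/0/614] mgmt INFO src/mgmt/srv_drpc.c:434 ds_mgmt_drpc_pool_create() Received request to create pool on 6 ranks.",
    "03/03-08:17:05.97 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c4576a00) [opc=0x102000b (DAOS) rpcid=0x95f87550002353e rank:tag=2:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (2:0)",
    "03/03-08:17:05.97 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c4576a00) [opc=0x102000b (DAOS) rpcid=0x95f87550002353e rank:tag=2:0] aborting to group daos_server, rank 2, tgt_uri (null)",
    "03/03-08:17:05.97 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c45dc900) [opc=0x102000b (DAOS) rpcid=0x95f875500023541 rank:tag=5:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (5:0)",
    "03/03-08:18:03.29 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c45dc6f0) [opc=0x1020007 (DAOS) rpcid=0x95f87550002357d rank:tag=2:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (2:0)",
]

# Match only the timeout-check entries so each timed-out RPC is counted once.
TIMEOUT_RE = re.compile(
    r"crt_context_timeout_check\(\w+\) \[[^\]]*rank:tag=(\d+):\d+\].*timed out"
)


def timed_out_ranks(lines):
    """Count timed-out RPCs per destination rank."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    for rank, count in sorted(timed_out_ranks(LOG_LINES).items()):
        print(f"rank {rank}: {count} timed-out RPC(s)")
```

Run against the whole log, this kind of summary shows at a glance whether the timeouts are limited to a few ranks, which helps narrow down which links to test.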
|
Re: Fail to create pool
Lombardi, Johann
That’s progress! IIUC, you have 5x 3.8TB = 19TB per engine, so 20TB is probably too much. Could you please try with a smaller nvme-size? Since you use 2.2, you could also run “dmg pool create --size 100% Pool1” to allocate all the SCM and NVMe space.
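The arithmetic behind that estimate can be written down as a quick check. This is only a back-of-the-envelope sketch: the function names are hypothetical, sizes are in decimal GB to keep the math exact, and it ignores any space reserved for metadata, so the real usable capacity may be somewhat lower.

```python
def engine_nvme_capacity_gb(num_ssds, ssd_size_gb):
    """Raw NVMe capacity available to one engine."""
    return num_ssds * ssd_size_gb


def request_fits(requested_gb, num_ssds, ssd_size_gb):
    """True if the per-engine --nvme-size request fits in the raw capacity."""
    return requested_gb <= engine_nvme_capacity_gb(num_ssds, ssd_size_gb)


if __name__ == "__main__":
    # 5 SSDs of 3.8 TB per engine, as estimated in this message.
    print(engine_nvme_capacity_gb(5, 3_800))   # raw capacity in GB
    print(request_fits(20_000, 5, 3_800))      # --nvme-size=20T
    print(request_fits(15_000, 5, 3_800))      # --nvme-size=15T
```

With these numbers, a 20 TB request exceeds the 19 TB of raw capacity per engine, which is consistent with the DER_NOSPACE errors in the quoted log, while 15 TB fits.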
Cheers, Johann
From: <daos@daos.groups.io> on behalf of "landen.tian@..." <landen.tian@...>
[Edited Message Follows]
Lombardi,
[root@client01 ~]# dmg pool create -r 0,3,4,5 --scm-size=1T --nvme-size=20T --nsvc=1 Pool1
Creating DAOS pool with manual per-engine storage allocation: 1.0 TB SCM, 20 TB NVMe (5.00% ratio)
ERROR: dmg: client: code = 509 description = "the *control.PoolCreateReq request timed out after 10m0s"
ERROR: dmg: client: code = 509 resolution = "retry the request or check server logs for more information"
03/02-19:51:25.68 storage01 DAOS[13113/0/561] mgmt INFO src/mgmt/srv_drpc.c:434 ds_mgmt_drpc_pool_create() Received request to create pool on 4 ranks.
03/02-19:51:26.48 storage01 DAOS[13113/3/562] bio ERR src/bio/bio_context.c:396 bio_blob_create() Create blob failed for xs:0x7f578844ea20 pool:d62847b7 rc:-1007
03/02-19:51:26.48 storage01 DAOS[13113/3/562] vos ERR src/vos/vos_pool.c:402 vos_pool_create() Error creating blob for xs:0x7f578844ea20 pool:d62847b7 DER_NOSPACE(-1007): 'No space on storage target'
03/02-19:51:26.53 storage01 DAOS[13113/5/563] bio ERR src/bio/bio_context.c:396 bio_blob_create() Create blob failed for xs:0x7f575c64ea20 pool:d62847b7 rc:-1007
03/02-19:51:26.53 storage01 DAOS[13113/5/563] vos ERR src/vos/vos_pool.c:402 vos_pool_create() Error creating blob for xs:0x7f575c64ea20 pool:d62847b7 DER_NOSPACE(-1007): 'No space on storage target'
03/02-19:51:26.58 storage01 DAOS[13113/4/564] bio ERR src/bio/bio_context.c:396 bio_blob_create() Create blob failed for xs:0x7f577464ea20 pool:d62847b7 rc:-1007
03/02-19:51:26.58 storage01 DAOS[13113/4/564] vos ERR src/vos/vos_pool.c:402 vos_pool_create() Error creating blob for xs:0x7f577464ea20 pool:d62847b7 DER_NOSPACE(-1007): 'No space on storage target'
03/02-19:51:26.59 storage01 DAOS[13113/3/562] mgmt ERR src/mgmt/srv_target.c:480 tgt_vos_create_one() d62847b7: failed to init vos pool /mnt/daos0/NEWBORNS/d62847b7-10d6-4c42-a996-ea4f4dfce486/vos-0: -1007
03/02-19:51:26.59 storage01 DAOS[13113/5/563] mgmt ERR src/mgmt/srv_target.c:480 tgt_vos_create_one() d62847b7: failed to init vos pool /mnt/daos0/NEWBORNS/d62847b7-10d6-4c42-a996-ea4f4dfce486/vos-2: -1007
03/02-19:51:26.59 storage01 DAOS[13113/4/564] mgmt ERR src/mgmt/srv_target.c:480 tgt_vos_create_one() d62847b7: failed to init vos pool /mnt/daos0/NEWBORNS/d62847b7-10d6-4c42-a996-ea4f4dfce486/vos-1: -1007
03/02-19:51:26.60 storage01 DAOS[13113/4/565] bio WARN src/bio/bio_context.c:279 bio_blob_delete() Blob for xs:0x7f577464ea20, pool:d62847b7 doesn't exist
03/02-19:51:26.60 storage01 DAOS[13113/5/566] bio WARN src/bio/bio_context.c:279 bio_blob_delete() Blob for xs:0x7f575c64ea20, pool:d62847b7 doesn't exist
03/02-19:51:26.60 storage01 DAOS[13113/3/567] bio WARN src/bio/bio_context.c:279 bio_blob_delete() Blob for xs:0x7f578844ea20, pool:d62847b7 doesn't exist
03/02-19:51:28.53 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f8755000175dd rank:tag=2:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (2:0)
03/02-19:51:28.53 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f8755000175dd rank:tag=2:0] aborting to group daos_server, rank 2, tgt_uri (null)
03/02-19:51:28.53 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c459c040) [opc=0x102000b (DAOS) rpcid=0x95f8755000175e0 rank:tag=5:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (5:0)
03/02-19:51:28.53 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c459c040) [opc=0x102000b (DAOS) rpcid=0x95f8755000175e0 rank:tag=5:0] aborting to group daos_server, rank 5, tgt_uri (null)
03/02-19:51:28.53 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f8755000175dd rank:tag=2:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:51:28.53 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f8755000175dd rank:tag=2:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:51:28.56 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c459c040) [opc=0x102000b (DAOS) rpcid=0x95f8755000175e0 rank:tag=5:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:51:28.56 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c459c040) [opc=0x102000b (DAOS) rpcid=0x95f8755000175e0 rank:tag=5:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:51:28.56 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:381 crt_rpc_complete(0x7f57c459c500) [opc=0x102000b (DAOS) rpcid=0x95f8755000175db rank:tag=0:0] failed, DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:25.68 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c4576a00) [opc=0x1020007 (DAOS) rpcid=0x95f87550001761d rank:tag=5:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (5:0)
03/02-19:52:25.68 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c4576a00) [opc=0x1020007 (DAOS) rpcid=0x95f87550001761d rank:tag=5:0] aborting to group daos_server, rank 5, tgt_uri (null)
03/02-19:52:25.68 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c4576a00) [opc=0x1020007 (DAOS) rpcid=0x95f87550001761d rank:tag=5:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:25.68 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c4576a00) [opc=0x1020007 (DAOS) rpcid=0x95f87550001761d rank:tag=5:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:25.68 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:381 crt_rpc_complete(0x7f57c45b2460) [opc=0x1020007 (DAOS) rpcid=0x95f87550001761a rank:tag=0:0] failed, DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:25.68 storage01 DAOS[13113/0/561] mgmt ERR src/mgmt/srv_pool.c:97 ds_mgmt_tgt_pool_create_ranks() d62847b7: dss_rpc_send MGMT_TGT_CREATE: rc=DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:31.58 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f875500017626 rank:tag=2:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (2:0)
03/02-19:52:31.58 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f875500017626 rank:tag=2:0] aborting to group daos_server, rank 2, tgt_uri (null)
03/02-19:52:31.58 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c46113d0) [opc=0x102000b (DAOS) rpcid=0x95f875500017629 rank:tag=5:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (5:0)
03/02-19:52:31.58 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c46113d0) [opc=0x102000b (DAOS) rpcid=0x95f875500017629 rank:tag=5:0] aborting to group daos_server, rank 5, tgt_uri (null)
03/02-19:52:31.58 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f875500017626 rank:tag=2:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:31.58 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f875500017626 rank:tag=2:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:31.58 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c46113d0) [opc=0x102000b (DAOS) rpcid=0x95f875500017629 rank:tag=5:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:31.58 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c46113d0) [opc=0x102000b (DAOS) rpcid=0x95f875500017629 rank:tag=5:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:31.58 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:381 crt_rpc_complete(0x7f57c459c040) [opc=0x102000b (DAOS) rpcid=0x95f875500017624 rank:tag=0:0] failed, DER_TIMEDOUT(-1011): 'Time out'
03/02-19:53:25.68 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c4576a00) [opc=0x1020008 (DAOS) rpcid=0x95f875500017663 rank:tag=5:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (5:0)
03/02-19:53:25.68 
storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c4576a00) [opc=0x1020008 (DAOS) rpcid=0x95f875500017663 rank:tag=5:0] aborting to group daos_server, rank 5, tgt_uri (null) 03/02-19:53:25.68 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c4576a00) [opc=0x1020008 (DAOS) rpcid=0x95f875500017663 rank:tag=5:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out' 03/02-19:53:25.68 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c4576a00) [opc=0x1020008 (DAOS) rpcid=0x95f875500017663 rank:tag=5:0] error, rc: DER_TIMEDOUT(-1011): 'Time out' 03/02-19:53:25.68 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:381 crt_rpc_complete(0x7f57c45b2460) [opc=0x1020008 (DAOS) rpcid=0x95f875500017660 rank:tag=0:0] failed, DER_TIMEDOUT(-1011): 'Time out' 03/02-19:53:25.69 storage01 DAOS[13113/0/561] mgmt ERR src/mgmt/srv_pool.c:122 ds_mgmt_tgt_pool_create_ranks() d62847b7: failed to clean up failed pool: DER_TIMEDOUT(-1011): 'Time out' 03/02-19:53:25.69 storage01 DAOS[13113/0/561] mgmt ERR src/mgmt/srv_pool.c:193 ds_mgmt_create_pool() creating pool d62847b7 on ranks failed: rc DER_TIMEDOUT(-1011): 'Time out' 03/02-19:53:25.69 storage01 DAOS[13113/0/561] mgmt ERR src/mgmt/srv_drpc.c:485 ds_mgmt_drpc_pool_create() failed to create pool: DER_TIMEDOUT(-1011): 'Time out' 03/02-19:53:25.69 storage01 DAOS[13113/2/1] daos INFO src/engine/drpc_progress.c:392 process_session_activity() Session 1619 connection has been terminated
253 sysctl -w net.ipv4.conf.all.accept_local=1 254 sysctl -w net.ipv4.conf.all.arp_ignore=2 255 sysctl -w net.ipv4.conf.ib0.rp_filter=2 256 sysctl -w net.ipv4.conf.ib1.rp_filter=2 258 sysctl -w net.ipv4.conf.ens21f0.rp_filter=2
|
|
Re: Fail to create pool
Lombardi,
Glad to see your response! I followed https://docs.daos.io/v2.2/admin/predeployment_check/#multi-railnic-setup, and the error changed from DER_HG(-1020) to DER_NOSPACE(-1007).
Command:
[root@client01 ~]# dmg pool create -r 0,3,4,5 --scm-size=1T --nvme-size=20T --nsvc=1 Pool1
Creating DAOS pool with manual per-engine storage allocation: 1.0 TB SCM, 20 TB NVMe (5.00% ratio)
ERROR: dmg: client: code = 509 description = "the *control.PoolCreateReq request timed out after 10m0s"
ERROR: dmg: client: code = 509 resolution = "retry the request or check server logs for more information"
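For reference, the numbers in the dmg output above can be reproduced with a little arithmetic. This is only a sketch: it assumes decimal units (1 TB = 10**12 bytes) and that the requested sizes apply to each of the four listed ranks, as the "per-engine" wording in the output suggests. The function names are made up for illustration.

```python
# Sizes taken from the dmg command above. Decimal units (1 TB = 10**12 bytes)
# are an assumption made for this illustration.
TB = 10**12


def scm_to_nvme_ratio(scm_bytes, nvme_bytes):
    """Return the SCM size as a percentage of the NVMe size."""
    return 100.0 * scm_bytes / nvme_bytes


def total_request(per_engine_bytes, ranks):
    """Total capacity requested when the size applies to each listed rank."""
    return per_engine_bytes * len(ranks)


ranks = [0, 3, 4, 5]          # from "-r 0,3,4,5"
scm, nvme = 1 * TB, 20 * TB   # from "--scm-size=1T --nvme-size=20T"

print(f"ratio: {scm_to_nvme_ratio(scm, nvme):.2f}%")
print(f"total SCM: {total_request(scm, ranks) // TB} TB")
print(f"total NVMe: {total_request(nvme, ranks) // TB} TB")
```

If any single engine has less than 20 TB of free NVMe, a per-engine request of this size could not be satisfied. That would be consistent with the DER_NOSPACE errors, but it is a guess to check against the actual device capacity, not a confirmed diagnosis.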
Error log on storage01:
03/02-19:51:25.68 storage01 DAOS[13113/0/561] mgmt INFO src/mgmt/srv_drpc.c:434 ds_mgmt_drpc_pool_create() Received request to create pool on 4 ranks.
03/02-19:51:26.48 storage01 DAOS[13113/3/562] bio ERR src/bio/bio_context.c:396 bio_blob_create() Create blob failed for xs:0x7f578844ea20 pool:d62847b7 rc:-1007
03/02-19:51:26.48 storage01 DAOS[13113/3/562] vos ERR src/vos/vos_pool.c:402 vos_pool_create() Error creating blob for xs:0x7f578844ea20 pool:d62847b7 DER_NOSPACE(-1007): 'No space on storage target'
03/02-19:51:26.53 storage01 DAOS[13113/5/563] bio ERR src/bio/bio_context.c:396 bio_blob_create() Create blob failed for xs:0x7f575c64ea20 pool:d62847b7 rc:-1007
03/02-19:51:26.53 storage01 DAOS[13113/5/563] vos ERR src/vos/vos_pool.c:402 vos_pool_create() Error creating blob for xs:0x7f575c64ea20 pool:d62847b7 DER_NOSPACE(-1007): 'No space on storage target'
03/02-19:51:26.58 storage01 DAOS[13113/4/564] bio ERR src/bio/bio_context.c:396 bio_blob_create() Create blob failed for xs:0x7f577464ea20 pool:d62847b7 rc:-1007
03/02-19:51:26.58 storage01 DAOS[13113/4/564] vos ERR src/vos/vos_pool.c:402 vos_pool_create() Error creating blob for xs:0x7f577464ea20 pool:d62847b7 DER_NOSPACE(-1007): 'No space on storage target'
03/02-19:51:26.59 storage01 DAOS[13113/3/562] mgmt ERR src/mgmt/srv_target.c:480 tgt_vos_create_one() d62847b7: failed to init vos pool /mnt/daos0/NEWBORNS/d62847b7-10d6-4c42-a996-ea4f4dfce486/vos-0: -1007
03/02-19:51:26.59 storage01 DAOS[13113/5/563] mgmt ERR src/mgmt/srv_target.c:480 tgt_vos_create_one() d62847b7: failed to init vos pool /mnt/daos0/NEWBORNS/d62847b7-10d6-4c42-a996-ea4f4dfce486/vos-2: -1007
03/02-19:51:26.59 storage01 DAOS[13113/4/564] mgmt ERR src/mgmt/srv_target.c:480 tgt_vos_create_one() d62847b7: failed to init vos pool /mnt/daos0/NEWBORNS/d62847b7-10d6-4c42-a996-ea4f4dfce486/vos-1: -1007
03/02-19:51:26.60 storage01 DAOS[13113/4/565] bio WARN src/bio/bio_context.c:279 bio_blob_delete() Blob for xs:0x7f577464ea20, pool:d62847b7 doesn't exist
03/02-19:51:26.60 storage01 DAOS[13113/5/566] bio WARN src/bio/bio_context.c:279 bio_blob_delete() Blob for xs:0x7f575c64ea20, pool:d62847b7 doesn't exist
03/02-19:51:26.60 storage01 DAOS[13113/3/567] bio WARN src/bio/bio_context.c:279 bio_blob_delete() Blob for xs:0x7f578844ea20, pool:d62847b7 doesn't exist
03/02-19:51:28.53 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f8755000175dd rank:tag=2:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (2:0)
03/02-19:51:28.53 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f8755000175dd rank:tag=2:0] aborting to group daos_server, rank 2, tgt_uri (null)
03/02-19:51:28.53 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c459c040) [opc=0x102000b (DAOS) rpcid=0x95f8755000175e0 rank:tag=5:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (5:0)
03/02-19:51:28.53 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c459c040) [opc=0x102000b (DAOS) rpcid=0x95f8755000175e0 rank:tag=5:0] aborting to group daos_server, rank 5, tgt_uri (null)
03/02-19:51:28.53 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f8755000175dd rank:tag=2:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:51:28.53 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f8755000175dd rank:tag=2:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:51:28.56 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c459c040) [opc=0x102000b (DAOS) rpcid=0x95f8755000175e0 rank:tag=5:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:51:28.56 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c459c040) [opc=0x102000b (DAOS) rpcid=0x95f8755000175e0 rank:tag=5:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:51:28.56 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:381 crt_rpc_complete(0x7f57c459c500) [opc=0x102000b (DAOS) rpcid=0x95f8755000175db rank:tag=0:0] failed, DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:25.68 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c4576a00) [opc=0x1020007 (DAOS) rpcid=0x95f87550001761d rank:tag=5:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (5:0)
03/02-19:52:25.68 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c4576a00) [opc=0x1020007 (DAOS) rpcid=0x95f87550001761d rank:tag=5:0] aborting to group daos_server, rank 5, tgt_uri (null)
03/02-19:52:25.68 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c4576a00) [opc=0x1020007 (DAOS) rpcid=0x95f87550001761d rank:tag=5:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:25.68 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c4576a00) [opc=0x1020007 (DAOS) rpcid=0x95f87550001761d rank:tag=5:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:25.68 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:381 crt_rpc_complete(0x7f57c45b2460) [opc=0x1020007 (DAOS) rpcid=0x95f87550001761a rank:tag=0:0] failed, DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:25.68 storage01 DAOS[13113/0/561] mgmt ERR src/mgmt/srv_pool.c:97 ds_mgmt_tgt_pool_create_ranks() d62847b7: dss_rpc_send MGMT_TGT_CREATE: rc=DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:31.58 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f875500017626 rank:tag=2:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (2:0)
03/02-19:52:31.58 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f875500017626 rank:tag=2:0] aborting to group daos_server, rank 2, tgt_uri (null)
03/02-19:52:31.58 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c46113d0) [opc=0x102000b (DAOS) rpcid=0x95f875500017629 rank:tag=5:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (5:0)
03/02-19:52:31.58 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c46113d0) [opc=0x102000b (DAOS) rpcid=0x95f875500017629 rank:tag=5:0] aborting to group daos_server, rank 5, tgt_uri (null)
03/02-19:52:31.58 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f875500017626 rank:tag=2:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:31.58 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c457b9c0) [opc=0x102000b (DAOS) rpcid=0x95f875500017626 rank:tag=2:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:31.58 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c46113d0) [opc=0x102000b (DAOS) rpcid=0x95f875500017629 rank:tag=5:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:31.58 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c46113d0) [opc=0x102000b (DAOS) rpcid=0x95f875500017629 rank:tag=5:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:52:31.58 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:381 crt_rpc_complete(0x7f57c459c040) [opc=0x102000b (DAOS) rpcid=0x95f875500017624 rank:tag=0:0] failed, DER_TIMEDOUT(-1011): 'Time out'
03/02-19:53:25.68 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 crt_context_timeout_check(0x7f57c4576a00) [opc=0x1020008 (DAOS) rpcid=0x95f875500017663 rank:tag=5:0] ctx_id 0, (status: 0x38) timed out (60 seconds), target (5:0)
03/02-19:53:25.68 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:888 crt_req_timeout_hdlr(0x7f57c4576a00) [opc=0x1020008 (DAOS) rpcid=0x95f875500017663 rank:tag=5:0] aborting to group daos_server, rank 5, tgt_uri (null)
03/02-19:53:25.68 storage01 DAOS[13113/0/3] hg ERR src/cart/crt_hg.c:1246 crt_hg_req_send_cb(0x7f57c4576a00) [opc=0x1020008 (DAOS) rpcid=0x95f875500017663 rank:tag=5:0] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:53:25.68 storage01 DAOS[13113/0/3] corpc ERR src/cart/crt_corpc.c:648 crt_corpc_reply_hdlr(0x7f57c4576a00) [opc=0x1020008 (DAOS) rpcid=0x95f875500017663 rank:tag=5:0] error, rc: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:53:25.68 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:381 crt_rpc_complete(0x7f57c45b2460) [opc=0x1020008 (DAOS) rpcid=0x95f875500017660 rank:tag=0:0] failed, DER_TIMEDOUT(-1011): 'Time out'
03/02-19:53:25.69 storage01 DAOS[13113/0/561] mgmt ERR src/mgmt/srv_pool.c:122 ds_mgmt_tgt_pool_create_ranks() d62847b7: failed to clean up failed pool: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:53:25.69 storage01 DAOS[13113/0/561] mgmt ERR src/mgmt/srv_pool.c:193 ds_mgmt_create_pool() creating pool d62847b7 on ranks failed: rc DER_TIMEDOUT(-1011): 'Time out'
03/02-19:53:25.69 storage01 DAOS[13113/0/561] mgmt ERR src/mgmt/srv_drpc.c:485 ds_mgmt_drpc_pool_create() failed to create pool: DER_TIMEDOUT(-1011): 'Time out'
03/02-19:53:25.69 storage01 DAOS[13113/2/1] daos INFO src/engine/drpc_progress.c:392 process_session_activity() Session 1619 connection has been terminated
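A log this long is easier to read once summarized. The sketch below counts log levels and DER_* codes, and collects the ranks named in "timed out" lines. The regular expressions are inferred from the lines above and are not an official DAOS log grammar. The name `summarize` and the two-line `SAMPLE` (copied from the log) are for illustration only.

```python
import re
from collections import Counter

# Patterns inferred from the log excerpt above (an assumption, not a spec).
LINE_RE = re.compile(
    r"^(?P<ts>\d\d/\d\d-\d\d:\d\d:\d\d\.\d\d) (?P<host>\S+) "
    r"DAOS\[(?P<ids>[\d/]+)\]\s+(?P<facility>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$"
)
DER_RE = re.compile(r"(DER_[A-Z_]+)\((-?\d+)\)")
RANK_RE = re.compile(r"rank:tag=(\d+):\d+")


def summarize(lines):
    """Count levels and DER codes, and collect ranks seen in timeout lines."""
    levels, der_codes, timed_out_ranks = Counter(), Counter(), set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not a log line
        levels[m["level"]] += 1
        for name, _code in DER_RE.findall(m["msg"]):
            der_codes[name] += 1
        if "timed out" in m["msg"]:
            timed_out_ranks.update(int(r) for r in RANK_RE.findall(m["msg"]))
    return {"levels": levels, "der_codes": der_codes,
            "timed_out_ranks": timed_out_ranks}


SAMPLE = [
    "03/02-19:51:26.48 storage01 DAOS[13113/3/562] vos ERR src/vos/vos_pool.c:402 "
    "vos_pool_create() Error creating blob for xs:0x7f578844ea20 pool:d62847b7 "
    "DER_NOSPACE(-1007): 'No space on storage target'",
    "03/02-19:51:28.53 storage01 DAOS[13113/0/3] rpc ERR src/cart/crt_context.c:939 "
    "crt_context_timeout_check(0x7f57c457b9c0) [opc=0x102000b (DAOS) "
    "rpcid=0x95f8755000175dd rank:tag=2:0] ctx_id 0, (status: 0x38) timed out "
    "(60 seconds), target (2:0)",
]

summary = summarize(SAMPLE)
print(summary)
```

To run it on the full log, pass an open file object (or a list of lines) to `summarize` instead of `SAMPLE`.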
Furthermore, each node has one 1G Ethernet NIC and two IB 200 NICs. The Ethernet interface is on subnet 10.166.15.*/24, and ib0 and ib1 are on a subnet like 192.168.15.*/24.
sysctl commands from my shell history:
sysctl -w net.ipv4.conf.all.accept_local=1
sysctl -w net.ipv4.conf.all.arp_ignore=2
sysctl -w net.ipv4.conf.ib0.rp_filter=2
sysctl -w net.ipv4.conf.ib1.rp_filter=2
sysctl -w net.ipv4.conf.ens21f0.rp_filter=2
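Values set with sysctl -w do not persist across reboots, so it can be worth confirming they are still in effect on every node. A minimal sketch, assuming a Linux host where the settings are exposed under /proc/sys. The helper names `sysctl_path` and `check` are hypothetical, and the simple dot-to-slash mapping would not work for interface names that themselves contain dots.

```python
from pathlib import Path

# Expected values copied from the sysctl commands above.
EXPECTED = {
    "net.ipv4.conf.all.accept_local": "1",
    "net.ipv4.conf.all.arp_ignore": "2",
    "net.ipv4.conf.ib0.rp_filter": "2",
    "net.ipv4.conf.ib1.rp_filter": "2",
    "net.ipv4.conf.ens21f0.rp_filter": "2",
}


def sysctl_path(key, root=Path("/proc/sys")):
    """Map a dotted sysctl key to its file under root (no dots in iface names)."""
    return root.joinpath(*key.split("."))


def check(expected, root=Path("/proc/sys")):
    """Return {key: actual_value_or_None} for every setting that does not match."""
    mismatches = {}
    for key, want in expected.items():
        try:
            got = sysctl_path(key, root).read_text().strip()
        except OSError:
            got = None  # missing interface, missing /proc/sys, or unreadable file
        if got != want:
            mismatches[key] = got
    return mismatches


if __name__ == "__main__":
    for key, got in check(EXPECTED).items():
        print(f"{key}: expected {EXPECTED[key]}, found {got}")
```

An empty result means all five settings match; any printed line points at a node or interface where the multi-rail settings are not applied.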
|
|
Re: Questions about daos evolution and design
Niu, Yawei
I see, so this looks to me like the same question as question 4 about "append write"? I hope my answer there is helpful.
Thanks -Niu |
|
Re: Questions about daos evolution and design
hongxunlinpub@...
Q3: I mean ROW (redirect-on-write).
Thanks -Lin |
|