Re: Message looks serious?
Colin Ngam
Hi WangDi,
commit 8200a7fb403e091b51b4b00c1aec57dafefb1ada
[daos@hl-d106 scripts]$ daos pool list-cont --pool 96670dea-d357-4235-8659-dac16d01b1c2 --svc 40
[daos@hl-d106 scripts]$
(No error message appears in the terminal window.)
[daos@hl-d106 scripts]$ which daos
~/daos/install/bin/daos
[daos@hl-d106 scripts]$ ldd ~/daos/install/bin/daos
linux-vdso.so.1 => (0x00007ffece314000)
libdaos.so.0 => /home/users/daos/daos/install/lib64/libdaos.so.0 (0x00007f635ae8c000)
libdaos_common.so => /home/users/daos/daos/install/lib64/libdaos_common.so (0x00007f635ac1d000)
libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f635aa18000)
libdfs.so => /home/users/daos/daos/install/lib64/libdfs.so (0x00007f635a7fa000)
libduns.so => /home/users/daos/daos/install/lib64/libduns.so (0x00007f635a5f5000)
libgurt.so.4 => /home/users/daos/daos/install/lib64/libgurt.so.4 (0x00007f635a3d2000)
libcart.so.4 => /home/users/daos/daos/install/lib64/libcart.so.4 (0x00007f635a0ef000)
libc.so.6 => /lib64/libc.so.6 (0x00007f6359d21000)
/lib64/ld-linux-x86-64.so.2 (0x00007f635b167000)
libpmemobj.so.1 => /home/users/daos/daos/install/lib/libpmemobj.so.1 (0x00007f6359ae0000)
libisal.so.2 => /home/users/daos/daos/install/lib/libisal.so.2 (0x00007f63598a2000)
libprotobuf-c.so.1 => /home/users/daos/daos/install/lib/libprotobuf-c.so.1 (0x00007f6359699000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f635947d000)
libyaml-0.so.2 => /lib64/libyaml-0.so.2 (0x00007f635925d000)
libmercury.so.2 => /home/users/daos/daos/install/lib/libmercury.so.2 (0x00007f6359043000)
libna.so.2 => /home/users/daos/daos/install/lib/libna.so.2 (0x00007f6358e25000)
libmercury_util.so.2 => /home/users/daos/daos/install/lib/libmercury_util.so.2 (0x00007f6358c1e000)
libpmem.so.1 => /home/users/daos/daos/install/lib/libpmem.so.1 (0x00007f63589f5000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f63587f1000)
librt.so.1 => /lib64/librt.so.1 (0x00007f63585e9000)
libfabric.so.1 => /home/users/daos/daos/install/lib/libfabric.so.1 (0x00007f6358210000)
librdmacm.so.1 => /lib64/librdmacm.so.1 (0x00007f6357ff7000)
libibverbs.so.1 => /lib64/libibverbs.so.1 (0x00007f6357ddc000)
libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00007f6357bbb000)
libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00007f635794e000)
libpsm2.so.2 => /home/users/daos/daos/install/lib64/libpsm2.so.2 (0x00007f63576eb000)
libm.so.6 => /lib64/libm.so.6 (0x00007f63573e9000)
libnuma.so.1 => /lib64/libnuma.so.1 (0x00007f63571dd000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f6356fc7000)
[daos@hl-d106 scripts]$
daos.log:
05/15-09:57:58.58 hl-d106 DAOS[38462/38462] fi INFO src/cart/src/gurt/fault_inject.c:486 d_fault_inject_init() No config file, fault injection is OFF.
05/15-09:57:58.58 hl-d106 DAOS[38462/38462] crt INFO src/cart/src/cart/crt_init.c:282 crt_init_opt() libcart version 4.7.0 initializing
05/15-09:57:58.58 hl-d106 DAOS[38462/38462] crt WARN src/cart/src/cart/crt_init.c:174 data_init() FI_UNIVERSE_SIZE was not set; setting to 2048
05/15-09:57:58.58 hl-d106 DAOS[38462/38462] crt WARN src/cart/src/cart/crt_init.c:393 crt_init_opt() FI_OFI_RXM_USE_SRX not set, set=1
05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:142 cmd_args_print() DAOS system name: daos_server
05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:143 cmd_args_print() pool UUID: 96670dea-d357-4235-8659-dac16d01b1c2
05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:144 cmd_args_print() cont UUID: 00000000-0000-0000-0000-000000000000
05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:148 cmd_args_print() pool svc: parsed 1 ranks from input 40
05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:152 cmd_args_print() attr: name=NULL, value=NULL
05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:156 cmd_args_print() path=NULL, type=unknown, oclass=UNKNOWN, chunk_size=0
05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:162 cmd_args_print() snapshot: name=NULL, epoch=0, epoch range=NULL (0-0)
05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:163 cmd_args_print() oid: 0.0
05/15-09:57:59.03 hl-d106 DAOS[38462/38462] daos INFO src/common/drpc.c:664 drpc_close() Closing dRPC socket fd=19
05/15-09:57:59.06 hl-d106 DAOS[38462/38462] daos INFO src/common/drpc.c:664 drpc_close() Closing dRPC socket fd=19
05/15-09:57:59.11 hl-d106 DAOS[38462/38462] rpc ERR src/cart/src/cart/crt_context.c:302 crt_rpc_complete(0xc5cfe0) [opc=0x2010001 xid=0x0 rank:tag=40:0] RPC failed; rc: -1019
05/15-09:57:59.11 hl-d106 DAOS[38462/38462] common ERR src/common/rsvc.c:142 rsvc_client_process_error() removed rank 40 from replica list due to DER_OOG(-1019)
05/15-09:57:59.11 hl-d106 DAOS[38462/38462] common WARN src/common/rsvc.c:102 rsvc_client_choose() replica list empty
05/15-09:57:59.11 hl-d106 DAOS[38462/38462] pool ERR src/pool/cli.c:539 dc_pool_connect() 96670dea: cannot find pool service: DER_NOTREPLICA(-2020)
failed to connect to pool: -1005
Thanks.
Colin
From: <daos@daos.groups.io> on behalf of "Wang, Di" <di.wang@...>
Hello, Colin
I tried this commit, and it does generate the failure message in my environment.
I assume you built the source yourself? What is the output of "ldd install/bin/daos"?
Thanks
WangDi
From: <daos@daos.groups.io> on behalf of Colin Ngam <colin.ngam@...>
Greetings,
commit 8200a7fb403e091b51b4b00c1aec57dafefb1ada
Thanks.
Colin
From: <daos@daos.groups.io> on behalf of "Wang, Di" <di.wang@...>
Hello,
Thanks. This does show that the connection failed with --svc 40. I am not sure why it does not output any failure message, but I have seen others also complain about missing failure messages. Which version are you using, 0.9 or master?
Btw: these server log messages may be seen during pool creation as well. It is a known issue, and we will fix it. Thanks.
"05/13-11:57:11.02 delphi-006 DAOS[26509/26552] pool WARN src/pool/srv_target.c:1020 ds_pool_tgt_map_update() Ignore update pool a68b3845 1 -> 1"
Thanks
Wangdi
From: <daos@daos.groups.io> on behalf of Colin Ngam <colin.ngam@...>
Hi WangDi,
Is this what you need:
05/14-09:53:00.86 hl-d106 DAOS[20928/20928] fi INFO src/cart/src/gurt/fault_inject.c:486 d_fault_inject_init() No config file, fault injection is OFF.
05/14-09:53:00.86 hl-d106 DAOS[20928/20928] crt INFO src/cart/src/cart/crt_init.c:282 crt_init_opt() libcart version 4.7.0 initializing
05/14-09:53:00.86 hl-d106 DAOS[20928/20928] crt WARN src/cart/src/cart/crt_init.c:174 data_init() FI_UNIVERSE_SIZE was not set; setting to 2048
05/14-09:53:00.86 hl-d106 DAOS[20928/20928] crt WARN src/cart/src/cart/crt_init.c:393 crt_init_opt() FI_OFI_RXM_USE_SRX not set, set=1
05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:142 cmd_args_print() DAOS system name: daos_server
05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:143 cmd_args_print() pool UUID: 6eb32fb0-49e9-49fd-96e8-bba14728a8c3
05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:144 cmd_args_print() cont UUID: 00000000-0000-0000-0000-000000000000
05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:148 cmd_args_print() pool svc: parsed 1 ranks from input 1
05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:152 cmd_args_print() attr: name=NULL, value=NULL
05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:156 cmd_args_print() path=NULL, type=unknown, oclass=UNKNOWN, chunk_size=0
05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:162 cmd_args_print() snapshot: name=NULL, epoch=0, epoch range=NULL (0-0)
05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:163 cmd_args_print() oid: 0.0
05/14-09:53:01.31 hl-d106 DAOS[20928/20928] daos INFO src/common/drpc.c:664 drpc_close() Closing dRPC socket fd=19
05/14-09:53:01.35 hl-d106 DAOS[20928/20928] daos INFO src/common/drpc.c:664 drpc_close() Closing dRPC socket fd=19
05/14-09:53:01.40 hl-d106 DAOS[20928/20928] common ERR src/common/rsvc.c:142 rsvc_client_process_error() removed rank 1 from replica list due to DER_NOTREPLICA(-2020)
05/14-09:53:01.40 hl-d106 DAOS[20928/20928] common WARN src/common/rsvc.c:102 rsvc_client_choose() replica list empty
05/14-09:53:01.40 hl-d106 DAOS[20928/20928] pool ERR src/pool/cli.c:539 dc_pool_connect() 6eb32fb0: cannot find pool service: DER_NOTREPLICA(-2020)
failed to connect to pool: -1005
This comes from the daos.log on the Client.
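Since the interesting entries in a client log like the one above are the few ERR and WARN records, a small filter can help when reading it. The following is only a sketch: it assumes each record starts with a timestamp of the form MM/DD-HH:MM:SS.cc followed by host, process, facility, level and source location, as in the lines quoted in this thread. The names split_records, parse_record and problems are made up for illustration and are not part of DAOS.

```python
import re

# Assumed record start: "MM/DD-HH:MM:SS.cc " as seen in the quoted log.
RECORD_START = re.compile(r"(?=\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{2} )")
# Error names printed as NAME(code), e.g. DER_NOTREPLICA(-2020).
DER_CODE = re.compile(r"(DER_[A-Z_]+)\((-?\d+)\)")


def split_records(text):
    """Split log text (even if pasted as one long line) into records."""
    return [r.strip() for r in RECORD_START.split(text) if r.strip()]


def parse_record(record):
    """Return the leading fields as a dict, or None if the layout differs."""
    parts = record.split(None, 6)
    if len(parts) < 7:
        return None
    ts, host, proc, facility, level, src, message = parts
    return {
        "time": ts, "host": host, "proc": proc, "facility": facility,
        "level": level, "src": src, "message": message,
        "der": [(name, int(code)) for name, code in DER_CODE.findall(message)],
    }


def problems(text, levels=("ERR", "WARN")):
    """Keep only the records whose level is in `levels`."""
    parsed = (parse_record(r) for r in split_records(text))
    return [p for p in parsed if p and p["level"] in levels]


# Three records copied from the log above, joined on one line.
sample_log = (
    "05/14-09:53:01.35 hl-d106 DAOS[20928/20928] daos INFO "
    "src/common/drpc.c:664 drpc_close() Closing dRPC socket fd=19 "
    "05/14-09:53:01.40 hl-d106 DAOS[20928/20928] common ERR "
    "src/common/rsvc.c:142 rsvc_client_process_error() removed rank 1 "
    "from replica list due to DER_NOTREPLICA(-2020) "
    "05/14-09:53:01.40 hl-d106 DAOS[20928/20928] common WARN "
    "src/common/rsvc.c:102 rsvc_client_choose() replica list empty"
)

for p in problems(sample_log):
    print(p["level"], p["src"], p["message"])
```

Lines without a timestamp, such as the final "failed to connect to pool" line, stay attached to the preceding record when the log is pasted as a single line.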
Thanks.
Colin
From: <daos@daos.groups.io> on behalf of "Wang, Di" <di.wang@...>
If rank 40 does not exist, it should not be able to connect to the pool at all, i.e. it should output something like "failed to connect to pool: …".
These logs seem to suggest that the pool connection did happen. Would you please collect the client-side DAOS log (by "export D_LOG_FILE=xxx")? It might tell us what really happened. Thanks.
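One way to capture the client log together with what the command itself reports is to run it from a small wrapper. This is a rough sketch, not an official DAOS tool: D_LOG_FILE is the variable named above, while run_with_client_log and the log path in the comment are placeholders chosen for illustration.

```python
import os
import subprocess
import sys


def run_with_client_log(cmd, log_path):
    """Run cmd with D_LOG_FILE set to log_path.

    Returns (exit status, captured stderr) so that a failure which prints
    nothing on the terminal can still be told apart from a success.
    """
    env = dict(os.environ, D_LOG_FILE=log_path)
    proc = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return proc.returncode, proc.stderr


# Hypothetical usage with the command from this thread:
#   rc, err = run_with_client_log(
#       ["daos", "pool", "list-cont",
#        "--pool", "a68b3845-fe78-481e-aa84-164e851d5f52", "--svc", "40"],
#       "/tmp/daos_client.log")
#   print("exit status:", rc, "stderr:", repr(err))
```

An empty stderr alongside ERR records in the log file would match the behaviour reported here.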
Thanks
WangDi
From: <daos@daos.groups.io> on behalf of Colin Ngam <cngam@...>
Greetings,
Executing the command: daos pool list-cont --pool a68b3845-fe78-481e-aa84-164e851d5f52 --svc 40
Note that 40 does not exist.
We did not get an error from the daos command.
In the log:
05/13-11:57:11.02 delphi-006 DAOS[26509/26552] pool WARN src/pool/srv_target.c:1020 ds_pool_tgt_map_update() Ignore update pool a68b3845 1 -> 1
05/13-11:57:11.02 delphi-006 DAOS[26509/26552] pool WARN src/pool/srv_target.c:1020 ds_pool_tgt_map_update() Ignore update pool a68b3845 1 -> 1
05/13-11:57:11.02 delphi-006 DAOS[26509/26552] pool WARN src/pool/srv_target.c:1020 ds_pool_tgt_map_update() Ignore update pool a68b3845 1 -> 1
My guess is that ds_pool_tgt_map_update() should not even be called?
Cheers,
Colin