Message looks serious?


Colin Ngam <cngam@...>
 

Greetings,

 

Executing the command:

daos pool list-cont --pool a68b3845-fe78-481e-aa84-164e851d5f52 --svc 40

 

Note that rank 40 does not exist.

 

We did not get an error from the daos command.

 

In the log:

 

05/13-11:57:11.02 delphi-006 DAOS[26509/26552] pool WARN src/pool/srv_target.c:1020 ds_pool_tgt_map_update() Ignore update pool a68b3845 1 -> 1

05/13-11:57:11.02 delphi-006 DAOS[26509/26552] pool WARN src/pool/srv_target.c:1020 ds_pool_tgt_map_update() Ignore update pool a68b3845 1 -> 1

05/13-11:57:11.02 delphi-006 DAOS[26509/26552] pool WARN src/pool/srv_target.c:1020 ds_pool_tgt_map_update() Ignore update pool a68b3845 1 -> 1

 

My guess is that ds_pool_tgt_map_update() should not even be called?

 

Cheers,

 

Colin

 


Wang, Di
 

If rank 40 does not exist, the client should not be able to connect to the pool at all, i.e. it should output something like "failed to connect to pool: …".

These logs seem to suggest that the pool connection did happen. Would you please collect the client-side daos log (by setting "export D_LOG_FILE=xxx")? It might tell us what really happened.

Thanks
WangDi
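
A minimal sketch of the log collection suggested above, assuming a POSIX shell. `D_LOG_FILE` is the variable named in the message; the helper name `run_with_client_log` and the log path in the example are hypothetical. The helper also prints the command's exit status, which is useful when the command itself prints nothing.

```shell
# Run a command with the DAOS client log redirected to a file,
# then report the command's exit status.
# NOTE: the function name and any paths are illustrative only.
run_with_client_log() {
    log_file=$1
    shift
    status=0
    # Set D_LOG_FILE only for this one command.
    D_LOG_FILE="$log_file" "$@" || status=$?
    echo "exit status: $status (client log: $log_file)"
    return "$status"
}

# Example (requires a DAOS client install; substitute your own values):
#   run_with_client_log /tmp/daos_client.log \
#       daos pool list-cont --pool <pool-uuid> --svc <rank>
```

A nonzero exit status with no message on the terminal would point to the error text going somewhere other than the terminal.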
From: <daos@daos.groups.io> on behalf of Colin Ngam <cngam@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday, May 13, 2020 at 10:08 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Message looks serious?



Colin Ngam
 

Hi WangDi,

 

Is this what you need?

 

05/14-09:53:00.86 hl-d106 DAOS[20928/20928] fi   INFO src/cart/src/gurt/fault_inject.c:486 d_fault_inject_init() No config file, fault injection is OFF.

05/14-09:53:00.86 hl-d106 DAOS[20928/20928] crt  INFO src/cart/src/cart/crt_init.c:282 crt_init_opt() libcart version 4.7.0 initializing

05/14-09:53:00.86 hl-d106 DAOS[20928/20928] crt  WARN src/cart/src/cart/crt_init.c:174 data_init() FI_UNIVERSE_SIZE was not set; setting to 2048

05/14-09:53:00.86 hl-d106 DAOS[20928/20928] crt  WARN src/cart/src/cart/crt_init.c:393 crt_init_opt() FI_OFI_RXM_USE_SRX not set, set=1

05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:142 cmd_args_print()   DAOS system name: daos_server

05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:143 cmd_args_print()   pool UUID: 6eb32fb0-49e9-49fd-96e8-bba14728a8c3

05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:144 cmd_args_print()   cont UUID: 00000000-0000-0000-0000-000000000000

05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:148 cmd_args_print()   pool svc: parsed 1 ranks from input 1

05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:152 cmd_args_print()   attr: name=NULL, value=NULL

05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:156 cmd_args_print()   path=NULL, type=unknown, oclass=UNKNOWN, chunk_size=0

05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:162 cmd_args_print()   snapshot: name=NULL, epoch=0, epoch range=NULL (0-0)

05/14-09:53:01.23 hl-d106 DAOS[20928/20928] client INFO src/utils/daos.c:163 cmd_args_print()   oid: 0.0

05/14-09:53:01.31 hl-d106 DAOS[20928/20928] daos INFO src/common/drpc.c:664 drpc_close() Closing dRPC socket fd=19

05/14-09:53:01.35 hl-d106 DAOS[20928/20928] daos INFO src/common/drpc.c:664 drpc_close() Closing dRPC socket fd=19

05/14-09:53:01.40 hl-d106 DAOS[20928/20928] common ERR  src/common/rsvc.c:142 rsvc_client_process_error() removed rank 1 from replica list due to DER_NOTREPLICA(-2020)

05/14-09:53:01.40 hl-d106 DAOS[20928/20928] common WARN src/common/rsvc.c:102 rsvc_client_choose() replica list empty

05/14-09:53:01.40 hl-d106 DAOS[20928/20928] pool ERR  src/pool/cli.c:539 dc_pool_connect() 6eb32fb0: cannot find pool service: DER_NOTREPLICA(-2020)

failed to connect to pool: -1005

 

This comes from the daos.log on the Client.

 

Thanks.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Wang, Di" <di.wang@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday, May 13, 2020 at 4:07 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Message looks serious?

 



Wang, Di
 

Hello, 

Thanks. This does show that the connection failed with --svc 40. I am not sure why the command does not print any failure message, but I have seen others complain about missing failure messages as well. Which version are you using, 0.9 or master?

By the way, these server log messages may also appear during pool creation. It is a known issue, and we will fix it.

05/13-11:57:11.02 delphi-006 DAOS[26509/26552] pool WARN src/pool/srv_target.c:1020 ds_pool_tgt_map_update() Ignore update pool a68b3845 1 -> 1


Thanks

Wangdi

From: <daos@daos.groups.io> on behalf of Colin Ngam <colin.ngam@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, May 14, 2020 at 8:06 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Message looks serious?



Colin Ngam
 

Greetings,

 

commit 8200a7fb403e091b51b4b00c1aec57dafefb1ada

 

Thanks.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Wang, Di" <di.wang@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, May 14, 2020 at 11:37 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Message looks serious?

 



Wang, Di
 

Hello, Colin

I tried this commit, and it does produce the failure message in my environment.

I assume you built from source yourself? What is the output of "ldd install/bin/daos"?

Thanks
WangDi 
From: <daos@daos.groups.io> on behalf of Colin Ngam <colin.ngam@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, May 14, 2020 at 9:41 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Message looks serious?



Colin Ngam
 

Hi WangDi,

 

commit 8200a7fb403e091b51b4b00c1aec57dafefb1ada

 

[daos@hl-d106 scripts]$ daos pool list-cont --pool 96670dea-d357-4235-8659-dac16d01b1c2 --svc 40

[daos@hl-d106 scripts]$

 

(No error message appears in the terminal window.)

 

[daos@hl-d106 scripts]$ which daos

~/daos/install/bin/daos

[daos@hl-d106 scripts]$ ldd ~/daos/install/bin/daos

                linux-vdso.so.1 =>  (0x00007ffece314000)

                libdaos.so.0 => /home/users/daos/daos/install/lib64/libdaos.so.0 (0x00007f635ae8c000)

                libdaos_common.so => /home/users/daos/daos/install/lib64/libdaos_common.so (0x00007f635ac1d000)

                libuuid.so.1 => /lib64/libuuid.so.1 (0x00007f635aa18000)

                libdfs.so => /home/users/daos/daos/install/lib64/libdfs.so (0x00007f635a7fa000)

                libduns.so => /home/users/daos/daos/install/lib64/libduns.so (0x00007f635a5f5000)

                libgurt.so.4 => /home/users/daos/daos/install/lib64/libgurt.so.4 (0x00007f635a3d2000)

                libcart.so.4 => /home/users/daos/daos/install/lib64/libcart.so.4 (0x00007f635a0ef000)

                libc.so.6 => /lib64/libc.so.6 (0x00007f6359d21000)

                /lib64/ld-linux-x86-64.so.2 (0x00007f635b167000)

                libpmemobj.so.1 => /home/users/daos/daos/install/lib/libpmemobj.so.1 (0x00007f6359ae0000)

                libisal.so.2 => /home/users/daos/daos/install/lib/libisal.so.2 (0x00007f63598a2000)

                libprotobuf-c.so.1 => /home/users/daos/daos/install/lib/libprotobuf-c.so.1 (0x00007f6359699000)

                libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f635947d000)

                libyaml-0.so.2 => /lib64/libyaml-0.so.2 (0x00007f635925d000)

                libmercury.so.2 => /home/users/daos/daos/install/lib/libmercury.so.2 (0x00007f6359043000)

                libna.so.2 => /home/users/daos/daos/install/lib/libna.so.2 (0x00007f6358e25000)

                libmercury_util.so.2 => /home/users/daos/daos/install/lib/libmercury_util.so.2 (0x00007f6358c1e000)

                libpmem.so.1 => /home/users/daos/daos/install/lib/libpmem.so.1 (0x00007f63589f5000)

                libdl.so.2 => /lib64/libdl.so.2 (0x00007f63587f1000)

                librt.so.1 => /lib64/librt.so.1 (0x00007f63585e9000)

                libfabric.so.1 => /home/users/daos/daos/install/lib/libfabric.so.1 (0x00007f6358210000)

                librdmacm.so.1 => /lib64/librdmacm.so.1 (0x00007f6357ff7000)

                libibverbs.so.1 => /lib64/libibverbs.so.1 (0x00007f6357ddc000)

                libnl-3.so.200 => /lib64/libnl-3.so.200 (0x00007f6357bbb000)

                libnl-route-3.so.200 => /lib64/libnl-route-3.so.200 (0x00007f635794e000)

                libpsm2.so.2 => /home/users/daos/daos/install/lib64/libpsm2.so.2 (0x00007f63576eb000)

                libm.so.6 => /lib64/libm.so.6 (0x00007f63573e9000)

                libnuma.so.1 => /lib64/libnuma.so.1 (0x00007f63571dd000)

                libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f6356fc7000)

[daos@hl-d106 scripts]$

 

daos.log:

05/15-09:57:58.58 hl-d106 DAOS[38462/38462] fi   INFO src/cart/src/gurt/fault_inject.c:486 d_fault_inject_init() No config file, fault injection is OFF.

05/15-09:57:58.58 hl-d106 DAOS[38462/38462] crt  INFO src/cart/src/cart/crt_init.c:282 crt_init_opt() libcart version 4.7.0 initializing

05/15-09:57:58.58 hl-d106 DAOS[38462/38462] crt  WARN src/cart/src/cart/crt_init.c:174 data_init() FI_UNIVERSE_SIZE was not set; setting to 2048

05/15-09:57:58.58 hl-d106 DAOS[38462/38462] crt  WARN src/cart/src/cart/crt_init.c:393 crt_init_opt() FI_OFI_RXM_USE_SRX not set, set=1

05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:142 cmd_args_print()   DAOS system name: daos_server

05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:143 cmd_args_print()   pool UUID: 96670dea-d357-4235-8659-dac16d01b1c2

05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:144 cmd_args_print()   cont UUID: 00000000-0000-0000-0000-000000000000

05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:148 cmd_args_print()   pool svc: parsed 1 ranks from input 40

05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:152 cmd_args_print()   attr: name=NULL, value=NULL

05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:156 cmd_args_print()   path=NULL, type=unknown, oclass=UNKNOWN, chunk_size=0

05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:162 cmd_args_print()   snapshot: name=NULL, epoch=0, epoch range=NULL (0-0)

05/15-09:57:58.95 hl-d106 DAOS[38462/38462] client INFO src/utils/daos.c:163 cmd_args_print()   oid: 0.0

05/15-09:57:59.03 hl-d106 DAOS[38462/38462] daos INFO src/common/drpc.c:664 drpc_close() Closing dRPC socket fd=19

05/15-09:57:59.06 hl-d106 DAOS[38462/38462] daos INFO src/common/drpc.c:664 drpc_close() Closing dRPC socket fd=19

05/15-09:57:59.11 hl-d106 DAOS[38462/38462] rpc  ERR  src/cart/src/cart/crt_context.c:302 crt_rpc_complete(0xc5cfe0) [opc=0x2010001 xid=0x0 rank:tag=40:0] RPC failed; rc: -1019

05/15-09:57:59.11 hl-d106 DAOS[38462/38462] common ERR  src/common/rsvc.c:142 rsvc_client_process_error() removed rank 40 from replica list due to DER_OOG(-1019)

05/15-09:57:59.11 hl-d106 DAOS[38462/38462] common WARN src/common/rsvc.c:102 rsvc_client_choose() replica list empty

05/15-09:57:59.11 hl-d106 DAOS[38462/38462] pool ERR  src/pool/cli.c:539 dc_pool_connect() 96670dea: cannot find pool service: DER_NOTREPLICA(-2020)

failed to connect to pool: -1005
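
To pull the relevant entries out of a client log like the one above, the ERR and WARN lines can be filtered by level. A minimal sketch, assuming the log layout shown in this thread (level in the fifth whitespace-separated field); the function name `daos_log_problems` is hypothetical:

```shell
# Print only ERR and WARN entries from a DAOS log file.
# Assumes the layout seen in this thread:
#   <date-time> <host> DAOS[pid/tid] <facility> <LEVEL> <file:line> ...
# NOTE: the function name is illustrative only.
daos_log_problems() {
    awk '$5 == "ERR" || $5 == "WARN"' "$1"
}

# Example:
#   daos_log_problems daos.log
```

Run against the log above, this would keep the two crt WARN lines from initialization and the crt_rpc_complete, rsvc_client_process_error, rsvc_client_choose, and dc_pool_connect entries, and drop the INFO lines and the final line without a timestamp.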

 

Thanks.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Wang, Di" <di.wang@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, May 14, 2020 at 3:47 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Message looks serious?

 
