bugs in crt_context.c


Oganezov, Alexander A
 

Hi Hector,

 

Timeout messages on the client side are a symptom of an underlying issue, not the issue itself. They mean that the client attempted to send an RPC to the server and received no response within the specified timeout (60 seconds by default for each RPC, though this can be overridden in the settings).
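
To illustrate why a timeout is only a symptom, here is a minimal sketch in Python. It is not CaRT or DAOS code: `send_rpc`, `DEFAULT_RPC_TIMEOUT_S`, and the handlers are hypothetical names made up for this example, and a worker thread stands in for the server. The client only learns that no reply arrived in time; it cannot tell from the timeout alone why the server was slow.

```python
import concurrent.futures
import time

# Illustrative constant mirroring the 60-second default mentioned above.
DEFAULT_RPC_TIMEOUT_S = 60


def send_rpc(handler, timeout_s=DEFAULT_RPC_TIMEOUT_S):
    """Hypothetical client call: run `handler` (standing in for the
    server) and wait at most `timeout_s` seconds for its reply."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(handler)
    try:
        return ("ok", future.result(timeout=timeout_s))
    except concurrent.futures.TimeoutError:
        # The client sees only the missing reply, not its cause.
        return ("timed out", None)
    finally:
        # Do not block the caller on a handler that is still running.
        pool.shutdown(wait=False)


def healthy_server():
    return "pong"


def stuck_server():
    # Stand-in for any server-side problem (overload, hang, lost message).
    time.sleep(0.3)
    return "late reply"


if __name__ == "__main__":
    print(send_rpc(healthy_server, timeout_s=1.0))
    print(send_rpc(stuck_server, timeout_s=0.05))
```

In this sketch both an overloaded and a hung server produce the same client-side result, which is why the server-side logs and setup details matter for finding the actual cause.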

 

Can you provide more information about what you were running and what your setup is when you encounter this message?

 

In particular:

  • Which provider do you use?
  • What is your setup: number of server nodes, servers per node, client nodes, and clients per node?
  • What application are you running on the client?
  • Server-side logs from the time of the failure would also help.
  • The server yaml file that you use would also be helpful.

 

Thanks,

~~Alex.

 

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Wu Huijun
Sent: Saturday, June 6, 2020 1:49 AM
To: daos@daos.groups.io
Subject: [daos] bugs in crt_context.c

 

Hi all, 

 

We are currently playing with the code on GitHub master (cloned on June 4). 

 

We frequently see an error raised in crt_context.c. It says that ctx_id 0 (status:0x38) timed out, tgt rank 3, tag1, and is followed by another message saying aborting to group daos_server...

 

After these errors appear, the clients seem to get stuck and no more data is written. Any ideas about this?

 

Cheers,

Hector

