Re: Timeouts/DAOS rendered useless when running IOR with SX/default object class


Rosenzweig, Joel B <joel.b.rosenzweig@...>
 

Sure thing. Unless you say otherwise, I'm planning to submit the fix against the 1.2 and 2.0 branches.

 

https://github.com/daos-stack/daos/pull/5246

 

 

From: Lombardi, Johann <johann.lombardi@...>
Sent: Tuesday, March 30, 2021 3:19 PM
To: daos@daos.groups.io; Rosenzweig, Joel B <joel.b.rosenzweig@...>
Subject: Re: [daos] Timeouts/DAOS rendered useless when running IOR with SX/default object class

 

Hi Steffen,

 

Good catch! It sounds like we need to add a “LimitNOFILE” entry to our daos_server’s systemd unit file.

@Rosenzweig, Joel B could you please take care of this? Thanks in advance.

 

Cheers,

Johann
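
Once a LimitNOFILE entry is in the unit file, it may be worth confirming that the running server actually picked it up. The minimal Python sketch below is not part of DAOS. It reads the "Max open files" row of Linux's /proc/<pid>/limits. The function names are hypothetical. Obtaining the server's PID is left outside the sketch, for example via `systemctl show --property=MainPID`, assuming the unit is called daos_server.

```python
def parse_max_open_files(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits.

    Hypothetical helper: 'unlimited' is mapped to None, numbers to int.
    """
    prefix = "Max open files"
    for line in limits_text.splitlines():
        if line.startswith(prefix):
            # Remaining columns are: soft limit, hard limit, units.
            soft, hard = line[len(prefix):].split()[:2]
            return tuple(None if v == "unlimited" else int(v) for v in (soft, hard))
    raise ValueError("no 'Max open files' row found")


def max_open_files_for_pid(pid):
    """Read the open-file limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_max_open_files(f.read())
```

With the defaults Steffen reported, this would return (1024, 4096) for the server process. After the unit-file change it should show the new values.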

 

From: <daos@daos.groups.io> on behalf of Steffen Christgau <christgau@...>
Reply-To: <daos@daos.groups.io>
Date: Tuesday 30 March 2021 at 17:04
To: <daos@daos.groups.io>
Subject: Re: [daos] Timeouts/DAOS rendered useless when running IOR with SX/default object class

 

A final "Hi" on that topic,

 

we have discovered the reason for the issue: the open-file ulimit on the _server_ side was too low, and it differs between regular users and daemons like the DAOS server. For the latter it was set to soft 1024/hard 4096. We increased both the soft and the hard limit to 50000 by modifying the service/unit file. With that change we did multiple IOR runs with up to 48 processes and the SX object class from a single client node without any errors.
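
The split between the soft and hard limits matters here. An unprivileged process may raise its own soft open-file limit up to its hard limit, but not past it. So with hard 4096, the server could never get beyond 4096 descriptors without outside help. The helper below illustrates these semantics using Python's standard resource module. It is a hypothetical sketch, not how DAOS handles the limit. The fix discussed in this thread is a LimitNOFILE entry in the unit file.

```python
import resource


def raise_nofile_soft_limit(target=None):
    """Raise this process's soft RLIMIT_NOFILE toward its hard limit.

    `target` is a hypothetical desired value; None means "as high as the
    hard limit allows". Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft, hard  # already unlimited; nothing to raise
    if hard == resource.RLIM_INFINITY:
        if target is None:
            # Linux rejects an unlimited soft NOFILE value, so require a target.
            return soft, hard
        new_soft = target
    else:
        # Unprivileged processes cannot exceed the hard limit.
        new_soft = hard if target is None else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)
```

Under the limits Steffen found (soft 1024/hard 4096), calling this without a target would leave the process at 4096. Getting to 50000 requires raising the hard limit, which is what the unit-file change does.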

 

We noted that the coredump and memlock limits are already "increased" in the server's unit file. Maybe it is a good idea to increase the open-file limit by default as well, although the appropriate value may depend on the provider in use.

 

Regards, Steffen