Re: Timeouts/DAOS rendered useless when running IOR with SX/default object class
Rosenzweig, Joel B
Sure thing. Unless you say otherwise, I’m planning to submit it against the 1.2 and 2.0 branches.
From: Lombardi, Johann <johann.lombardi@...>
Sent: Tuesday, March 30, 2021 3:19 PM
To: email@example.com; Rosenzweig, Joel B <joel.b.rosenzweig@...>
Subject: Re: [daos] Timeouts/DAOS rendered useless when running IOR with SX/default object class
Good catch! It sounds like we need to add a “LimitNOFILE” entry to our daos_server’s systemd unit file.
@Rosenzweig, Joel B could you please take care of this? Thanks in advance.
From: <firstname.lastname@example.org> on behalf of Steffen Christgau <christgau@...>
A final "Hi" on that topic,
we have discovered the reason for the issue: The open-file ulimit on the
_server_ side was too low, and it differs between regular users and daemons
like the DAOS server. For the latter it was set to soft 1024/hard 4096. We
raised both the soft and the hard limit to 50000 by modifying the service/unit file.
With that we did multiple IOR runs with up to 48 processes and SX object
class from a single client node without any errors.
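
For anyone who wants to check what a running daemon actually got, here is a minimal Python sketch. It assumes a Linux host where /proc/<pid>/limits is readable. The helper name parse_open_files_limit is hypothetical and not part of DAOS. It only parses the "Max open files" row, so it can be pointed at the daos_server PID or, as in the usage part, at the current process.

```python
def _to_int(value):
    """Convert a limit field to int; 'unlimited' becomes None."""
    return None if value == "unlimited" else int(value)


def parse_open_files_limit(limits_text):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Row layout: Max open files <soft> <hard> <units>
            fields = line.split()
            return _to_int(fields[3]), _to_int(fields[4])
    raise ValueError("no 'Max open files' row found")


if __name__ == "__main__":
    # Usage: inspect the current process; replace 'self' with the
    # daemon's PID to see the limits the service was started with.
    with open("/proc/self/limits") as f:
        soft, hard = parse_open_files_limit(f.read())
    print(f"open files: soft={soft} hard={hard}")
```

Reading the limits from /proc rather than running ulimit in a shell matters here, because the shell reports the user's limits and not those of a service started by systemd.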
We noted that the coredump and memlock limits are already "increased" in
the server's unit file. Maybe it is a good idea to increase the file
limit as well by default, although the limit may depend on the provider