Cannot start DAOS - failure in hwloc
Jordan Henderson
Hi all,
I recently updated my out-of-date DAOS installation to commit a7f093db2fc96aa1dc20cc8c293d44274474ef62 (I'm sure this wasn't the commit that changed this behavior, but good to have for reference), and it seems I can no longer start the DAOS I/O server due to a failure in hwloc_set_membind (https://github.com/daos-stack/daos/blob/master/src/iosrv/srv.c#L392).
I get the following lines in my log:
05/04-18:52:41.20 Talos DAOS[14193/14197] server ERR src/iosrv/srv.c:395 dss_srv_handler() failed to set memory affinity: 38
05/04-18:52:41.20 Talos DAOS[14193/14193] server ERR src/iosrv/init.c:521 server_init() DAOS cannot be initialized using the configured path (/mnt/daos_fs). Please ensure it is on a PMDK compatible file system and writeable by the current user.
which I think corresponds to a return value of ENOSYS from hwloc_set_membind, probably due to some form of missing support for the type of memory binding requested. Has the DAOS team seen this issue on other machines? Removing the call from the DAOS source allows me to start the server once again, but of course that's just a temporary workaround for now.
Thanks in advance!
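
For readers hitting the same error: hwloc_set_membind reports failure by returning -1 and setting errno, and ENOSYS is 38 on Linux, which matches the log line above. The following is only a minimal sketch of one way a caller could treat "not supported" as non-fatal while still failing on other errors. It is not taken from the DAOS source or from the fix discussed in this thread; try_set_memory_affinity, bind_fn_t and the three stand-in functions are hypothetical names, and the function pointer merely stands in for the real binding call so the example compiles without hwloc.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the real binding call (e.g. hwloc_set_membind):
 * returns 0 on success, -1 with errno set on failure. */
typedef int (*bind_fn_t)(void);

/* Returns 0 if startup may continue, or the errno value if it should abort. */
static int try_set_memory_affinity(bind_fn_t bind)
{
    if (bind() == 0)
        return 0;

    int err = errno; /* save errno before any other library call */

    if (err == ENOSYS) {
        /* The platform cannot do this kind of binding: warn and carry on. */
        fprintf(stderr, "memory binding not supported (%d: %s), continuing unbound\n",
                err, strerror(err));
        return 0;
    }

    /* Any other failure is still reported to the caller as an error. */
    fprintf(stderr, "failed to set memory affinity: %d (%s)\n", err, strerror(err));
    return err;
}

/* Hypothetical stand-ins used only to exercise the three outcomes. */
static int bind_ok(void)          { return 0; }
static int bind_unsupported(void) { errno = ENOSYS; return -1; }
static int bind_denied(void)      { errno = EPERM;  return -1; }
```

Calling try_set_memory_affinity(bind_unsupported) returns 0, so a server using this pattern would log a warning and keep starting, whereas bind_denied still yields a non-zero error code. Whether continuing without memory binding is acceptable is a design decision for the project.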
Lombardi, Johann
Thanks for the report, Jordan. I have just pushed a fix: https://github.com/daos-stack/daos/pull/2625 No, I’ve never seen this fail before. Could you please tell me which distribution you run?
Cheers, Johann
Jordan Henderson
Hi Johann,
This is on Slackware Linux (https://en.wikipedia.org/wiki/Slackware). It's still actively maintained, but it's a very old distribution, so oftentimes modern things are missing. In any case, thank you for the quick fix!