DAOS configuration


Steffen Christgau
 

Dear all,

I successfully installed DAOS v0.6 on a small test-bed. I was able to
create a pool that uses both SCM and emulated NVMe storage. Although
this is quite nice for the moment, the configuration files of the
server, agent and client contain options which are not that clear to
me. Here are some questions:

* access_points

The server configuration from utils/config states that "DAOS will need a
quorum of access point nodes to be available [in order to operate]
[...]". In other words: is the list of hosts in the "access_points"
setting simply the list of hosts running DAOS server instances that form
the DAOS system according to the storage model [1]? If that is the case
and I run daos_server on n nodes, do all n hostnames have to be listed
in the "access_points" variable?

In addition: Should the value of access_points be identical on all nodes
and also identical for the server, agent and client configuration?

* hostlist

In the client and agent configuration there is also a "hostlist"
configuration variable. What is the actual meaning of this parameter?
Does it have different semantics for the agent and for the client? From
my experiments I assume that daos_shell (a client) connects to the
hosts specified in the hostlist from daos.yml. But what, then, is the
meaning of access_points for the client in that case?

* targets in daos_server.yml

According to the storage model document "a target is the unit of fault"
and "the number of target[s] exported by a DAOS server instance is
configurable and depends on the underlying hardware (i.e. number of SCM
modules)". For a system with six NVDIMMs per socket, is the correct
value for the "targets" setting six, since each DIMM may fail?

* multiple NVDIMM namespaces/NUMA configuration

The test-bed nodes are dual-socket machines where each socket has
NVDIMMs attached to it. Two namespaces have been created for each socket
with ipmctl, formatted (ext4) and mounted (dax) under /mnt/daos/pmem{0,1}.

How do I configure daos_server for such a system? The comments in the
example server configuration say "single server instance per config
file for now", the scm_mount variable is a single string (a list does
not work), whereas scm_list is actually a list of namespaces/device
files but "currently only one per server [is] supported".

So for the NUMA case, do I have to create a second server configuration
for the second NUMA domain and launch it together with the existing
configuration (e.g. using orterun's appfile facility)?

* Configuration for dmg

This is a minor issue: Is there any configuration file for dmg? It
always writes its log file to /tmp/daos.log. The client configuration
file appears to be ignored and strace shows no indication of a config
file being read. In addition, I had to set the OFI_PORT and
OFI_INTERFACE environment variables accordingly to get it working. It's
not a real problem, but it would be convenient to have these settings
persisted.

Some clarifications on these points would be quite helpful.

Regards, Steffen

[1] https://github.com/daos-stack/daos/blob/v0.6/doc/storage_model.md


Nabarro, Tom
 

Answers inline below

Regards,
Tom Nabarro – DCG/ESAD

-----Original Message-----
From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Steffen Christgau
Sent: Thursday, August 15, 2019 4:12 PM
To: daos@daos.groups.io
Subject: [daos] DAOS configuration

> Dear all,
>
> I successfully installed DAOS v0.6 on a small test-bed. I was able to
> create a pool that uses both SCM and emulated NVMe storage. Although
> this is quite nice for the moment, the configuration files of the
> server, agent and client contain options which are not that clear to
> me. Here are some questions:
>
> * access_points
>
> The server configuration from utils/config states that "DAOS will need
> a quorum of access point nodes to be available [in order to operate]
> [...]". In other words: is the list of hosts in the "access_points"
> setting simply the list of hosts running DAOS server instances that
> form the DAOS system according to the storage model [1]? If that is
> the case and I run daos_server on n nodes, do all n hostnames have to
> be listed in the "access_points" variable?

If you use orterun to launch daos_server (currently the only supported
launcher), access_points is unused; it is reserved for future use.
Please remove this parameter from your config files.
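
Since access_points is unused in this setup, the key can simply be
dropped from each file. A minimal Python sketch of that clean-up,
assuming the YAML has already been loaded into a plain dict (e.g. with
a YAML library of your choice); the helper name drop_access_points and
the sample values are hypothetical, not part of DAOS:

```python
from typing import Any, Dict


def drop_access_points(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a parsed DAOS config dict without 'access_points'.

    Hypothetical helper: with orterun as the launcher the setting is
    unused, so removing it does not change behaviour.
    """
    # Shallow copy so the caller's dict is left untouched.
    cleaned = dict(config)
    # pop() with a default does nothing if the key is already absent.
    cleaned.pop("access_points", None)
    return cleaned


if __name__ == "__main__":
    # Illustrative values only; real files contain many more settings.
    server_cfg = {"name": "daos_server", "port": 10001,
                  "access_points": ["node1:10001"]}
    print(drop_access_points(server_cfg))
```

Working on a copy keeps the loaded data intact, so it can be compared
before and after the clean-up.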

> In addition: Should the value of access_points be identical on all
> nodes and also identical for the server, agent and client
> configuration?
>
> * hostlist
>
> In the client and agent configuration there is also a "hostlist"
> configuration variable. What is the actual meaning of this parameter?
> Does it have different semantics for the agent and for the client?
> From my experiments I assume that daos_shell (a client) connects to
> the hosts specified in the hostlist from daos.yml. But what, then, is
> the meaning of access_points for the client in that case?

hostlist determines the storage server hosts that the client acts upon
(as you said). We are in the process of cleaning up the definitions of,
and distinctions between, hostlist and access_points. For the moment,
please ignore access_points.
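
To illustrate that the client works from hostlist alone, here is a
small Python sketch that turns hostlist entries into (host, port)
pairs. It assumes entries are written as "host" or "host:port"; the
helper parse_hostlist is hypothetical, and the fallback port 10001 is
an assumption you should replace with the port your servers use:

```python
from typing import List, Tuple

# Assumed fallback port; replace with the port configured on your servers.
DEFAULT_PORT = 10001


def parse_hostlist(hostlist: List[str],
                   default_port: int = DEFAULT_PORT) -> List[Tuple[str, int]]:
    """Split 'host' or 'host:port' entries into (host, port) tuples.

    Hypothetical helper: the client acts on exactly these servers;
    access_points is not consulted at all.
    """
    servers = []
    for raw in hostlist:
        entry = raw.strip()
        host, sep, port = entry.rpartition(":")
        if sep:  # explicit port given after the last ':'
            servers.append((host, int(port)))
        else:    # bare host name, so fall back to the default port
            servers.append((entry, default_port))
    return servers


if __name__ == "__main__":
    print(parse_hostlist(["node1:10001", "node2"]))
```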

> * targets in daos_server.yml
>
> According to the storage model document "a target is the unit of
> fault" and "the number of target[s] exported by a DAOS server instance
> is configurable and depends on the underlying hardware (i.e. number of
> SCM modules)". For a system with six NVDIMMs per socket, is the
> correct value for the "targets" setting six, since each DIMM may fail?

targets in the server configuration file refers to VOS instances (which
can be thought of as service threads), not to SCM modules. See:
https://github.com/daos-stack/daos/blob/master/utils/config/daos_server.yml#L194

> * multiple NVDIMM namespaces/NUMA configuration
>
> The test-bed nodes are dual-socket machines where each socket has
> NVDIMMs attached to it. Two namespaces have been created for each
> socket with ipmctl, formatted (ext4) and mounted (dax) under
> /mnt/daos/pmem{0,1}.
>
> How do I configure daos_server for such a system? The comments in the
> example server configuration say "single server instance per config
> file for now", the scm_mount variable is a single string (a list does
> not work), whereas scm_list is actually a list of namespaces/device
> files but "currently only one per server [is] supported".
>
> So for the NUMA case, do I have to create a second server
> configuration for the second NUMA domain and launch it together with
> the existing configuration (e.g. using orterun's appfile facility)?

Support for multiple NUMA domains is in development; for now, please
choose just one pmem device/SCM mount. Thanks for your patience.
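
Until multi-NUMA support lands, each server entry should describe a
single SCM mount. A minimal Python sketch that checks a parsed server
entry against the limits described in this thread (one scm_mount
string, at most one scm_list device, and targets as a positive count
of VOS instances). The helper check_server_entry and the sample values
are hypothetical; /dev/pmem0 in particular is an assumed device name
for the first socket's namespace:

```python
from typing import Any, Dict, List


def check_server_entry(entry: Dict[str, Any]) -> List[str]:
    """Return the problems found in one parsed server entry (empty if none).

    Hypothetical helper encoding the v0.6 limits discussed in this thread;
    it is not part of DAOS itself.
    """
    problems = []
    # scm_mount must be a single path string; a list is not accepted.
    if not isinstance(entry.get("scm_mount"), str):
        problems.append("scm_mount must be a single path string")
    # Only one SCM device per server is currently supported.
    scm_list = entry.get("scm_list", [])
    if len(scm_list) > 1:
        problems.append("scm_list may contain at most one device")
    # targets counts VOS instances (service threads), not NVDIMMs.
    targets = entry.get("targets")
    if isinstance(targets, bool) or not isinstance(targets, int) or targets < 1:
        problems.append("targets must be a positive integer")
    return problems


if __name__ == "__main__":
    # Use only the first socket's namespace for now; pmem1 stays unused.
    socket0 = {
        "targets": 8,                 # illustrative value
        "scm_mount": "/mnt/daos/pmem0",
        "scm_list": ["/dev/pmem0"],   # assumed device name
    }
    print(check_server_entry(socket0) or "entry looks OK")
```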

> * Configuration for dmg
>
> This is a minor issue: Is there any configuration file for dmg? It
> always writes its log file to /tmp/daos.log. The client configuration
> file appears to be ignored and strace shows no indication of a config
> file being read. In addition, I had to set the OFI_PORT and
> OFI_INTERFACE environment variables accordingly to get it working.
> It's not a real problem, but it would be convenient to have these
> settings persisted.

The environment variable that sets the log file location is D_LOG_FILE.
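
As dmg appears not to read a configuration file, one way to persist
these settings is to build the environment once and reuse it whenever
dmg is launched. A minimal Python sketch, assuming dmg honours
D_LOG_FILE as stated above and OFI_INTERFACE/OFI_PORT as described in
the question; the helper dmg_environment and the example values
(interface name, port, log path) are hypothetical:

```python
import os
from typing import Dict, Mapping, Optional


def dmg_environment(log_file: str, interface: str, port: int,
                    base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with the dmg-related variables set.

    Hypothetical helper; pass the result as env= to subprocess.run().
    """
    env = dict(os.environ if base is None else base)
    env["D_LOG_FILE"] = log_file      # log path instead of /tmp/daos.log
    env["OFI_INTERFACE"] = interface  # network interface (example value)
    env["OFI_PORT"] = str(port)       # environment values must be strings
    return env


if __name__ == "__main__":
    env = dmg_environment("/var/tmp/dmg.log", "ib0", 31416)
    # Then launch dmg with your usual arguments, e.g. via subprocess.run(..., env=env).
    print({k: env[k] for k in ("D_LOG_FILE", "OFI_INTERFACE", "OFI_PORT")})
```

Keeping the values in one place (or in an equivalent shell snippet that
exports them) avoids setting them by hand for every dmg invocation.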

> Some clarifications on these points would be quite helpful.
>
> Regards, Steffen
>
> [1] https://github.com/daos-stack/daos/blob/v0.6/doc/storage_model.md


