'access_points' must contain resolvable addresses complaint


Hannappel, Juergen
 

Hello,
I am trying to set up DAOS on a small 4-node cluster, following the recipe at https://docs.daos.io/v2.0/QSG/setup_centos/
When I try to start the server it complains:
Feb 16 09:16:11 asapo-srv09.desy.de daos_server[251023]: ERROR: retrieve replicas from config: serverconfig: code = 705 description = "invalid list of access points in configuration"
Feb 16 09:16:11 asapo-srv09.desy.de daos_server[251023]: ERROR: serverconfig: code = 705 resolution = "'access_points' must contain resolvable addresses; fix the configuration and restart the cont>

Config is:
[root@asapo-srv09 tmp]# grep -v ^# /etc/daos/daos_server.yml
access_points: ['asapo-srv09', 'asapo-srv10', 'asapo-srv11', 'asapo-srv12']
provider: ofi+verbs;ofi_rxm
bdev_include: ["01:00.0","02:00.0"]

The node names are resolvable:
[root@asapo-srv09 tmp]# host asapo-srv09
asapo-srv09.desy.de has address 131.169.183.155
[root@asapo-srv09 tmp]# host asapo-srv10
asapo-srv10.desy.de has address 131.169.183.157
... and so on.
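
Note that `host` queries DNS directly, whereas most programs resolve names through the system resolver (`getaddrinfo`, which also consults /etc/hosts and nsswitch.conf). A minimal Python sketch for checking each `access_points` entry that way is below. The function names are hypothetical, and the default port 10001 is an assumption about the DAOS control-plane default; how `daos_server` itself resolves the entries is not shown in this thread.

```python
import socket

# Assumed default control-plane port; adjust if your config sets "port:".
DEFAULT_PORT = 10001


def split_host_port(entry, default_port=DEFAULT_PORT):
    """Split 'host' or 'host:port' into (host, port). IPv6 literals are not handled."""
    host, sep, port = entry.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return entry, default_port


def check_access_points(entries):
    """Resolve each entry via the system resolver and report the result per entry."""
    results = {}
    for entry in entries:
        host, port = split_host_port(entry)
        try:
            infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
            # Keep the distinct addresses the resolver returned.
            results[entry] = sorted({info[4][0] for info in infos})
        except socket.gaierror as exc:
            results[entry] = f"unresolvable: {exc}"
    return results


if __name__ == "__main__":
    for name, outcome in check_access_points(
        ["asapo-srv09", "asapo-srv10", "asapo-srv11", "asapo-srv12"]
    ).items():
        print(name, "->", outcome)
```

If every entry resolves here as well, name resolution is probably not the real cause and the rest of the config deserves a closer look.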

The result was the same when I added the domain name to the host names in the config file, and also when I used the IP addresses instead.
Probably a stupid error on my side, any hints?

Thanks in advance


Hennecke, Michael
 

Hi,

 

Does it work with a single host in the access_points list? 

You should add a line with "name: daos_server" above the access_points.

Anything useful in the daos server logfile?

 

Best,

Michael

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Hannappel, Juergen
Sent: Wednesday, 16 February 2022 10:18
To: daos@daos.groups.io
Subject: [daos] 'access_points' must contain resolvable addresses complaint

 


Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva  
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928


Nabarro, Tom
 

Hello,

 

I would check if there are any clues from comparing with an example config file (under utils/config/examples dir) and then verify that it works with the IP addresses (maybe just try one to start with and use double quotes like in the examples) just to narrow down the problem.

 

Regards,

Tom

 



Hannappel, Juergen
 

Hi,
there is no log file yet, all entries are in the journal.
They look like this:
Feb 16 16:51:32 asapo-srv09.desy.de systemd[1]: Started DAOS Server.
Feb 16 16:51:32 asapo-srv09.desy.de daos_server[267723]: DAOS Server config loaded from /etc/daos/daos_server.yml
Feb 16 16:51:32 asapo-srv09.desy.de daos_server[267723]: no control log file specified; logging to stdout
Feb 16 16:51:32 asapo-srv09.desy.de daos_server[267723]: No DAOS I/O Engines in configuration, DAOS Control Server starting in discovery mode
Feb 16 16:51:32 asapo-srv09.desy.de daos_server[267723]: ERROR: retrieve replicas from config: serverconfig: code = 705 description = "invalid list of access points in configuration"
Feb 16 16:51:32 asapo-srv09.desy.de daos_server[267723]: ERROR: serverconfig: code = 705 resolution = "'access_points' must contain resolvable addresses; fix the configuration and restart the cont>
Feb 16 16:51:32 asapo-srv09.desy.de systemd[1]: daos_server.service: Main process exited, code=exited, status=1/FAILURE
Feb 16 16:51:32 asapo-srv09.desy.de systemd[1]: daos_server.service: Failed with result 'exit-code'.

I added the "name: daos_server" line, reduced the access_points list to just one node, and tried with the IP address; the result is always the same.
[root@asapo-srv09 ~]# grep -v ^# /etc/daos/daos_server.yml
name: daos_server
access_points: ['131.169.183.155']
provider: ofi+verbs;ofi_rxm
bdev_include: ["01:00.0","02:00.0"]


I use
Name         : daos-server
Version      : 2.0.1
Release      : 2.el8
Architecture : x86_64
Size         : 41 M
Source       : daos-2.0.1-2.el8.src.rpm

on
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID:    AlmaLinux
Description:    AlmaLinux release 8.5 (Arctic Sphynx)
Release:    8.5
Codename:    ArcticSphynx


Nabarro, Tom
 

Hello,

 

Could you please try one of the examples in utils/config/examples and look at the reference documentation in utils/config/daos_server.yml? The config file you are using below is not valid. Specifically, bdev_include is not a global parameter.
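
To compare a config against the reference file, it can help to list only the keys that are active at the top level, since `grep -v ^#` output does not make nesting obvious. The following is a rough Python sketch, not a YAML parser: it assumes top-level keys start in column 0 and that commented-out lines begin with `#`. The function name is hypothetical.

```python
import re

# A top-level key starts in column 0 and is followed by a colon.
TOP_LEVEL_KEY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*:")


def active_top_level_keys(text):
    """Return uncommented top-level keys, in file order."""
    keys = []
    for line in text.splitlines():
        match = TOP_LEVEL_KEY.match(line)
        if match:
            keys.append(match.group(1))
    return keys


if __name__ == "__main__":
    with open("/etc/daos/daos_server.yml") as handle:
        for key in active_top_level_keys(handle.read()):
            print(key)
```

Running the same function over the reference file and over /etc/daos/daos_server.yml shows at a glance which global parameters differ.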

 

Regards,

Tom

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Hannappel, Juergen
Sent: Wednesday, February 16, 2022 3:56 PM
To: daos@daos.groups.io
Subject: Re: [daos] 'access_points' must contain resolvable addresses complaint

 



Hannappel, Juergen
 

The config file was created by modifying the file that was installed from the RPM; what I showed in the mail were just the lines that are not commented out.
I commented out the bdev_include, but that did not help.

Here is the complete diff against the version in utils/config (why is that one different from the one installed initially in /etc/daos?):
[root@asapo-srv09 ~]# diff /usr/local/src/daos/utils/config/daos_server.yml /etc/daos/daos_server.yml
15c15
< #name: daos_server
---
> name: daos_server
27c27
< #access_points: ['hostname1']
---
> access_points: ['131.169.183.155']
75,76c75,76
< ##  ofi+verbs for Infiniband/RoCE and
< ##  ofi+tcp for non-RDMA-capable Ethernet.
---
> ##  ofi+verbs;ofi_rxm for Infiniband/RoCE and
> ##  ofi+sockets for non-RDMA-capable Ethernet.
78c78
< #provider: ofi+verbs
---
> provider: ofi+verbs;ofi_rxm
94,99d93
< ## CART: Disable SRX
< ## parameters shared with client.
< #
< #disable_srx: false
< #
< #
107c101
< #bdev_include: ["0000:81:00.1","0000:81:00.2","0000:81:00.3"]
---
> #bdev_include: ["01:00.0","02:00.0"]
176c170
< ## Number of hugepages to allocate for DMA buffer memory
---
> ## Number of hugepages to allocate for use by NVMe SSDs
179,189c173,175
< ## through SPDK. Note that each target requires 1 GB of hugepage space, so
< ## this value needs to represent the total amount of hugespace required for
< ## all targets across all engines on host, divided by the system hugepage size.
< ## If not set here, an appropriate value will be automatically calculated and
< ## default system hugepage size will be used.
< #
< ## Example: (2 engines * (8 targets/engine * 1GB)) / 2MB hugepage size = 16834
< #
< ## Hugepages are mandatory with NVME SSDs configured and optional without.
< ## To disabled the use of hugepages when no NVMe SSDs are configured, set
< ## nr_hugepages to -1.
---
> ## through SPDK. This indicates the number to be used for each spawned
> ## I/O Engines, so the total will be this number * number of I/O Engines.
> ## Default system hugepage size will be used.
191c177,178
< ##nr_hugepages: -1
---
> ## default: 4096
> #nr_hugepages: 4096
353,354c340
< #    bdev_list: ["0000:81:00.0", "0000:82:00.0"]  # generate regular nvme.conf
< #
---
> #    bdev_list: ["0000:81:00.0"]  # generate regular nvme.conf
359,365d344
< #
< #    # Optional override, will be automatically generated based on NUMA affinity.
< #    # Filter hot-pluggable devices by PCI bus-ID by specifying a hexadecimal
< #    # range. Hotplug events relating to devices with PCI bus-IDs outside this range
< #    # will not be processed by this engine. Empty or unset range signifies allow all.
< #    bdev_busid_range: 0x80-0x8f
< #    #bdev_busid_range: 128-143
501,512c480,481
< #    #class: nvme
< #    #bdev_list: ["0000:5d:05.5"]
< #
< #    #class: nvme
< #    #bdev_list: ["0000:da:00.0", "0000:db:00.0"]  # generate regular nvme.conf
< #
< #    # Optional override, will be automatically generated based on NUMA affinity.
< #    # Filter hot-pluggable devices by PCI bus-ID by specifying a hexadecimal
< #    # range. Hotplug events relating to devices with PCI bus-IDs outside this range
< #    # will not be processed by this engine. Empty or unset range signifies allow all.
< #    #bdev_busid_range: 0xd0-0xdf
< #    #bdev_busid_range: 208-223
---
> #    class: nvme
> #    bdev_list: ["0000:82:00.0","0000:5d:05.5"]