Re: Startup Errors


Jacque, Kristin
 

Hi Neale,

 

I suspect the transport configurations are incompatible. Every component (dmg, server, and agent) must agree on whether certificates are enabled or disabled. If you prefer to run without certs, as you do with the dmg “-i” option, your server and agent must also set “allow_insecure: true” in their yml files.

 

In your server config file I am seeing certs enabled:

 

transport_config:
#  # In order to disable transport security, uncomment and set allow_insecure
#  # to true. Not recommended for production configurations.
  allow_insecure: false

  # Location where daos_server will look for Client certificates
  client_cert_dir: /etc/daos/daosCA/clients
  # Custom CA Root certificate for generated certs
  ca_cert: /etc/daos/daosCA/certs/daosCA.crt
  # Server certificate for use in TLS handshakes
  cert: /etc/daos/daosCA/certs/server.crt
  # Key portion of Server Certificate
  key: /etc/daos/daosCA/certs/server.key
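
For comparison, here is a minimal sketch of the same block with certificates disabled, assuming you want the server to match the dmg “-i” option. The agent yml would need the same allow_insecure setting; I am assuming its transport_config block uses the same key, so please check it against the example file shipped with your RPMs. The certificate paths are left untouched, since I would expect them to go unused in insecure mode.

```yaml
transport_config:
  # Assumption: transport security is disabled on every component
  # to match "dmg -i". Not recommended for production configurations.
  allow_insecure: true

  # Existing certificate settings, unchanged from the attached file
  client_cert_dir: /etc/daos/daosCA/clients
  ca_cert: /etc/daos/daosCA/certs/daosCA.crt
  cert: /etc/daos/daosCA/certs/server.crt
  key: /etc/daos/daosCA/certs/server.key
```

The setting is read at startup, so daos_server (and the agent) would need a restart on each host after the change.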

 

If that doesn’t resolve the connection failure, Tom’s suggestions will help you get to a good starting point to debug further.

 

Please let us know how it goes.

 

Thanks,

Kris

 

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Petrillo, Neale A. (Contractor) via groups.io
Sent: Wednesday, February 24, 2021 2:00 PM
To: daos@daos.groups.io
Subject: [daos] Startup Errors

 

Hello Group! 

 

I'm having some trouble getting my new DAOS cluster working. I've installed 6 servers all with the 1.0.1 RPMs. When I do a 'dmg storage format' from my test host, I get the following output:

 

[root@head ~]# dmg -i -l <host01>:10001 storage format
ERROR: <host01>:10001: socket connection is not active (TRANSIENT_FAILURE)
ERROR: dmg: no active connections
[root@head ~]# dmg -i -l <host01> system query
ERROR: <host01>:10001: socket connection is not active (TRANSIENT_FAILURE)
ERROR: dmg: no active connections

 

I'm also seeing these errors in the log files:

 

INFO 2021/02/18 10:40:15 DAOS I/O Server instance 0 storage not ready: context canceled
INFO 2021/02/18 10:40:19 SCM format required on instance 1
INFO 2021/02/18 10:40:19 DAOS I/O Server instance 1 storage not ready: context canceled
INFO 2021/02/18 10:40:19 DAOS Control Server (pid 9993) shutting down
ERROR 2021/02/18 10:40:54 /usr/bin/daos_admin EAL: No free hugepages reported in hugepages-1048576kB
INFO 2021/02/18 10:41:00 DAOS Control Server (pid 11507) listening on 0.0.0.0:10001
INFO 2021/02/18 10:41:00 Waiting for DAOS I/O Server instance storage to be ready...
INFO 2021/02/18 10:41:04 SCM format required on instance 0

 

Configuration files are attached. Any help would be appreciated! 

Neale

 
