Re: issues with NVMe drives from RPM installation

Nabarro, Tom

Hello Richard


"ERROR: DAOS I/O Server exited with error: /usr/bin/daos_io_server (instance 0) exited: exit status 1"
indicates that there may be useful information in the io_server log for the first instance. The default location, as set by log_file in the server config file, is /tmp/server0.log. If there is nothing useful in there, try increasing the log_mask to DEBUG.
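
As a rough sketch of that last step, assuming log_mask is set per I/O server instance alongside log_file (the exact placement is an assumption here, based on the layout of the example config files), the change would look something like the fragment below. Note that the config you posted sets log_file to /tmp/daos_server.log, so the I/O server log should be written there rather than to /tmp/server0.log.

```yaml
  # Fragment of the per-instance section only; all other keys stay unchanged.
  log_file: /tmp/daos_server.log   # path taken from the posted config
  log_mask: DEBUG                  # assumed placement; raises I/O server log verbosity
```

After restarting daos_server with this change, the lines just before the exit in that log file are the most likely place to find the reason for the failure.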


Tom Nabarro – HPC

From: <> On Behalf Of richard.dahringer@...
Sent: Thursday, July 30, 2020 3:27 PM
Subject: [daos] issues with NVMe drives from RPM installation


Hi all -
I'm trying to set up a proof-of-concept DAOS cluster, and it is proving to be tricky. Each system has 4 SCM 128G DIMMs and 4 U.2 NVMe drives installed. I have installed all the RPMs from, and have been able to set up the SCM devices; the 'dmg -i' commands all seem to work. When I add NVMe drives to the configuration, though, daos_server does not start. It does start when the NVMe drives are not in the configuration.

My daos_server.conf file:

name: daos_server
access_points: ['elfs13o01']
# port: 10001
provider: ofi+psm2
nr_hugepages: 4096
control_log_file: /tmp/daos_control.log
transport_config:
   allow_insecure: true

servers:
-
  targets: 1
  first_core: 0
  nr_xs_helpers: 0
  fabric_iface: hib0
  fabric_iface_port: 31416
  log_file: /tmp/daos_server.log

  env_vars:
  - DAOS_MD_CAP=1024

  # Storage definitions

  # When scm_class is set to ram, tmpfs will be used to emulate SCM.
  # The size of ram is specified by scm_size in GB units.
  scm_mount: /mnt/daos0  # map to -s /mnt/daos
  scm_class: dcpm
  scm_list: [/dev/pmem0]

  bdev_class: nvme
  bdev_list: ["0000:5e:00.0"]

The startup error:

[root@elfs13o01 ~]# daos_server -o daos_local.yml start
daos_server logging to file /tmp/daos_control.log
ERROR: /usr/bin/daos_admin EAL: No free hugepages reported in hugepages-1048576kB
DAOS Control Server (pid 73257) listening on
Waiting for DAOS I/O Server instance storage to be ready...
SCM @ /mnt/daos0: 262 GB Total/247 GB Avail
Starting I/O server instance 0: /usr/bin/daos_io_server
daos_io_server:0 Using legacy core allocation algorithm
daos_io_server:0 Starting SPDK v19.04.1 / DPDK 19.02.0 initialization...
[ DPDK EAL parameters: daos -c 0x1 --pci-whitelist=0000:5e:00.0 --log-level=lib.eal:6 --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk73258 --proc-type=auto ]
ERROR: daos_io_server:0 EAL: No free hugepages reported in hugepages-1048576kB
ERROR: /var/run/daos_server/daos_server.sock: failed to accept connection: accept unixpacket /var/run/daos_server/daos_server.sock: use of closed network connection
ERROR: DAOS I/O Server exited with error: /usr/bin/daos_io_server (instance 0) exited: exit status 1

Can someone provide some pointers to what is going on? 