issues with NVMe drives from RPM installation

Dahringer, Richard
 

Hi all -
I'm trying to set up a proof of concept daos cluster, and it is proving to be tricky. The systems have 4 SCM 128G DIMMs, and 4 U.2 NVMe drives installed. I have installed all the RPMs from registrationcenter.intel.com, and have been able to set up the SCM devices, 'dmg -i' commands all seem to work.  When I add nvme drives to the configuration though, daos_server does not start - it does start when the nvme drives are not there. 

My daos_server.conf file:

name: daos_server
access_points: ['elfs13o01']
# port: 10001
provider: ofi+psm2
nr_hugepages: 4096
control_log_file: /tmp/daos_control.log
transport_config:
   allow_insecure: true

servers:
-
  targets: 1
  first_core: 0
  nr_xs_helpers: 0
  fabric_iface: hib0
  fabric_iface_port: 31416
  log_file: /tmp/daos_server.log

 

  env_vars:
  - DAOS_MD_CAP=1024
  - CRT_CTX_SHARE_ADDR=0
  - CRT_TIMEOUT=30
  - FI_SOCKETS_MAX_CONN_RETRY=1
  - FI_SOCKETS_CONN_TIMEOUT=2000

 

  # Storage definitions

 

  # When scm_class is set to ram, tmpfs will be used to emulate SCM.

  # The size of ram is specified by scm_size in GB units.

  scm_mount: /mnt/daos0  # map to -s /mnt/daos
  scm_class: dcpm
  scm_list: [/dev/pmem0]

  bdev_class: nvme
  bdev_list: ["0000:5e:00.0"]

The startup error:

[root@elfs13o01 ~]# daos_server -o daos_local.yml start
daos_server logging to file /tmp/daos_control.log
ERROR: /usr/bin/daos_admin EAL: No free hugepages reported in hugepages-1048576kB
DAOS Control Server (pid 73257) listening on 0.0.0.0:10001
Waiting for DAOS I/O Server instance storage to be ready...
SCM @ /mnt/daos0: 262 GB Total/247 GB Avail
Starting I/O server instance 0: /usr/bin/daos_io_server
daos_io_server:0 Using legacy core allocation algorithm
daos_io_server:0 Starting SPDK v19.04.1 / DPDK 19.02.0 initialization...
[ DPDK EAL parameters: daos -c 0x1 --pci-whitelist=0000:5e:00.0 --log-level=lib.eal:6 --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk73258 --proc-type=auto ]
ERROR: daos_io_server:0 EAL: No free hugepages reported in hugepages-1048576kB
ERROR: /var/run/daos_server/daos_server.sock: failed to accept connection: accept unixpacket /var/run/daos_server/daos_server.sock: use of closed network connection
ERROR: DAOS I/O Server exited with error: /usr/bin/daos_io_server (instance 0) exited: exit status 1

Can someone provide some pointers to what is going on? 

Join daos@daos.groups.io to automatically receive all group messages.