issues with NVMe drives from RPM installation
Dahringer, Richard
Hi all -

My daos_local.yml contains:

    name: daos_server
    servers:
    -
      env_vars:
      # Storage definitions
      # When scm_class is set to ram, tmpfs will be used to emulate SCM.
      # The size of ram is specified by scm_size in GB units.
      scm_mount: /mnt/daos0  # map to -s /mnt/daos
      bdev_class: nvme

and I start the server with:

    [root@elfs13o01 ~]# daos_server -o daos_local.yml start
Farrell, Patrick Arthur <patrick.farrell@...>
Richard,
There's nothing obviously wrong - to me, anyway - with your config, and no useful errors in the output. You can check the logs in /tmp/daos*.log (there will be multiple files); they should contain more information. You could also turn on debug logging before you start the server to possibly get more info - described in the manual:
https://daos-stack.github.io/admin/troubleshooting/
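For instance, debug logging can be switched on in the server YAML; a minimal sketch, assuming the log_mask/log_file/env_vars keys described in the admin guide (exact key names can differ between DAOS versions, so treat this as illustrative):

    servers:
    -
      log_mask: DEBUG              # verbose engine logging
      log_file: /tmp/daos_server.log
      env_vars:
      - DD_SUBSYS=all              # emit debug messages from all subsystems
      - DD_MASK=all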
Also, if you have not already, you can check that your drives are visible to DAOS and can be prepared as expected with the daos_server storage commands, scan and prepare, detailed here:
https://daos-stack.github.io/admin/deployment/
That page details how to run them for SCM; look at the command help for how to run them for NVMe devices. (You'll want to select NVMe only, or it may ask you to reboot to set up your SCM goals, which you've obviously already done.)
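A minimal sequence might look like the following sketch (I'm assuming the --nvme-only flag name here; confirm it against daos_server storage prepare --help on your installed version):

    [root@elfs13o01 ~]# daos_server storage scan
    [root@elfs13o01 ~]# daos_server storage prepare --nvme-only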
Regards,
-Patrick
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of richard.dahringer@... <richard.dahringer@...>
Sent: Thursday, July 30, 2020 9:27 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: [daos] issues with NVMe drives from RPM installation
Nabarro, Tom

Hello Richard

    "ERROR: DAOS I/O Server exited with error: /usr/bin/daos_io_server (instance 0) exited: exit status 1"

Regards,
Tom Nabarro – HPC
M: +44 (0)7786 260986
Skype: tom.nabarro
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of richard.dahringer@...
Sent: Thursday, July 30, 2020 3:27 PM
To: daos@daos.groups.io
Subject: [daos] issues with NVMe drives from RPM installation
Dahringer, Richard
Thanks Tom, that led me to this:
    07/30-08:21:10.77 elfs13o01 DAOS[74504/74524] bio  ERR  src/bio/bio_xstream.c:877 init_blobstore_ctxt() Device list & device mapping is inconsistent
    07/30-08:21:14.13 elfs13o01 DAOS[74504/74524] server ERR  src/iosrv/srv.c:452 dss_srv_handler() failed to init spdk context for xstream(2) rc:-1005
When I check for consistency, I see:

    [root@elfs13o01 tmp]# daos_server storage scan
    Scanning locally-attached storage...
    ERROR: /usr/bin/daos_admin EAL: No free hugepages reported in hugepages-1048576kB
    NVMe controllers and namespaces:
            PCI:0000:5e:00.0 Model:INTEL SSDPE2KX040T8 FW:VDV10131 Socket:0 Capacity:4.0 TB
            PCI:0000:5f:00.0 Model:INTEL SSDPE2KX040T8 FW:VDV10131 Socket:0 Capacity:4.0 TB
            PCI:0000:d8:00.0 Model:INTEL SSDPE2KX040T8 FW:VDV10131 Socket:1 Capacity:4.0 TB
            PCI:0000:d9:00.0 Model:INTEL SSDPE2KX040T8 FW:VDV10131 Socket:1 Capacity:4.0 TB
    SCM Namespaces:
            Device:pmem0 Socket:0 Capacity:266 GB
            Device:pmem1 Socket:1 Capacity:266 GB
And the first of the NVMe controllers listed is the drive I have in the configuration file (from below):

    bdev_class: nvme
Is there another file somewhere that I need to set up? I saw some documentation of ‘daos_nvme.conf’, which is automatically generated. I added the second NVMe device on socket 0 to the configuration to see if that would change anything, but I get the same results.
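(For reference, a per-server storage section that pins specific devices would look something like the sketch below. The PCI addresses come from the scan output above; the bdev_list entry is my assumption about the shape of the file, not a quote of the actual config:)

    servers:
    -
      scm_mount: /mnt/daos0
      bdev_class: nvme
      bdev_list: ["0000:5e:00.0", "0000:5f:00.0"]  # the two socket-0 SSDs from the scan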
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Nabarro, Tom
Sent: Thursday, July 30, 2020 09:59
To: daos@daos.groups.io
Subject: Re: [daos] issues with NVMe drives from RPM installation
Nabarro, Tom

Sounds like the metadata may be out of sync. Can you try removing /mnt/daos0/*, starting the server, and then (on a separate tty) reformatting with "dmg storage format --reformat"?
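Spelled out, that recovery sequence is roughly the following (a sketch; depending on how access control is set up, dmg may also need insecure mode or a control config file):

    [root@elfs13o01 ~]# rm -rf /mnt/daos0/*
    [root@elfs13o01 ~]# daos_server -o daos_local.yml start

    (on a separate tty)
    [root@elfs13o01 ~]# dmg storage format --reformat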
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Dahringer, Richard
Sent: Thursday, July 30, 2020 5:28 PM
To: daos@daos.groups.io
Subject: Re: [daos] issues with NVMe drives from RPM installation
Dahringer, Richard
That worked!
Thanks Tom!
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Nabarro, Tom
Sent: Thursday, July 30, 2020 12:11
To: daos@daos.groups.io
Subject: Re: [daos] issues with NVMe drives from RPM installation