Help debugging a daos I/O server startup failure?
Kevan Rehm
Greetings,
I’ve been debugging a daos_io_server startup problem for a couple of days now and have gotten nowhere, so it’s time to call in the experts. Two IO servers are configured per node; one has 5 NVMe devices and the other has 6. They both give the following errors:
03/12-08:33:32.33 delphi-004 DAOS[141245/141245] server INFO src/iosrv/init.c:491 server_init() Network successfully initialized
03/12-08:33:32.33 delphi-004 DAOS[141245/141245] server INFO src/iosrv/init.c:500 server_init() Module vos,rdb,rsvc,security,mgmt,pool,cont,dtx,obj,rebuild successfully loaded
03/12-08:33:32.43 delphi-004 DAOS[141245/141287] bio INFO src/bio/bio_xstream.c:961 bio_xsctxt_alloc() Initialize NVMe context, tgt_id:0, init_thread:(nil)
03/12-08:33:32.71 delphi-004 DAOS[141245/141287] bio ERR src/bio/bio_xstream.c:1019 bio_xsctxt_alloc() failed to initialize SPDK env, DER_INVAL(-1003)
03/12-08:33:32.71 delphi-004 DAOS[141245/141287] server ERR src/iosrv/srv.c:452 dss_srv_handler() failed to init spdk context for xstream(2) rc:-1003
The failing function in bio_xsctxt_alloc() is spdk_env_init(), which just returns -1. Looking in /dev/hugepages, I see 40 2 MiB hugepage files owned by root, which I did not expect, because daos_server and the two daos_io_server processes are all running as user ‘daos’. I believe I have the file permissions set correctly:
[root@delphi-004 tmp]# cd ~daos/daos/install/bin
[root@delphi-004 bin]# ls -l daos_admin daos_server
-rwsr-x---. 1 root daos_grp 5751760 Mar 12 2020 daos_admin
-rwxr-sr-x. 1 root daos_grp 16219920 Mar 12 2020 daos_server
The daos_admin process had errors also and exited, and daos_control.log mentions it is discarding garbage responses.
Am I doing something obviously wrong? Attached are the daos_control.log and daos_io_server.log files, together with the daos .yml files.
If you need more info, let me know.
Thanks, Kevan
|
|
Kevan Rehm
One update: the “discarded garbage” messages occur because MaxMessageSize is set to 4096, which is smaller than the message daos_admin is trying to return; the message is large because there are 11 (eventually 12) NVMe devices on this node. I raised MaxMessageSize to 8192 and the “discarded garbage message” problem went away, but the rest of the issues remain.
Kevan
|
|
Kevan Rehm
Greetings,
I have opened Jira DAOS-4342 for the problem where the value of MaxMessageSize is too small.
I still have no solution for the DPDK failure I am seeing in daos_server startup, but I have more background information on the problem. The underlying problem is that DPDK cannot convert virtual addresses to physical addresses.
Step 1: Early in startup, rte_pci_get_iommu_class() is called from rte_eal_init() and sets iova_mode to RTE_IOVA_PA (DMA using physical addresses).
Step 2: rte_eal_hugepage_init() gets called, which eventually calls test_phys_addrs_available(). That routine sets phys_addrs_available = false, and reports:
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:1 EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
The “Permission denied” is a stale errno value; the test in the code that actually fails is inside routine rte_mem_virt2phy():
/*
 * the pfn (page frame number) are bits 0-54 (see
 * pagemap.txt in linux Documentation)
 */
if ((page & 0x7fffffffffffffULL) == 0) {
        return RTE_BAD_IOVA;
}
The bottom 55 bits of the word that was read are all zeros. The actual value of the word is 0x8180000000000000.
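The quoted value can be decoded against the bit layout given in the kernel's pagemap documentation. The following is a minimal Python sketch, not code from DPDK or DAOS; the helper name decode_pagemap_entry is made up for illustration. It suggests the entry describes a present, exclusively mapped page whose PFN field reads as zero. The kernel documentation states that since Linux 4.2 the PFN field is zeroed for readers without CAP_SYS_ADMIN, which would be consistent with this value, although that is an inference and not something confirmed on this system.

```python
# Bit positions taken from the Linux pagemap documentation.
PFN_MASK = (1 << 55) - 1   # bits 0-54: page frame number (if present, not swapped)
SOFT_DIRTY = 1 << 55       # bit 55: PTE is soft-dirty
EXCLUSIVE = 1 << 56        # bit 56: page is exclusively mapped
SWAPPED = 1 << 62          # bit 62: page is swapped out
PRESENT = 1 << 63          # bit 63: page is present in RAM


def decode_pagemap_entry(entry):
    """Split a raw 64-bit pagemap entry into its flags and PFN field."""
    return {
        "present": bool(entry & PRESENT),
        "swapped": bool(entry & SWAPPED),
        "exclusive": bool(entry & EXCLUSIVE),
        "soft_dirty": bool(entry & SOFT_DIRTY),
        "pfn": entry & PFN_MASK,
    }


# The word reported in this message.
info = decode_pagemap_entry(0x8180000000000000)
print(info)
```

The page is reported as present, yet the PFN is hidden, so the `(page & mask) == 0` test in rte_mem_virt2phy() fails even though the mapping itself is valid.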
Step 3: Routine rte_service_init() gets called. It eventually calls alloc_seg() which wants to convert a virtual address to a physical address, but phys_addrs_available is false, so it fails. It tries to allocate segments on both sockets but both attempts fail, as phys_addrs_available applies to both sockets.
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: alloc_seg(): can't get IOVA addr
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Ask a virtual area of 0x200000 bytes
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Virtual area found at 0x200000200000 (size = 0x200000)
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: attempted to allocate 1 segments, but only 0 were allocated
EAL: Restoring previous memory policy: 0
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Trying to obtain current memory policy.
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Setting policy MPOL_PREFERRED for socket 1
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: alloc_seg(): can't get IOVA addr
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x201000a00000 (size = 0x200000)
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: attempted to allocate 1 segments, but only 0 were allocated
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Restoring previous memory policy: 0
delphi-004.us.cray.com ERROR 2020/03/14 08:37:09 daos_io_server:0 EAL: FATAL: rte_service_init() failed
Failed to initialize DPDK
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 error allocating rte services array
EAL: rte_service_init() failed
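Steps 1 to 3 can be summarized as a small decision model. This is a simplified Python sketch of how the address translation appears to behave, based on the description in this message; it is not a copy of DPDK's code. The function names virt2iova and alloc_seg_succeeds and the "PA"/"VA" mode labels are hypothetical.

```python
RTE_BAD_IOVA = 2**64 - 1  # DPDK defines RTE_BAD_IOVA as (rte_iova_t)-1


def virt2iova(vaddr, iova_mode, phys_addrs_available, pagemap_pfn, page_size=4096):
    """Model of the virtual-to-IOVA translation described in steps 1-3.

    iova_mode is "PA" (DMA with physical addresses) or "VA" (DMA with
    virtual addresses, as used with vfio and an IOMMU).
    pagemap_pfn is the PFN field read from /proc/self/pagemap.
    """
    if iova_mode == "VA":
        # In VA mode the IOVA is simply the virtual address.
        return vaddr
    # PA mode needs a real physical address from the pagemap.
    if not phys_addrs_available or pagemap_pfn == 0:
        return RTE_BAD_IOVA
    return pagemap_pfn * page_size + vaddr % page_size


def alloc_seg_succeeds(vaddr, iova_mode, phys_addrs_available, pagemap_pfn):
    """A segment is usable only if an IOVA could be obtained for it."""
    iova = virt2iova(vaddr, iova_mode, phys_addrs_available, pagemap_pfn)
    return iova != RTE_BAD_IOVA


# The failing case from the log: PA mode, PFN hidden from the process.
print(alloc_seg_succeeds(0x200000200000, "PA", False, 0))
# The same mapping would be usable if VA mode had been selected.
print(alloc_seg_succeeds(0x200000200000, "VA", False, 0))
```

Under this model the failure depends on two things together: iova_mode being RTE_IOVA_PA, and the PFN being unreadable. Changing either one would let alloc_seg() proceed.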
I think the heart of the problem is, when DPDK reads /proc/self/pagemap, why is it getting a value where the bottom 55 bits are all zero?
Kevan
|
|
Kevan Rehm
(DAOS-4342 will be closed, a more recent master contains the fix. For the DPDK problem here, I am running yesterday’s master.)
My DPDK problem is related to permissions. I am running daos_server as user ‘daos’, group ‘daos_grp’, with daos_admin and daos_server permissions set as documented:
# ls -l daos_admin daos_server
-rwsr-x---. 1 root daos_grp 6188984 Mar 14 16:12 daos_admin
-rwxr-sr-x. 1 root daos_grp 16345032 Mar 14 16:13 daos_server
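For readers less familiar with the mode strings in that listing, here is a small Python sketch that decodes them with the standard stat module. The octal modes 4750 and 2755 are inferred from the ls output and are not taken from the DAOS documentation; describe_mode is a hypothetical helper name. The trailing "." that ls prints indicates an SELinux security context and is not part of the mode.

```python
import stat


def describe_mode(mode):
    """Return the ls-style mode string and any setuid/setgid flags."""
    flags = []
    if mode & stat.S_ISUID:
        flags.append("setuid")
    if mode & stat.S_ISGID:
        flags.append("setgid")
    return stat.filemode(mode), flags


# Modes inferred from the listing above (regular files).
daos_admin_mode = stat.S_IFREG | 0o4750   # -rwsr-x---
daos_server_mode = stat.S_IFREG | 0o2755  # -rwxr-sr-x

print(describe_mode(daos_admin_mode))
print(describe_mode(daos_server_mode))
```

In other words, daos_admin runs with root's effective user ID, while daos_server only picks up the daos_grp group and otherwise keeps the privileges of the invoking user.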
If I start the daemon manually like this:
~/daos/install/bin/daos_server start
it fails every time: the page frame number read by rte_mem_virt2phy() is always zero, and alloc_seg() always fails. If I instead start the daemon with:
sudo ~/daos/install/bin/daos_server start
then the page frame numbers read by rte_mem_virt2phy() are correctly non-zero, and the daos_io_server daemons start up.
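The root versus non-root difference can be checked outside of DAOS. The Python sketch below reads the pagemap entry for a page owned by the calling process; it assumes a Linux system, and the function names are hypothetical, not part of DPDK or DAOS.

```python
import ctypes
import mmap
import os
import struct

PFN_MASK = (1 << 55) - 1  # bits 0-54 of a pagemap entry hold the PFN


def pagemap_offset(vaddr, page_size):
    """Byte offset of the 8-byte pagemap entry that covers vaddr."""
    return (vaddr // page_size) * 8


def read_pagemap_entry(vaddr, pagemap_path="/proc/self/pagemap", page_size=None):
    """Return the raw 64-bit pagemap entry for one virtual address."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    fd = os.open(pagemap_path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, pagemap_offset(vaddr, page_size))
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError("short read from " + pagemap_path)
    return struct.unpack("=Q", data)[0]


def pfn_of_own_page():
    """Map one anonymous page, touch it, and return the PFN the kernel reports."""
    buf = mmap.mmap(-1, mmap.PAGESIZE)
    buf[0:1] = b"x"  # touch the page so that it is actually present
    vaddr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    return read_pagemap_entry(vaddr) & PFN_MASK
```

Calling pfn_of_own_page() once as user ‘daos’ and once under sudo should show whether the kernel hides the PFN from the unprivileged process, which is the behavior described above.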
Am I doing something incorrectly in my attempts to run the daos servers as non-root?
Thanks, Kevan
|
|
Lombardi, Johann
Hi Kevan,
To run SPDK as a non-root user, you need to switch from UIO to VFIO. Tom and Mike recently spent some time verifying that the DAOS server can be run as a regular user (except for the setuid root on the daos_admin utility) when VFIO is enabled. It requires VT-d to be enabled in the BIOS. Please check: https://daos-stack.github.io/admin/deployment/#enable-iommu-optional
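A quick way to confirm both conditions is to look in sysfs. The Python sketch below is only an illustration under stated assumptions: the helper names are hypothetical, and the PCI address in the usage note is a placeholder, not a device from this system. It relies on /sys/kernel/iommu_groups being populated when the IOMMU is active and on the per-device driver symlink under /sys/bus/pci/devices.

```python
import os


def iommu_group_count(sysfs_root="/sys"):
    """Number of IOMMU groups; 0 usually means the IOMMU is not active."""
    path = os.path.join(sysfs_root, "kernel", "iommu_groups")
    try:
        return len(os.listdir(path))
    except OSError:
        return 0


def bound_driver(pci_addr, sysfs_root="/sys"):
    """Name of the kernel driver bound to a PCI device, or None if unbound."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, "driver")
    try:
        return os.path.basename(os.readlink(link))
    except OSError:
        return None
```

For example, bound_driver("0000:5e:00.0") would be expected to return "vfio-pci" for an NVMe device that has been handed over to VFIO, and "uio_pci_generic" or "nvme" otherwise.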
Cheers, Johann
|
|
Kevan Rehm
I am using vfio, and my IOMMU is enabled.
|
|
Lombardi, Johann
Hm, then I don’t understand why it is trying to read /proc/self/pagemap (only accessible to root in recent kernels). Maybe Tom and Mike can comment.
Cheers, Johann
From:
<daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
I am using vfio, and my IOMMU is enabled.
From:
<daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
Hi Kevan,
To run SPDK as a non-root user, you need to switch from UIO to VFIO. Tom and Mike have spent some time recently to verify that the DAOS server can be run as a regular user (except for the setuid root on the daos_admin utility) when VFIO is enabled. It requires VT-d to be enabled in the BIOS. Please check: https://daos-stack.github.io/admin/deployment/#enable-iommu-optional
Cheers, Johann
From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
(DAOS-4342 will be closed; a more recent master contains the fix. For the DPDK problem here, I am running yesterday’s master.)
My DPDK problem is related to permissions. I am running daos_server as user ‘daos’, group ‘daos_grp’, with daos_admin and daos_server permissions set as documented:
# ls -l daos_admin daos_server
-rwsr-x---. 1 root daos_grp  6188984 Mar 14 16:12 daos_admin
-rwxr-sr-x. 1 root daos_grp 16345032 Mar 14 16:13 daos_server
If I start the daemon manually like this:
~/daos/install/bin/daos_server start
it fails every time: the page frame number read by rte_mem_virt2phy() is always zero, and alloc_seg() always fails. If I instead start the daemon with:
sudo ~/daos/install/bin/daos_server start
then the page frame numbers read by rte_mem_virt2phy() are correctly non-zero, and the daos_io_server daemons start up.
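The difference between the two invocations can be reproduced outside of DPDK. The following is a minimal Python sketch, not DPDK's code: it touches one anonymous page and then reads that page's 64-bit entry from /proc/self/pagemap, the same file rte_mem_virt2phy() consults. The helper names are made up for illustration. The expectation that the PFN comes back zero as user 'daos' and non-zero under sudo is an assumption based on the behaviour described above, not something verified here.

```python
import ctypes
import mmap
import os
import struct

PFN_MASK = (1 << 55) - 1  # bits 0-54, the same mask used in rte_mem_virt2phy()


def pagemap_offset(vaddr, page_size):
    """Byte offset of the 8-byte pagemap entry that covers vaddr."""
    return (vaddr // page_size) * 8


def buffer_address(buf):
    """Virtual address of a writable buffer (hypothetical helper)."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


def read_own_pfn():
    """Return the PFN field the kernel reports for one touched page."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    buf = mmap.mmap(-1, page_size)
    try:
        buf[0:1] = b"\x01"  # touch the page so that it is actually mapped
        vaddr = buffer_address(buf)
        # Unbuffered, so that exactly one 8-byte entry is read.
        with open("/proc/self/pagemap", "rb", buffering=0) as f:
            f.seek(pagemap_offset(vaddr, page_size))
            (entry,) = struct.unpack("Q", f.read(8))
    finally:
        buf.close()
    return entry & PFN_MASK
```

Calling read_own_pfn() once as the unprivileged user and once under sudo gives a quick comparison that does not involve SPDK or hugepages at all. On kernels that refuse the open for unprivileged users, the call raises PermissionError instead of returning zero.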
Am I doing something incorrectly in my attempts to run the daos servers as non-root?
Thanks, Kevan
From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Greetings,
I have opened Jira DAOS-4342 for the problem where the value of MaxMessageSize is too small.
I still have no solution for the DPDK failure I am seeing in daos_server startup, but I have more background information on the problem. The underlying problem is that DPDK cannot convert virtual addresses to physical addresses.
Step 1: Early in startup, rte_pci_get_iommu_class() is called from rte_eal_init(); it sets iova_mode to RTE_IOVA_PA (DMA using physical addresses).
Step 2: rte_eal_hugepage_init() gets called, which eventually calls test_phys_addrs_available(). That routine sets phys_addrs_available = false, and reports:
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:1 EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
The “Permission denied” is a stale errno value; the actual test in the code that fails is inside routine rte_mem_virt2phy():
/*
 * the pfn (page frame number) are bits 0-54 (see
 * pagemap.txt in linux Documentation)
 */
if ((page & 0x7fffffffffffffULL) == 0) {
        return RTE_BAD_IOVA;
}
The bottom 55 bits of the word that was read are all zeros. The actual value of the word is 0x8180000000000000.
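To see what else that word says, here is a small Python sketch that splits a pagemap entry into its fields. The bit positions follow the upstream kernel pagemap documentation; whether the CentOS 7.7 kernel (3.10 plus backports) assigns the same meaning to bits 55 and 56 is an assumption, and decode_pagemap_entry is a name invented for this example.

```python
PFN_MASK = (1 << 55) - 1   # bits 0-54: page frame number
SOFT_DIRTY = 1 << 55       # bit 55: PTE is soft-dirty
EXCLUSIVE = 1 << 56        # bit 56: page is exclusively mapped
SWAPPED = 1 << 62          # bit 62: page is swapped out
PRESENT = 1 << 63          # bit 63: page is present in RAM


def decode_pagemap_entry(entry):
    """Split a 64-bit /proc/<pid>/pagemap entry into named fields."""
    return {
        "present": bool(entry & PRESENT),
        "swapped": bool(entry & SWAPPED),
        "exclusive": bool(entry & EXCLUSIVE),
        "soft_dirty": bool(entry & SOFT_DIRTY),
        "pfn": entry & PFN_MASK,
    }


fields = decode_pagemap_entry(0x8180000000000000)
# fields["present"] is True while fields["pfn"] is 0
```

Under those assumptions the page is reported as present in RAM while its PFN field is zero. That would be consistent with the kernel hiding the frame number from the caller rather than with the page being unmapped, although this thread does not confirm that reading.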
Step 3: Routine rte_service_init() gets called. It eventually calls alloc_seg() which wants to convert a virtual address to a physical address, but phys_addrs_available is false, so it fails. It tries to allocate segments on both sockets but both attempts fail, as phys_addrs_available applies to both sockets.
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: alloc_seg(): can't get IOVA addr
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Ask a virtual area of 0x200000 bytes
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Virtual area found at 0x200000200000 (size = 0x200000)
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: attempted to allocate 1 segments, but only 0 were allocated
EAL: Restoring previous memory policy: 0
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Trying to obtain current memory policy.
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Setting policy MPOL_PREFERRED for socket 1
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: alloc_seg(): can't get IOVA addr
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x201000a00000 (size = 0x200000)
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: attempted to allocate 1 segments, but only 0 were allocated
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Restoring previous memory policy: 0
delphi-004.us.cray.com ERROR 2020/03/14 08:37:09 daos_io_server:0 EAL: FATAL: rte_service_init() failed
Failed to initialize DPDK
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 error allocating rte services array
EAL: rte_service_init() failed
I think the heart of the problem is this: when DPDK reads /proc/self/pagemap, why does it get a value whose bottom 55 bits are all zero?
Kevan
|
|
Kevan Rehm
I should probably be more accurate and say that “I think I am using vfio”. 😊 Here are the things I have checked; if there are other things to check, let me know.
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-1062.12.1.el7.x86_64 root=/dev/mapper/cl_delphi--004-root ro spectre_v2=retpoline rd.lvm.lv=cl_delphi-004/root rd.lvm.lv=cl_delphi-004/swap rhgb quiet intel_iommu=on console=ttyS1,115200
# ls -l
total 0
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar0 -> ../../devices/virtual/iommu/dmar0
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar1 -> ../../devices/virtual/iommu/dmar1
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar2 -> ../../devices/virtual/iommu/dmar2
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar3 -> ../../devices/virtual/iommu/dmar3
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar4 -> ../../devices/virtual/iommu/dmar4
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar5 -> ../../devices/virtual/iommu/dmar5
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar6 -> ../../devices/virtual/iommu/dmar6
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar7 -> ../../devices/virtual/iommu/dmar7
[root@delphi-004 iommu]# pwd
/sys/class/iommu
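The two IOMMU checks above can be scripted so that they are easy to repeat across nodes. This is a minimal Python sketch; the function names are invented for the example, and it only mirrors the manual checks (the kernel command line and the /sys/class/iommu directory), so it says nothing about BIOS settings.

```python
import os


def iommu_cmdline_flags(cmdline):
    """Return the IOMMU-related tokens found on a kernel command line."""
    prefixes = ("intel_iommu=", "amd_iommu=", "iommu=")
    return [tok for tok in cmdline.split() if tok.startswith(prefixes)]


def iommu_units(sysfs_dir="/sys/class/iommu"):
    """Names of the IOMMU units the kernel registered; empty if there are none."""
    try:
        return sorted(os.listdir(sysfs_dir))
    except FileNotFoundError:
        return []
```

On the node above, passing the contents of /proc/cmdline to iommu_cmdline_flags() should return the single token intel_iommu=on, and iommu_units() should list dmar0 through dmar7.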
From daos_control.log:
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: Initializing vfio
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: Probing VFIO support...
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: IOMMU type 1 (Type 1) is supported
EAL: IOMMU type 7 (sPAPR) is not supported
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: IOMMU type 8 (No-IOMMU) is not supported
EAL: VFIO support initialized
[root@delphi-004 vfio]# cd /dev/vfio
[root@delphi-004 vfio]# ls -l
total 0
crw-rw-rw-. 1 root root 235,  11 Mar 15 02:34 1
crw-rw-rw-. 1 root root 235,   0 Mar 15 02:33 29
crw-rw-rw-. 1 root root 235,   1 Mar 15 02:33 41
crw-rw-rw-. 1 root root 235,   2 Mar 15 02:33 42
crw-rw-rw-. 1 root root 235,   3 Mar 15 02:33 43
crw-rw-rw-. 1 root root 235,   4 Mar 15 02:33 44
crw-rw-rw-. 1 root root 235,  12 Mar 15 02:34 55
crw-rw-rw-. 1 root root 235,   5 Mar 15 02:33 71
crw-rw-rw-. 1 root root 235,   6 Mar 15 02:33 72
crw-rw-rw-. 1 root root 235,   7 Mar 15 02:33 84
crw-rw-rw-. 1 root root 235,   8 Mar 15 02:33 85
crw-rw-rw-. 1 root root 235,   9 Mar 15 02:33 86
crw-rw-rw-. 1 root root 235,  10 Mar 15 02:34 87
crw-rw-rw-. 1 root root  10, 196 Mar 15 02:18 vfio
From daos_server.yml, server 1 has:
bdev_class: nvme
bdev_list: ["0000:1a:00.0", "0000:3b:00.0", "0000:3c:00.0", "0000:3d:00.0", "0000:3e:00.0"]
and server 2 has:
bdev_class: nvme
bdev_list: ["0000:86:00.0", "0000:87:00.0", "0000:af:00.0", "0000:b0:00.0", "0000:b1:00.0", "0000:b2:00.0"]
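One more check that might be worth adding is whether each address in bdev_list is actually bound to vfio-pci and whether its IOMMU group node under /dev/vfio is usable by the current user. The sketch below reads the standard sysfs driver and iommu_group symlinks for a PCI device. The function names are hypothetical, and it assumes that the driver name to look for is vfio-pci.

```python
import os


def pci_binding(addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the driver and IOMMU group of a PCI device, or None for each."""
    dev = os.path.join(sysfs_root, addr)

    def link_name(name):
        path = os.path.join(dev, name)
        if not os.path.islink(path):
            return None
        # The symlink target ends in the driver name or the group number.
        return os.path.basename(os.readlink(path))

    return {"driver": link_name("driver"),
            "iommu_group": link_name("iommu_group")}


def vfio_ready(addr, sysfs_root="/sys/bus/pci/devices", vfio_dir="/dev/vfio"):
    """True if addr is bound to vfio-pci and its group node is read/writable."""
    info = pci_binding(addr, sysfs_root)
    if info["driver"] != "vfio-pci" or info["iommu_group"] is None:
        return False
    node = os.path.join(vfio_dir, info["iommu_group"])
    return os.access(node, os.R_OK | os.W_OK)
```

Run as user 'daos' over both bdev_list arrays, a False result would point at a binding or permission problem on that device. All True would suggest the VFIO setup itself is fine and the failure lies elsewhere.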
|
|
Nabarro, Tom
Hello Kevan,
Sincere apologies for not being able to reply sooner; I haven't been well and Mike has been on leave.
SPDK through VFIO (when running as non-root) broke with the version of SPDK (19.04) we use in our build when we moved from CentOS 7.6 to 7.7. The fix is to upgrade SPDK to 20.01. Unfortunately, there have been some API breakages between those versions, and we have had to push for another release to properly bump the .so versions to reflect the API changes. We are therefore waiting for 20.01.1, which is targeted for March 20th.
I think this is the issue you are seeing. This PR should enable you to run as non-root (I've been using it for a while):
https://github.com/daos-stack/daos/pull/1902
Rebuild instructions:
https://github.com/daos-stack/daos/pull/1902#issuecomment-595702110
https://jira.hpdd.intel.com/browse/DAOS-4164
Hope that helps
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
|
|
Kevan Rehm
Tom,
Sorry to hear that, being sick is no fun, hope you’re over the hump.
Yes, this is the problem I’ve been chasing for a couple of weeks; we are running CentOS 7.7. I learned more about DPDK internals than I planned to. 😊
Not sure how to make this happen, but it would be nice if breakages like this got communicated to the outside world when they happen so that we don’t lose cycles debugging things that have already been debugged. Maybe a web page of “current issues”? Or email messages here? You probably have better ideas. I did try reading through the Jira log daily for a while to watch for new issues, but that didn’t seem very productive, TMI that didn’t pertain.
I got some consolation from the fact that at least this time it wasn’t pilot error. 😊
Thanks, Kevan
P.S. We have talked from time to time about upgrading to CentOS 8. Would that be a bad idea?
From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Hello Kevan,
sincere apologies for not being able to reply sooner, I haven't been well and Mike has been on leave.
SPDK through VFIO (when running as non-root) broke with the version of SPDK (19.04) we use in our build when we moved from Centos 7.6->7.7 . The fix is to upgrade SPDK to 20.01, unfortunately there have been some API breakages between those versions and we have had to push for another release to properly bump .so versions to reflect the API changes. we are therefore waiting for 20.01.1 which is targeted for March 20th.
Think this is the issue you are seeing, this PR should enable you to run as non-root (I've been using it for a while): https://github.com/daos-stack/daos/pull/1902 Rebuild instructions: https://github.com/daos-stack/daos/pull/1902#issuecomment-595702110
https://jira.hpdd.intel.com/browse/DAOS-4164
Hope that helps
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Kevan Rehm
I should probably be more accurate and say that “I think I am using vfio”. 😊. Here are the things I have checked, if there are other things to check, let me know.
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-1062.12.1.el7.x86_64 root=/dev/mapper/cl_delphi--004-root ro spectre_v2=retpoline rd.lvm.lv=cl_delphi-004/root rd.lvm.lv=cl_delphi-004/swap rhgb quiet intel_iommu=on console=ttyS1,115200

# ls -l
total 0
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar0 -> ../../devices/virtual/iommu/dmar0
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar1 -> ../../devices/virtual/iommu/dmar1
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar2 -> ../../devices/virtual/iommu/dmar2
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar3 -> ../../devices/virtual/iommu/dmar3
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar4 -> ../../devices/virtual/iommu/dmar4
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar5 -> ../../devices/virtual/iommu/dmar5
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar6 -> ../../devices/virtual/iommu/dmar6
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar7 -> ../../devices/virtual/iommu/dmar7
[root@delphi-004 iommu]# pwd
/sys/class/iommu

From daos_control.log:
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: Initializing vfio
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: Probing VFIO support...
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: IOMMU type 1 (Type 1) is supported
EAL: IOMMU type 7 (sPAPR) is not supported
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: IOMMU type 8 (No-IOMMU) is not supported
EAL: VFIO support initialized

[root@delphi-004 vfio]# cd /dev/vfio
[root@delphi-004 vfio]# ls -l
total 0
crw-rw-rw-. 1 root root 235, 11 Mar 15 02:34 1
crw-rw-rw-. 1 root root 235, 0 Mar 15 02:33 29
crw-rw-rw-. 1 root root 235, 1 Mar 15 02:33 41
crw-rw-rw-. 1 root root 235, 2 Mar 15 02:33 42
crw-rw-rw-. 1 root root 235, 3 Mar 15 02:33 43
crw-rw-rw-. 1 root root 235, 4 Mar 15 02:33 44
crw-rw-rw-. 1 root root 235, 12 Mar 15 02:34 55
crw-rw-rw-. 1 root root 235, 5 Mar 15 02:33 71
crw-rw-rw-. 1 root root 235, 6 Mar 15 02:33 72
crw-rw-rw-. 1 root root 235, 7 Mar 15 02:33 84
crw-rw-rw-. 1 root root 235, 8 Mar 15 02:33 85
crw-rw-rw-. 1 root root 235, 9 Mar 15 02:33 86
crw-rw-rw-. 1 root root 235, 10 Mar 15 02:34 87
crw-rw-rw-. 1 root root 10, 196 Mar 15 02:18 vfio
From daos_server.yml, server 1 has:
bdev_class: nvme
bdev_list: ["0000:1a:00.0", "0000:3b:00.0", "0000:3c:00.0", "0000:3d:00.0", "0000:3e:00.0"]
and server 2 has:
bdev_class: nvme
bdev_list: ["0000:86:00.0", "0000:87:00.0", "0000:af:00.0", "0000:b0:00.0", "0000:b1:00.0", "0000:b2:00.0"]
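One check the list above does not cover is whether each address in bdev_list is actually bound to the vfio-pci driver, and which IOMMU group (and therefore which /dev/vfio/<N> node) it belongs to. The following is a minimal Python sketch of that check, assuming the standard Linux sysfs layout (/sys/bus/pci/devices/<addr>/driver and .../iommu_group symlinks). The function names and the sysfs_root parameter are hypothetical helpers introduced for illustration; they are not part of DAOS or SPDK.

```python
import os
from typing import List, Optional, Tuple


def _link_target_name(pci_addr: str, entry: str, sysfs_root: str) -> Optional[str]:
    """Return the basename of a sysfs symlink for a PCI device, or None if absent."""
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_addr, entry)
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


def bound_driver(pci_addr: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Name of the kernel driver the device is bound to (e.g. 'vfio-pci')."""
    return _link_target_name(pci_addr, "driver", sysfs_root)


def iommu_group(pci_addr: str, sysfs_root: str = "/sys") -> Optional[str]:
    """IOMMU group number of the device; matches the node name under /dev/vfio."""
    return _link_target_name(pci_addr, "iommu_group", sysfs_root)


def report(bdevs: List[str], sysfs_root: str = "/sys") -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Collect (address, driver, iommu_group) for every configured device."""
    return [(addr, bound_driver(addr, sysfs_root), iommu_group(addr, sysfs_root))
            for addr in bdevs]
```

Calling report() with the addresses from bdev_list and printing the result would show at a glance any device still bound to a non-VFIO driver, or one with no IOMMU group at all.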
From: <daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
Hm, then I don’t understand why it is trying to read /proc/self/pagemap (only accessible to root in recent kernels). Maybe Tom and Mike can comment.
Cheers, Johann
From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
I am using vfio, and my IOMMU is enabled.
From: <daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
Hi Kevan,
To run SPDK as a non-root user, you need to switch from UIO to VFIO. Tom and Mike have spent some time recently to verify that the DAOS server can be run as a regular user (except for the setuid root on the daos_admin utility) when VFIO is enabled. It requires VT-d to be enabled in the BIOS. Please check: https://daos-stack.github.io/admin/deployment/#enable-iommu-optional
Cheers, Johann
From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
(DAOS-4342 will be closed, a more recent master contains the fix. For the DPDK problem here, I am running yesterday’s master.)
My DPDK problem is related to permissions. I am running daos_server as user ‘daos’, group ‘daos_grp’, with daos_admin and daos_server permissions set as documented:
# ls -l daos_admin daos_server
-rwsr-x---. 1 root daos_grp 6188984 Mar 14 16:12 daos_admin
-rwxr-sr-x. 1 root daos_grp 16345032 Mar 14 16:13 daos_server
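For readers who want to verify the same permission bits programmatically, here is a small Python sketch using only the standard stat module. The octal modes 4750 and 2755 are read off the listing above (setuid on daos_admin, setgid on daos_server); describe_mode is a hypothetical helper name, and the trailing "." that ls prints for an SELinux context is not reproduced by stat.filemode.

```python
import stat


def describe_mode(mode: int) -> dict:
    """Summarize the permission string and the setuid/setgid bits of a file mode."""
    return {
        "string": stat.filemode(mode),            # ls-style rendering
        "setuid": bool(mode & stat.S_ISUID),      # the 's' in the owner triplet
        "setgid": bool(mode & stat.S_ISGID),      # the 's' in the group triplet
        "octal": oct(stat.S_IMODE(mode)),         # permission bits only
    }


# Modes corresponding to the listing: regular files with 4750 and 2755.
admin = describe_mode(stat.S_IFREG | 0o4750)
server = describe_mode(stat.S_IFREG | 0o2755)
print(admin["string"], server["string"])
```

On a live system the argument would come from os.stat(path).st_mode for each binary.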
If I start the daemon manually like this:
~/daos/install/bin/daos_server start
it fails every time: the page frame number read by rte_mem_virt2phy() is always zero, and alloc_seg() always fails. If I instead start the daemon with:
sudo ~/daos/install/bin/daos_server start
then the page frame numbers read by rte_mem_virt2phy() are correctly non-zero, and the daos_io_server daemons start up.
Am I doing something incorrectly in my attempts to run the daos servers as non-root?
Thanks, Kevan
From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Greetings,
I have opened Jira DAOS-4342 for the problem where the value of MaxMessageSize is too small.
I still have no solution for the DPDK failure I am seeing in daos_server startup, but I have more background information on the problem. The underlying problem is that DPDK cannot convert virtual addresses to physical addresses.
Step 1: Early in startup, rte_pci_get_iommu_class() is called from rte_eal_init() and sets iova_mode to RTE_IOVA_PA (DMA using physical addresses).
Step 2: rte_eal_hugepage_init() gets called, which eventually calls test_phys_addrs_available(). That routine sets phys_addrs_available = false, and reports:
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:1 EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
The “Permission denied” is a stale errno value; the actual test in the code that fails is inside routine rte_mem_virt2phy():
/*
 * the pfn (page frame number) are bits 0-54 (see
 * pagemap.txt in linux Documentation)
 */
if ((page & 0x7fffffffffffffULL) == 0) {
	return RTE_BAD_IOVA;
}
The bottom 55 bits of the word that was read are all zeros. The actual value of the word is 0x8180000000000000.
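To make the observed value easier to read, the word can be split into its flag bits. The sketch below is in Python and assumes the bit layout given in the current kernel pagemap documentation (PFN in bits 0-54, soft-dirty in bit 55, exclusively-mapped in bit 56, file/shared in bit 61, swapped in bit 62, present in bit 63). Whether the 3.10-based kernel in this thread uses exactly that layout is an assumption, and decode_pagemap_entry is a hypothetical helper name.

```python
# Bit layout assumed from the kernel's pagemap documentation (Linux 4.2+ format).
PFN_MASK = (1 << 55) - 1        # bits 0-54: page frame number
SOFT_DIRTY = 1 << 55            # bit 55
EXCLUSIVE = 1 << 56             # bit 56
FILE_OR_SHARED = 1 << 61        # bit 61
SWAPPED = 1 << 62               # bit 62
PRESENT = 1 << 63               # bit 63


def decode_pagemap_entry(entry: int) -> dict:
    """Split one 64-bit pagemap word into its PFN and flag bits."""
    return {
        "pfn": entry & PFN_MASK,
        "soft_dirty": bool(entry & SOFT_DIRTY),
        "exclusive": bool(entry & EXCLUSIVE),
        "file_or_shared": bool(entry & FILE_OR_SHARED),
        "swapped": bool(entry & SWAPPED),
        "present": bool(entry & PRESENT),
    }


# The value reported in this message.
decoded = decode_pagemap_entry(0x8180000000000000)
print(decoded)
```

Under that assumed layout the page is marked present while its PFN field is zero. The kernel documentation states that since Linux 4.2 the PFN field is zeroed for readers without CAP_SYS_ADMIN, so this pattern would be consistent with an unprivileged read rather than an unmapped page.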
Step 3: Routine rte_service_init() gets called. It eventually calls alloc_seg() which wants to convert a virtual address to a physical address, but phys_addrs_available is false, so it fails. It tries to allocate segments on both sockets but both attempts fail, as phys_addrs_available applies to both sockets.
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: alloc_seg(): can't get IOVA addr
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Ask a virtual area of 0x200000 bytes
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Virtual area found at 0x200000200000 (size = 0x200000)
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: attempted to allocate 1 segments, but only 0 were allocated
EAL: Restoring previous memory policy: 0
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Trying to obtain current memory policy.
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Setting policy MPOL_PREFERRED for socket 1
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: alloc_seg(): can't get IOVA addr
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x201000a00000 (size = 0x200000)
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: attempted to allocate 1 segments, but only 0 were allocated
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Restoring previous memory policy: 0
delphi-004.us.cray.com ERROR 2020/03/14 08:37:09 daos_io_server:0 EAL: FATAL: rte_service_init() failed
Failed to initialize DPDK
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 error allocating rte services array
EAL: rte_service_init() failed
I think the heart of the problem is, when DPDK reads /proc/self/pagemap, why is it getting a value where the bottom 55 bits are all zero?
Kevan
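The pagemap question can be reproduced outside DPDK. The following Python sketch reads the 8-byte pagemap entry for one virtual address, in the same spirit as rte_mem_virt2phy() but not a copy of it. It assumes the documented /proc/<pid>/pagemap format of one native-endian 64-bit word per virtual page; the helper names and the optional page_size and pagemap_path parameters are hypothetical and exist mainly so the lookup can be exercised against a test file.

```python
import ctypes
import os
import struct
from typing import Optional

ENTRY_BYTES = 8  # one 64-bit word per virtual page


def pagemap_offset(vaddr: int, page_size: int) -> int:
    """Byte offset in the pagemap file of the entry describing vaddr."""
    return (vaddr // page_size) * ENTRY_BYTES


def read_pagemap_entry(vaddr: int,
                       pagemap_path: str = "/proc/self/pagemap",
                       page_size: Optional[int] = None) -> int:
    """Return the raw 64-bit pagemap word for the page containing vaddr."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")
    fd = os.open(pagemap_path, os.O_RDONLY)
    try:
        raw = os.pread(fd, ENTRY_BYTES, pagemap_offset(vaddr, page_size))
    finally:
        os.close(fd)
    if len(raw) != ENTRY_BYTES:
        raise OSError("short read from pagemap")
    return struct.unpack("=Q", raw)[0]


def probe_self() -> str:
    """Touch a buffer so its page is resident, then report its pagemap entry."""
    buf = ctypes.create_string_buffer(4096)
    buf[0] = b"x"
    entry = read_pagemap_entry(ctypes.addressof(buf))
    pfn = entry & ((1 << 55) - 1)
    return f"euid={os.geteuid()} entry={entry:#018x} pfn={pfn:#x}"
```

Running probe_self() once as the daos user and once under sudo would show whether the PFN field depends only on the privileges of the reading process, independent of SPDK or DPDK.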
|
|
Apologies that this has taken some of your time. Up until now the only really supported option for running DAOS with NVMe has been to run as root (many months ago you could get away with running UIO+SPDK as non-root, but a security fix in the kernel precluded that). We haven't communicated VFIO+SPDK non-root as a supported configuration yet, as far as I know.
Let me know how you get on with the build from that PR.
We are not using CentOS 8 yet, so if you did, you would be doing some useful pathfinding :-)
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Kevan Rehm
Tom,
Sorry to hear that, being sick is no fun, hope you’re over the hump.
Yes, this is the problem I’ve been chasing for a couple of weeks; we are running CentOS 7.7. I learned more about DPDK internals than I planned to. 😊
Not sure how to make this happen, but it would be nice if breakages like this got communicated to the outside world when they happen so that we don’t lose cycles debugging things that have already been debugged. Maybe a web page of “current issues”? Or email messages here? You probably have better ideas. I did try reading through the Jira log daily for a while to watch for new issues, but that didn’t seem very productive, TMI that didn’t pertain.
I got some consolation from the fact that at least this time it wasn’t pilot error. 😊
Thanks, Kevan
P.S. We have talked from time to time about upgrading to centos 8. Would that be a bad idea?
|
|
Kevan Rehm
Well, the DAOS admin guide describes how to run DAOS as a non-root user, and how to install and configure daos_server and daos_admin for that use case. I have been following the documentation. I figured if it is documented, then it is supported.
Originally I followed the email that came out a month or two ago on how to run DAOS as non-root; it also described how to do the daos_admin config. I took that to mean non-root was supported. Oddly, looking again just now at the admin guide, I see that the documentation has changed again since I last read it: now there are a bunch of symlinks to set up. When did that change occur?
I think I am going to punt and run the daemon as root. And I am definitely not going to touch CentOS 8…
Kevan
From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Apologies that this has taken some of your time, up until now the only real supported option for running DAOS with NVMe is to run as root (many months ago you could get away with running UIO+SPDK as non-root but a security fix in kernel precluded that). We haven't communicated VFIO+SPDK non-root as a supported configuration yet as far as I know.
Let me know how you get on with the build from that PR.
We are not using Centos 8 yet so if you did you would be doing some useful pathfinding :-)
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Kevan Rehm
Tom,
Sorry to hear that, being sick is no fun, hope you’re over the hump.
Yes, this is the problem I’ve been chasing for a couple of weeks, we are running centos 7.7. I learned more about DPDK internals than I planned to. 😊
Not sure how to make this happen, but it would be nice if breakages like this got communicated to the outside world when they happen so that we don’t lose cycles debugging things that have already been debugged. Maybe a web page of “current issues”? Or email messages here? You probably have better ideas. I did try reading through the Jira log daily for a while to watch for new issues, but that didn’t seem very productive, TMI that didn’t pertain.
I got some consolation from the fact that at least this time it wasn’t pilot error. 😊
Thanks, Kevan
P.S. We have talked from time to time about upgrading to centos 8. Would that be a bad idea?
From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Hello Kevan,
sincere apologies for not being able to reply sooner, I haven't been well and Mike has been on leave.
SPDK through VFIO (when running as non-root) broke with the version of SPDK (19.04) we use in our build when we moved from Centos 7.6->7.7 . The fix is to upgrade SPDK to 20.01, unfortunately there have been some API breakages between those versions and we have had to push for another release to properly bump .so versions to reflect the API changes. we are therefore waiting for 20.01.1 which is targeted for March 20th.
Think this is the issue you are seeing, this PR should enable you to run as non-root (I've been using it for a while): https://github.com/daos-stack/daos/pull/1902 Rebuild instructions: https://github.com/daos-stack/daos/pull/1902#issuecomment-595702110
https://jira.hpdd.intel.com/browse/DAOS-4164
Hope that helps
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From:
daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Kevan Rehm
I should probably be more accurate and say that “I think I am using vfio”. 😊. Here are the things I have checked, if there are other things to check, let me know.
# cat /proc/cmdline BOOT_IMAGE=/vmlinuz-3.10.0-1062.12.1.el7.x86_64 root=/dev/mapper/cl_delphi--004-root ro spectre_v2=retpoline rd.lvm.lv=cl_delphi-004/root rd.lvm.lv=cl_delphi-004/swap rhgb quiet intel_iommu=on console=ttyS1,115200
# ls -l total 0 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar0 -> ../../devices/virtual/iommu/dmar0 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar1 -> ../../devices/virtual/iommu/dmar1 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar2 -> ../../devices/virtual/iommu/dmar2 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar3 -> ../../devices/virtual/iommu/dmar3 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar4 -> ../../devices/virtual/iommu/dmar4 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar5 -> ../../devices/virtual/iommu/dmar5 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar6 -> ../../devices/virtual/iommu/dmar6 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar7 -> ../../devices/virtual/iommu/dmar7 [root@delphi-004 iommu]# pwd /sys/class/iommu
From daos_control.log: delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: Initializing vfio delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: Probing VFIO support... delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: IOMMU type 1 (Type 1) is supported EAL: IOMMU type 7 (sPAPR) is not supported delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: IOMMU type 8 (No-IOMMU) is not supported EAL: VFIO support initialized
[root@delphi-004 vfio]# cd /dev/vfio [root@delphi-004 vfio]# ls -l total 0 crw-rw-rw-. 1 root root 235, 11 Mar 15 02:34 1 crw-rw-rw-. 1 root root 235, 0 Mar 15 02:33 29 crw-rw-rw-. 1 root root 235, 1 Mar 15 02:33 41 crw-rw-rw-. 1 root root 235, 2 Mar 15 02:33 42 crw-rw-rw-. 1 root root 235, 3 Mar 15 02:33 43 crw-rw-rw-. 1 root root 235, 4 Mar 15 02:33 44 crw-rw-rw-. 1 root root 235, 12 Mar 15 02:34 55 crw-rw-rw-. 1 root root 235, 5 Mar 15 02:33 71 crw-rw-rw-. 1 root root 235, 6 Mar 15 02:33 72 crw-rw-rw-. 1 root root 235, 7 Mar 15 02:33 84 crw-rw-rw-. 1 root root 235, 8 Mar 15 02:33 85 crw-rw-rw-. 1 root root 235, 9 Mar 15 02:33 86 crw-rw-rw-. 1 root root 235, 10 Mar 15 02:34 87 crw-rw-rw-. 1 root root 10, 196 Mar 15 02:18 vfio
From daos_server.yml, server 1 has:
bdev_class: nvme bdev_list: ["0000:1a:00.0", "0000:3b:00.0", "0000:3c:00.0", "0000:3d:00.0", "0000:3e:00.0"] and server 2 has: bdev_class: nvme bdev_list: ["0000:86:00.0", "0000:87:00.0", "0000:af:00.0", "0000:b0:00.0", "0000:b1:00.0", "0000:b2:00.0"]
From: <daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
Hm, then I don’t understand why it is trying to read /proc/self/pagemap (only accessible to root in recent kernels). Maybe Tom and Mike can comment.
Cheers, Johann
From:
<daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
I am using vfio, and my IOMMU is enabled.
From:
<daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
Hi Kevan,
To run SPDK as a non-root user, you need to switch from UIO to VFIO. Tom and Mike have spent some time recently to verify that the DAOS server can be run as a regular user (except for the setuid root on the daos_admin utility) when VFIO is enabled. It requires VT-d to be enabled in the BIOS. Please check: https://daos-stack.github.io/admin/deployment/#enable-iommu-optional
Cheers, Johann
From:
<daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
(DAOS-4342 will be closed, a more recent master contains the fix. For the DPDK problem here, I am running yesterday’s master.)
My DPDK problem is related to permissions. I am running daos_server as user ‘daos’, group ‘daos_grp’, with daos_admin and daos_server permissions set as documented:
# ls -l daos_admin daos_server -rwsr-x---. 1 root daos_grp 6188984 Mar 14 16:12 daos_admin -rwxr-sr-x. 1 root daos_grp 16345032 Mar 14 16:13 daos_server
If I start the daemon manually like this: ~/daos/install/bin/daos_server start
it fails every time, the page frame number read by rte_mem_virt2phy() is always zero, and alloc_seg() always fails. If I instead start the daemon with: sudo ~/daos/install/bin/daos_server start
then the page frame numbers read by rte_mem_virt2phy() are correctly non-zero, and the daos_io_server daemons start up.
Am I doing something incorrectly in my attempts to run the daos servers as non-root?
Thanks, Kevan
From:
<daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Greetings,
I have opened Jira DAOS-4342 for the problem where the value of MaxMessageSize is too small.
I still have no solution for the DPDK failure I am seeing in daos_server startup, but I have more background information on the problem. The underlying problem is that DPDK cannot convert virtual addresses to physical addresses.
Step 1: Early in startup, rte_pci_get_iommu_class() is called from rte_eal_init(), it sets iova_mode to RTE_IOVA_PA (DMA using physical addresses).
Step 2: rte_eal_hugepage_init() gets called, which eventually calls test_phys_addrs_available(). That routine sets phys_addrs_available = false, and reports:
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:1 EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
The “Permission denied” is a stale errno value, the actual test in the code that fails is inside routine rte_mem_virt2phy():
/* * the pfn (page frame number) are bits 0-54 (see * pagemap.txt in linux Documentation) */ if ((page & 0x7fffffffffffffULL) == 0) { return RTE_BAD_IOVA; }
The bottom 55 bits of the word that was read are all zeros. The actual value of the word is 0x8180000000000000.
Step 3: Routine rte_service_init() gets called. It eventually calls alloc_seg() which wants to convert a virtual address to a physical address, but phys_addrs_available is false, so it fails. It tries to allocate segments on both sockets but both attempts fail, as phys_addrs_available applies to both sockets.
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: alloc_seg(): can't get IOVA addr
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Ask a virtual area of 0x200000 bytes
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Virtual area found at 0x200000200000 (size = 0x200000)
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: attempted to allocate 1 segments, but only 0 were allocated
EAL: Restoring previous memory policy: 0
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Trying to obtain current memory policy.
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Setting policy MPOL_PREFERRED for socket 1
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: alloc_seg(): can't get IOVA addr
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x201000a00000 (size = 0x200000)
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: attempted to allocate 1 segments, but only 0 were allocated
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Restoring previous memory policy: 0
delphi-004.us.cray.com ERROR 2020/03/14 08:37:09 daos_io_server:0 EAL: FATAL: rte_service_init() failed
Failed to initialize DPDK
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 error allocating rte services array
EAL: rte_service_init() failed
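The chain described in Steps 1 to 3 can be sketched as a toy model. This is not DPDK's code: `virt2iova` and `alloc_seg_ok` are hypothetical Python stand-ins, the 4 KiB page size is an assumption, and the only facts carried over are that PA mode needs a usable page frame number while VA mode reuses the virtual address as the I/O address.

```python
PAGE_SIZE = 4096                 # assumed base page size for the illustration
RTE_BAD_IOVA = (1 << 64) - 1     # all ones, standing in for DPDK's error value


def virt2iova(vaddr, iova_mode, phys_addrs_available, pfn_lookup):
    """Toy translation: VA mode returns vaddr, PA mode needs a non-zero PFN."""
    if iova_mode == "VA":
        return vaddr
    if not phys_addrs_available:
        return RTE_BAD_IOVA
    pfn = pfn_lookup(vaddr)
    if pfn == 0:
        return RTE_BAD_IOVA
    return pfn * PAGE_SIZE + vaddr % PAGE_SIZE


def alloc_seg_ok(vaddr, iova_mode, phys_addrs_available, pfn_lookup):
    """A segment is usable only if it has a valid I/O address."""
    return virt2iova(vaddr, iova_mode, phys_addrs_available,
                     pfn_lookup) != RTE_BAD_IOVA


def hidden_pfn(vaddr):
    """Models a pagemap read whose PFN field comes back as zero."""
    return 0


# PA mode with no physical addresses: the case seen in the log above.
print(alloc_seg_ok(0x200000200000, "PA", False, hidden_pfn))
# VA mode does not depend on the PFN at all in this model.
print(alloc_seg_ok(0x200000200000, "VA", False, hidden_pfn))
```

In this model the failure comes from the combination of PA mode and unavailable physical addresses; whether VA mode can actually be selected in this setup is a separate question that the sketch does not answer.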
I think the heart of the problem is, when DPDK reads /proc/self/pagemap, why is it getting a value where the bottom 55 bits are all zero?
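One likely explanation, offered as a hedged note rather than a diagnosis: the upstream kernel pagemap documentation states that since Linux 4.2 the PFN field is zeroed for readers without CAP_SYS_ADMIN, and distribution kernels may carry the same change. The Python sketch below reads the entry for one freshly touched page of the current process so the result can be compared between a regular user and root. The helper names are hypothetical, and it assumes a Linux host with /proc mounted.

```python
import ctypes
import mmap
import struct

PFN_MASK = (1 << 55) - 1  # bits 0-54 of a pagemap entry


def pagemap_offset(vaddr, page_size):
    """Byte offset of the 64-bit pagemap entry that describes vaddr."""
    return (vaddr // page_size) * 8


def read_pagemap_entry(vaddr, page_size):
    """Return the raw 64-bit entry for vaddr from /proc/self/pagemap."""
    with open("/proc/self/pagemap", "rb", buffering=0) as f:
        f.seek(pagemap_offset(vaddr, page_size))
        data = f.read(8)
    if len(data) != 8:
        raise OSError("short read from /proc/self/pagemap")
    return struct.unpack("=Q", data)[0]


def probe():
    """Map one anonymous page, touch it, and return (entry, pfn)."""
    page_size = mmap.PAGESIZE
    buf = mmap.mmap(-1, page_size)
    buf[0] = 1                               # fault the page in
    view = ctypes.c_char.from_buffer(buf)    # needed to learn the address
    try:
        entry = read_pagemap_entry(ctypes.addressof(view), page_size)
    finally:
        del view                             # release the buffer export
        buf.close()
    return entry, entry & PFN_MASK


if __name__ == "__main__":
    try:
        entry, pfn = probe()
        print(f"entry={entry:#018x} pfn={pfn:#x}")
    except OSError as exc:
        print(f"could not read pagemap: {exc}")
```

Running it once as the unprivileged user and once under sudo should show whether the zero PFN depends on privilege, independently of DPDK.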
Kevan
|
|
Tom Nabarro
Non-root is currently supported for SCM only. Apologies for any inconsistency with that communication.
Why don’t you try building with the PR I suggested? It should land before too long.
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Kevan Rehm
Well, the daos admin guide describes how to run daos as a non-root user, and how to install and configure daos_server and daos_admin for that use case. I have been following the documentation. I figured if it is documented, then it is supported.
Originally I followed the email that came out a month or two ago on how to run daos as non-root; it also described how to do the daos_admin config. I took that to mean non-root was supported. Oddly, looking again just now at the admin guide, I see that the documentation has changed again since I last read it: now there are a bunch of symlinks to set up. When did that change occur?
I think I am going to punt and run the daemon as root. And I am definitely not going to touch centos 8…
Kevan
From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Apologies that this has taken some of your time. Up until now the only really supported option for running DAOS with NVMe has been to run as root (many months ago you could get away with running UIO+SPDK as non-root, but a security fix in the kernel precluded that). As far as I know, we haven't yet communicated VFIO+SPDK non-root as a supported configuration.
Let me know how you get on with the build from that PR.
We are not using Centos 8 yet so if you did you would be doing some useful pathfinding :-)
Regards, Tom Nabarro
From:
daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Kevan Rehm
Tom,
Sorry to hear that, being sick is no fun, hope you’re over the hump.
Yes, this is the problem I’ve been chasing for a couple of weeks; we are running centos 7.7. I learned more about DPDK internals than I planned to. 😊
Not sure how to make this happen, but it would be nice if breakages like this got communicated to the outside world when they happen so that we don’t lose cycles debugging things that have already been debugged. Maybe a web page of “current issues”? Or email messages here? You probably have better ideas. I did try reading through the Jira log daily for a while to watch for new issues, but that didn’t seem very productive, TMI that didn’t pertain.
I got some consolation from the fact that at least this time it wasn’t pilot error. 😊
Thanks, Kevan
P.S. We have talked from time to time about upgrading to centos 8. Would that be a bad idea?
From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Hello Kevan,
Sincere apologies for not being able to reply sooner; I haven't been well and Mike has been on leave.
SPDK through VFIO (when running as non-root) broke with the version of SPDK (19.04) we use in our build when we moved from Centos 7.6 to 7.7. The fix is to upgrade SPDK to 20.01. Unfortunately there have been some API breakages between those versions, and we have had to push for another release to properly bump .so versions to reflect the API changes. We are therefore waiting for 20.01.1, which is targeted for March 20th.
I think this is the issue you are seeing. This PR should enable you to run as non-root (I've been using it for a while):
https://github.com/daos-stack/daos/pull/1902
Rebuild instructions: https://github.com/daos-stack/daos/pull/1902#issuecomment-595702110
https://jira.hpdd.intel.com/browse/DAOS-4164
Hope that helps
Regards, Tom Nabarro
From:
daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Kevan Rehm
I should probably be more accurate and say that “I think I am using vfio”. 😊 Here are the things I have checked; if there are other things to check, let me know.
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-1062.12.1.el7.x86_64 root=/dev/mapper/cl_delphi--004-root ro spectre_v2=retpoline rd.lvm.lv=cl_delphi-004/root rd.lvm.lv=cl_delphi-004/swap rhgb quiet intel_iommu=on console=ttyS1,115200

# ls -l
total 0
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar0 -> ../../devices/virtual/iommu/dmar0
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar1 -> ../../devices/virtual/iommu/dmar1
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar2 -> ../../devices/virtual/iommu/dmar2
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar3 -> ../../devices/virtual/iommu/dmar3
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar4 -> ../../devices/virtual/iommu/dmar4
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar5 -> ../../devices/virtual/iommu/dmar5
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar6 -> ../../devices/virtual/iommu/dmar6
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar7 -> ../../devices/virtual/iommu/dmar7
[root@delphi-004 iommu]# pwd
/sys/class/iommu

From daos_control.log:
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: Initializing vfio
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: Probing VFIO support...
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: IOMMU type 1 (Type 1) is supported
EAL: IOMMU type 7 (sPAPR) is not supported
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: IOMMU type 8 (No-IOMMU) is not supported
EAL: VFIO support initialized

[root@delphi-004 vfio]# cd /dev/vfio
[root@delphi-004 vfio]# ls -l
total 0
crw-rw-rw-. 1 root root 235, 11 Mar 15 02:34 1
crw-rw-rw-. 1 root root 235, 0 Mar 15 02:33 29
crw-rw-rw-. 1 root root 235, 1 Mar 15 02:33 41
crw-rw-rw-. 1 root root 235, 2 Mar 15 02:33 42
crw-rw-rw-. 1 root root 235, 3 Mar 15 02:33 43
crw-rw-rw-. 1 root root 235, 4 Mar 15 02:33 44
crw-rw-rw-. 1 root root 235, 12 Mar 15 02:34 55
crw-rw-rw-. 1 root root 235, 5 Mar 15 02:33 71
crw-rw-rw-. 1 root root 235, 6 Mar 15 02:33 72
crw-rw-rw-. 1 root root 235, 7 Mar 15 02:33 84
crw-rw-rw-. 1 root root 235, 8 Mar 15 02:33 85
crw-rw-rw-. 1 root root 235, 9 Mar 15 02:33 86
crw-rw-rw-. 1 root root 235, 10 Mar 15 02:34 87
crw-rw-rw-. 1 root root 10, 196 Mar 15 02:18 vfio
From daos_server.yml, server 1 has:
bdev_class: nvme
bdev_list: ["0000:1a:00.0", "0000:3b:00.0", "0000:3c:00.0", "0000:3d:00.0", "0000:3e:00.0"]
and server 2 has:
bdev_class: nvme
bdev_list: ["0000:86:00.0", "0000:87:00.0", "0000:af:00.0", "0000:b0:00.0", "0000:b1:00.0", "0000:b2:00.0"]
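The manual checks above can be tied together per device. The following Python sketch is one way to do that, under stated assumptions: it relies on the standard sysfs layout, where /sys/bus/pci/devices/<addr>/iommu_group and /sys/bus/pci/devices/<addr>/driver are symlinks, and on /dev/vfio/<group> being the node for each IOMMU group. The function names are hypothetical, and the two addresses in the example run are simply the first entry of each bdev_list.

```python
import os

SYSFS_PCI = "/sys/bus/pci/devices"


def intel_iommu_on(cmdline):
    """True if the kernel command line carries intel_iommu=on."""
    return "intel_iommu=on" in cmdline.split()


def sysfs_link_name(pci_addr, attr, sysfs_root=SYSFS_PCI):
    """Basename of a sysfs symlink (iommu_group or driver), or None."""
    link = os.path.join(sysfs_root, pci_addr, attr)
    try:
        return os.path.basename(os.readlink(link))
    except OSError:
        return None


def check_device(pci_addr, sysfs_root=SYSFS_PCI, vfio_root="/dev/vfio"):
    """Report bound driver, IOMMU group, and whether its vfio node is r/w."""
    group = sysfs_link_name(pci_addr, "iommu_group", sysfs_root)
    driver = sysfs_link_name(pci_addr, "driver", sysfs_root)
    node = os.path.join(vfio_root, group) if group else None
    usable = bool(node) and os.access(node, os.R_OK | os.W_OK)
    return {"addr": pci_addr, "driver": driver,
            "group": group, "node_rw": usable}


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            print("intel_iommu=on:", intel_iommu_on(f.read()))
    except OSError as exc:
        print("cannot read /proc/cmdline:", exc)
    # First address from each bdev_list shown above.
    for addr in ("0000:1a:00.0", "0000:86:00.0"):
        print(check_device(addr))
```

Run as the unprivileged user, a device bound to vfio-pci would be expected to report its driver name and a readable, writable group node; a device still bound to a UIO or NVMe kernel driver would stand out immediately.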
From: <daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
Hm, then I don’t understand why it is trying to read /proc/self/pagemap (only accessible to root in recent kernels). Maybe Tom and Mike can comment.
Cheers, Johann
From:
<daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
I am using vfio, and my IOMMU is enabled.
From:
<daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
Hi Kevan,
To run SPDK as a non-root user, you need to switch from UIO to VFIO. Tom and Mike recently spent some time verifying that the DAOS server can be run as a regular user (except for the setuid root on the daos_admin utility) when VFIO is enabled. It requires VT-d to be enabled in the BIOS. Please check: https://daos-stack.github.io/admin/deployment/#enable-iommu-optional
Cheers, Johann
From:
<daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
(DAOS-4342 will be closed; a more recent master contains the fix. For the DPDK problem here, I am running yesterday’s master.)
|
|
Kevan Rehm
Right, we are using SCM, thanks.
I’ll give the PR a try.
Kevan
From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Non-root supported for SCM only currently. Apologies for any inconsistency with that communication.
Why don’t you try building with the PR I suggested, that should get landed before too long.
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Kevan Rehm
Well, the daos admin guide describes how to run daos as non-root user, and how to install and configure daos_server and daos_admin for that use case. I have been following the documentation. I figured if it is documented, then it is supported.
Originally I followed the email that came out about a month or two ago on how to run daos as non-root, it described how to also do the daos_admin config. I took that to mean non-root was supported. Oddly, looking again just now at the admin guide, I see that the documentation is different again than it was a while ago, now there are a bunch of symlinks to set up. When did that change occur?
I think I am going to punt and run the daemon as root. And I am definitely not going to touch centos 8…
Kevan
From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Apologies that this has taken some of your time, up until now the only real supported option for running DAOS with NVMe is to run as root (many months ago you could get away with running UIO+SPDK as non-root but a security fix in kernel precluded that). We haven't communicated VFIO+SPDK non-root as a supported configuration yet as far as I know.
Let me know how you get on with the build from that PR.
We are not using Centos 8 yet so if you did you would be doing some useful pathfinding :-)
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From:
daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Kevan Rehm
Tom,
Sorry to hear that, being sick is no fun, hope you’re over the hump.
Yes, this is the problem I’ve been chasing for a couple of weeks, we are running centos 7.7. I learned more about DPDK internals than I planned to. 😊
Not sure how to make this happen, but it would be nice if breakages like this got communicated to the outside world when they happen so that we don’t lose cycles debugging things that have already been debugged. Maybe a web page of “current issues”? Or email messages here? You probably have better ideas. I did try reading through the Jira log daily for a while to watch for new issues, but that didn’t seem very productive, TMI that didn’t pertain.
I got some consolation from the fact that at least this time it wasn’t pilot error. 😊
Thanks, Kevan
P.S. We have talked from time to time about upgrading to centos 8. Would that be a bad idea?
From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Hello Kevan,
sincere apologies for not being able to reply sooner, I haven't been well and Mike has been on leave.
SPDK through VFIO (when running as non-root) broke with the version of SPDK (19.04) we use in our build when we moved from Centos 7.6->7.7 . The fix is to upgrade SPDK to 20.01, unfortunately there have been some API breakages between those versions and we have had to push for another release to properly bump .so versions to reflect the API changes. we are therefore waiting for 20.01.1 which is targeted for March 20th.
Think this is the issue you are seeing, this PR should enable you to run as non-root (I've been using it for a while): https://github.com/daos-stack/daos/pull/1902 Rebuild instructions: https://github.com/daos-stack/daos/pull/1902#issuecomment-595702110
https://jira.hpdd.intel.com/browse/DAOS-4164
Hope that helps
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From:
daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Kevan Rehm
I should probably be more accurate and say that “I think I am using vfio”. 😊. Here are the things I have checked, if there are other things to check, let me know.
# cat /proc/cmdline BOOT_IMAGE=/vmlinuz-3.10.0-1062.12.1.el7.x86_64 root=/dev/mapper/cl_delphi--004-root ro spectre_v2=retpoline rd.lvm.lv=cl_delphi-004/root rd.lvm.lv=cl_delphi-004/swap rhgb quiet intel_iommu=on console=ttyS1,115200
# ls -l total 0 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar0 -> ../../devices/virtual/iommu/dmar0 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar1 -> ../../devices/virtual/iommu/dmar1 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar2 -> ../../devices/virtual/iommu/dmar2 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar3 -> ../../devices/virtual/iommu/dmar3 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar4 -> ../../devices/virtual/iommu/dmar4 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar5 -> ../../devices/virtual/iommu/dmar5 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar6 -> ../../devices/virtual/iommu/dmar6 lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar7 -> ../../devices/virtual/iommu/dmar7 [root@delphi-004 iommu]# pwd /sys/class/iommu
From daos_control.log: delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: Initializing vfio delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: Probing VFIO support... delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: IOMMU type 1 (Type 1) is supported EAL: IOMMU type 7 (sPAPR) is not supported delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: IOMMU type 8 (No-IOMMU) is not supported EAL: VFIO support initialized
[root@delphi-004 vfio]# cd /dev/vfio [root@delphi-004 vfio]# ls -l total 0 crw-rw-rw-. 1 root root 235, 11 Mar 15 02:34 1 crw-rw-rw-. 1 root root 235, 0 Mar 15 02:33 29 crw-rw-rw-. 1 root root 235, 1 Mar 15 02:33 41 crw-rw-rw-. 1 root root 235, 2 Mar 15 02:33 42 crw-rw-rw-. 1 root root 235, 3 Mar 15 02:33 43 crw-rw-rw-. 1 root root 235, 4 Mar 15 02:33 44 crw-rw-rw-. 1 root root 235, 12 Mar 15 02:34 55 crw-rw-rw-. 1 root root 235, 5 Mar 15 02:33 71 crw-rw-rw-. 1 root root 235, 6 Mar 15 02:33 72 crw-rw-rw-. 1 root root 235, 7 Mar 15 02:33 84 crw-rw-rw-. 1 root root 235, 8 Mar 15 02:33 85 crw-rw-rw-. 1 root root 235, 9 Mar 15 02:33 86 crw-rw-rw-. 1 root root 235, 10 Mar 15 02:34 87 crw-rw-rw-. 1 root root 10, 196 Mar 15 02:18 vfio
From daos_server.yml, server 1 has:
bdev_class: nvme bdev_list: ["0000:1a:00.0", "0000:3b:00.0", "0000:3c:00.0", "0000:3d:00.0", "0000:3e:00.0"] and server 2 has: bdev_class: nvme bdev_list: ["0000:86:00.0", "0000:87:00.0", "0000:af:00.0", "0000:b0:00.0", "0000:b1:00.0", "0000:b2:00.0"]
From: <daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
Hm, then I don’t understand why it is trying to read /proc/self/pagemap (only accessible to root in recent kernels). Maybe Tom and Mike can comment.
Cheers, Johann
From:
<daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
I am using vfio, and my IOMMU is enabled.
From:
<daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
Hi Kevan,
To run SPDK as a non-root user, you need to switch from UIO to VFIO. Tom and Mike have spent some time recently to verify that the DAOS server can be run as a regular user (except for the setuid root on the daos_admin utility) when VFIO is enabled. It requires VT-d to be enabled in the BIOS. Please check: https://daos-stack.github.io/admin/deployment/#enable-iommu-optional
Cheers, Johann
From:
<daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
(DAOS-4342 will be closed, a more recent master contains the fix. For the DPDK problem here, I am running yesterday’s master.)
My DPDK problem is related to permissions. I am running daos_server as user ‘daos’, group ‘daos_grp’, with daos_admin and daos_server permissions set as documented:
# ls -l daos_admin daos_server -rwsr-x---. 1 root daos_grp 6188984 Mar 14 16:12 daos_admin -rwxr-sr-x. 1 root daos_grp 16345032 Mar 14 16:13 daos_server
If I start the daemon manually like this: ~/daos/install/bin/daos_server start
it fails every time, the page frame number read by rte_mem_virt2phy() is always zero, and alloc_seg() always fails. If I instead start the daemon with: sudo ~/daos/install/bin/daos_server start
then the page frame numbers read by rte_mem_virt2phy() are correctly non-zero, and the daos_io_server daemons start up.
Am I doing something incorrectly in my attempts to run the daos servers as non-root?
Thanks, Kevan
From:
<daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Greetings,
I have opened Jira DAOS-4342 for the problem where the value of MaxMessageSize is too small.
I still have no solution for the DPDK failure I am seeing in daos_server startup, but I have more background information on the problem. The underlying problem is that DPDK cannot convert virtual addresses to physical addresses.
Step 1: Early in startup, rte_pci_get_iommu_class() is called from rte_eal_init(), it sets iova_mode to RTE_IOVA_PA (DMA using physical addresses).
Step 2: rte_eal_hugepage_init() gets called, which eventually calls test_phys_addrs_available(). That routine sets phys_addrs_available = false, and reports:
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:1 EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
The “Permission denied” is a stale errno value, the actual test in the code that fails is inside routine rte_mem_virt2phy():
/* * the pfn (page frame number) are bits 0-54 (see * pagemap.txt in linux Documentation) */ if ((page & 0x7fffffffffffffULL) == 0) { return RTE_BAD_IOVA; }
The bottom 55 bits of the word that was read are all zeros. The actual value of the word is 0x8180000000000000.
Step 3: Routine rte_service_init() gets called. It eventually calls alloc_seg() which wants to convert a virtual address to a physical address, but phys_addrs_available is false, so it fails. It tries to allocate segments on both sockets but both attempts fail, as phys_addrs_available applies to both sockets.
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Trying to obtain current memory policy. EAL: Setting policy MPOL_PREFERRED for socket 0 delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: alloc_seg(): can't get IOVA addr delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Ask a virtual area of 0x200000 bytes delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Virtual area found at 0x200000200000 (size = 0x200000) delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: attempted to allocate 1 segments, but only 0 were allocated EAL: Restoring previous memory policy: 0 delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Trying to obtain current memory policy. delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Setting policy MPOL_PREFERRED for socket 1 delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: alloc_seg(): can't get IOVA addr delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Ask a virtual area of 0x200000 bytes EAL: Virtual area found at 0x201000a00000 (size = 0x200000) delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: attempted to allocate 1 segments, but only 0 were allocated delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Restoring previous memory policy: 0 delphi-004.us.cray.com ERROR 2020/03/14 08:37:09 daos_io_server:0 EAL: FATAL: rte_service_init() failed Failed to initialize DPDK delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 error allocating rte services array EAL: rte_service_init() failed
I think the heart of the problem is, when DPDK reads /proc/self/pagemap, why is it getting a value where the bottom 55 bits are all zero?
Kevan
|
|
Colin Ngam
Hi Tom,
I followed the instructions and got the following:
Checking for C library spdk... no
MissingTargets: spdk has missing targets after build. See config.log for details:
File "/home/users/daos/daos/SConstruct", line 404:
    scons()
File "/home/users/daos/daos/SConstruct", line 349:
    preload_prereqs(prereqs)
File "/home/users/daos/daos/SConstruct", line 132:
    prereqs.load_definitions(prebuild=reqs)
File "/home/users/daos/daos/scons_local/prereq_tools/base.py", line 1057:
    self.require(env, comp)
File "/home/users/daos/daos/scons_local/prereq_tools/base.py", line 1131:
    raise error
Keep in mind that I ran the build twice, so the config.log may contain the output of both runs.
Thanks.
Colin
From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Non-root is currently supported for SCM only. Apologies for any inconsistency with that communication.
Why don’t you try building with the PR I suggested? It should get landed before too long.
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Kevan Rehm
Well, the daos admin guide describes how to run daos as non-root user, and how to install and configure daos_server and daos_admin for that use case. I have been following the documentation. I figured if it is documented, then it is supported.
Originally I followed the email that came out about a month or two ago on how to run daos as non-root, it described how to also do the daos_admin config. I took that to mean non-root was supported. Oddly, looking again just now at the admin guide, I see that the documentation is different again than it was a while ago, now there are a bunch of symlinks to set up. When did that change occur?
I think I am going to punt and run the daemon as root. And I am definitely not going to touch centos 8…
Kevan
From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Apologies that this has taken some of your time. Up until now, the only really supported option for running DAOS with NVMe has been to run as root (many months ago you could get away with running UIO+SPDK as non-root, but a security fix in the kernel precluded that). We haven't communicated VFIO+SPDK non-root as a supported configuration yet, as far as I know.
Let me know how you get on with the build from that PR.
We are not using Centos 8 yet so if you did you would be doing some useful pathfinding :-)
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Kevan Rehm
Tom,
Sorry to hear that, being sick is no fun, hope you’re over the hump.
Yes, this is the problem I’ve been chasing for a couple of weeks, we are running centos 7.7. I learned more about DPDK internals than I planned to. 😊
Not sure how to make this happen, but it would be nice if breakages like this got communicated to the outside world when they happen so that we don’t lose cycles debugging things that have already been debugged. Maybe a web page of “current issues”? Or email messages here? You probably have better ideas. I did try reading through the Jira log daily for a while to watch for new issues, but that didn’t seem very productive, TMI that didn’t pertain.
I got some consolation from the fact that at least this time it wasn’t pilot error. 😊
Thanks, Kevan
P.S. We have talked from time to time about upgrading to centos 8. Would that be a bad idea?
From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Hello Kevan,
Sincere apologies for not being able to reply sooner; I haven't been well and Mike has been on leave.
SPDK through VFIO (when running as non-root) broke with the version of SPDK (19.04) we use in our build when we moved from CentOS 7.6 to 7.7. The fix is to upgrade SPDK to 20.01. Unfortunately, there have been some API breakages between those versions, and we have had to push for another release to properly bump the .so versions to reflect the API changes. We are therefore waiting for 20.01.1, which is targeted for March 20th.
I think this is the issue you are seeing. This PR should enable you to run as non-root (I've been using it for a while): https://github.com/daos-stack/daos/pull/1902
Rebuild instructions: https://github.com/daos-stack/daos/pull/1902#issuecomment-595702110
https://jira.hpdd.intel.com/browse/DAOS-4164
Hope that helps
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Kevan Rehm
I should probably be more accurate and say that “I think I am using vfio”. 😊. Here are the things I have checked, if there are other things to check, let me know.
# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-1062.12.1.el7.x86_64 root=/dev/mapper/cl_delphi--004-root ro spectre_v2=retpoline rd.lvm.lv=cl_delphi-004/root rd.lvm.lv=cl_delphi-004/swap rhgb quiet intel_iommu=on console=ttyS1,115200

# ls -l
total 0
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar0 -> ../../devices/virtual/iommu/dmar0
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar1 -> ../../devices/virtual/iommu/dmar1
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar2 -> ../../devices/virtual/iommu/dmar2
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar3 -> ../../devices/virtual/iommu/dmar3
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar4 -> ../../devices/virtual/iommu/dmar4
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar5 -> ../../devices/virtual/iommu/dmar5
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar6 -> ../../devices/virtual/iommu/dmar6
lrwxrwxrwx. 1 root root 0 Mar 15 01:38 dmar7 -> ../../devices/virtual/iommu/dmar7
[root@delphi-004 iommu]# pwd
/sys/class/iommu

From daos_control.log:
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: Initializing vfio
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: Probing VFIO support...
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: IOMMU type 1 (Type 1) is supported
EAL: IOMMU type 7 (sPAPR) is not supported
delphi-004.us.cray.com INFO 2020/03/15 02:31:29 daos_io_server:1 EAL: IOMMU type 8 (No-IOMMU) is not supported
EAL: VFIO support initialized

[root@delphi-004 vfio]# cd /dev/vfio
[root@delphi-004 vfio]# ls -l
total 0
crw-rw-rw-. 1 root root 235, 11 Mar 15 02:34 1
crw-rw-rw-. 1 root root 235, 0 Mar 15 02:33 29
crw-rw-rw-. 1 root root 235, 1 Mar 15 02:33 41
crw-rw-rw-. 1 root root 235, 2 Mar 15 02:33 42
crw-rw-rw-. 1 root root 235, 3 Mar 15 02:33 43
crw-rw-rw-. 1 root root 235, 4 Mar 15 02:33 44
crw-rw-rw-. 1 root root 235, 12 Mar 15 02:34 55
crw-rw-rw-. 1 root root 235, 5 Mar 15 02:33 71
crw-rw-rw-. 1 root root 235, 6 Mar 15 02:33 72
crw-rw-rw-. 1 root root 235, 7 Mar 15 02:33 84
crw-rw-rw-. 1 root root 235, 8 Mar 15 02:33 85
crw-rw-rw-. 1 root root 235, 9 Mar 15 02:33 86
crw-rw-rw-. 1 root root 235, 10 Mar 15 02:34 87
crw-rw-rw-. 1 root root 10, 196 Mar 15 02:18 vfio
From daos_server.yml, server 1 has:
bdev_class: nvme
bdev_list: ["0000:1a:00.0", "0000:3b:00.0", "0000:3c:00.0", "0000:3d:00.0", "0000:3e:00.0"]
and server 2 has:
bdev_class: nvme
bdev_list: ["0000:86:00.0", "0000:87:00.0", "0000:af:00.0", "0000:b0:00.0", "0000:b1:00.0", "0000:b2:00.0"]
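The manual checks in this message (kernel command line, /sys/class/iommu, /dev/vfio) could be gathered into one script. The following is only a rough sketch and is not part of DAOS, SPDK or DPDK; the function names are made up for illustration, and it looks at nothing beyond the three locations shown in the listings.

```python
import os


def iommu_enabled_in_cmdline(cmdline: str) -> bool:
    """Return True if the kernel command line contains intel_iommu=on."""
    return "intel_iommu=on" in cmdline.split()


def list_dir(path: str) -> list:
    """Return sorted directory entries, or [] if the path is missing or unreadable."""
    try:
        return sorted(os.listdir(path))
    except OSError:
        return []


def vfio_group_access(vfio_dir: str = "/dev/vfio") -> dict:
    """Map each VFIO group node to whether the current user can read and write it."""
    result = {}
    for name in list_dir(vfio_dir):
        if name == "vfio":  # the container device, not an IOMMU group
            continue
        node = os.path.join(vfio_dir, name)
        result[name] = os.access(node, os.R_OK | os.W_OK)
    return result


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("intel_iommu=on:", iommu_enabled_in_cmdline(f.read()))
    print("IOMMU units:", list_dir("/sys/class/iommu"))
    print("VFIO groups (rw for this user):", vfio_group_access())
```

Running it as the unprivileged user rather than as root is the interesting case, since `os.access` reports what that user can actually open.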
From: <daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
Hm, then I don’t understand why it is trying to read /proc/self/pagemap (only accessible to root in recent kernels). Maybe Tom and Mike can comment.
Cheers, Johann
From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
I am using vfio, and my IOMMU is enabled.
From: <daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
Hi Kevan,
To run SPDK as a non-root user, you need to switch from UIO to VFIO. Tom and Mike have spent some time recently to verify that the DAOS server can be run as a regular user (except for the setuid root on the daos_admin utility) when VFIO is enabled. It requires VT-d to be enabled in the BIOS. Please check: https://daos-stack.github.io/admin/deployment/#enable-iommu-optional
Cheers, Johann
From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
(DAOS-4342 will be closed, a more recent master contains the fix. For the DPDK problem here, I am running yesterday’s master.)
My DPDK problem is related to permissions. I am running daos_server as user ‘daos’, group ‘daos_grp’, with daos_admin and daos_server permissions set as documented:
# ls -l daos_admin daos_server
-rwsr-x---. 1 root daos_grp 6188984 Mar 14 16:12 daos_admin
-rwxr-sr-x. 1 root daos_grp 16345032 Mar 14 16:13 daos_server
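For reference, the mode strings in that listing encode the setuid bit on daos_admin (the `s` in the owner triplet) and the setgid bit on daos_server (the `s` in the group triplet). A small sketch of checking those bits programmatically, using only Python's standard `stat` module; the helper name `special_bits` is hypothetical, and the trailing `.` that `ls` prints for an SELinux context is not part of the mode:

```python
import os
import stat
import sys


def special_bits(mode: int) -> dict:
    """Summarize the ls-style mode string and the setuid/setgid bits of a file mode."""
    return {
        "mode_string": stat.filemode(mode),
        "setuid": bool(mode & stat.S_ISUID),
        "setgid": bool(mode & stat.S_ISGID),
    }


if __name__ == "__main__":
    # Usage: python check_bits.py /path/to/daos_admin /path/to/daos_server
    for path in sys.argv[1:]:
        st = os.stat(path)
        print(path, "uid", st.st_uid, "gid", st.st_gid, special_bits(st.st_mode))
```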
If I start the daemon manually like this:
~/daos/install/bin/daos_server start
it fails every time: the page frame number read by rte_mem_virt2phy() is always zero, and alloc_seg() always fails. If I instead start the daemon with:
sudo ~/daos/install/bin/daos_server start
then the page frame numbers read by rte_mem_virt2phy() are correctly non-zero, and the daos_io_server daemons start up.
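The same root versus non-root comparison can be reproduced outside of DPDK by reading one's own pagemap entry. This is a minimal sketch, assuming a Linux system with /proc mounted; the function names are invented for illustration, and it only mirrors the PFN mask used in rte_mem_virt2phy() rather than calling any DPDK code.

```python
import ctypes
import mmap
import os
import struct

PFN_MASK = (1 << 55) - 1  # bits 0-54, i.e. 0x7fffffffffffff


def pagemap_offset(vaddr: int, page_size: int) -> int:
    """Byte offset in /proc/<pid>/pagemap of the 64-bit entry describing vaddr."""
    return (vaddr // page_size) * 8


def read_own_pagemap_entry() -> int:
    """Touch one anonymous page and return its raw /proc/self/pagemap entry."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    buf = mmap.mmap(-1, page_size)
    buf[0] = 1  # fault the page in so that it is present
    vaddr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    with open("/proc/self/pagemap", "rb", buffering=0) as f:
        f.seek(pagemap_offset(vaddr, page_size))
        (entry,) = struct.unpack("=Q", f.read(8))
    return entry


if __name__ == "__main__":
    entry = read_own_pagemap_entry()
    print(f"euid={os.geteuid()} entry={entry:#018x} pfn={entry & PFN_MASK:#x}")
```

Running the script once as the regular user and once under sudo would show whether the kernel hides the page frame number from unprivileged readers.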
Am I doing something incorrectly in my attempts to run the daos servers as non-root?
Thanks, Kevan
From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Greetings,
I have opened Jira DAOS-4342 for the problem where the value of MaxMessageSize is too small.
I still have no solution for the DPDK failure I am seeing in daos_server startup, but I have more background information on the problem. The underlying problem is that DPDK cannot convert virtual addresses to physical addresses.
Step 1: Early in startup, rte_pci_get_iommu_class() is called from rte_eal_init(); it sets iova_mode to RTE_IOVA_PA (DMA using physical addresses).
Step 2: rte_eal_hugepage_init() gets called, which eventually calls test_phys_addrs_available(). That routine sets phys_addrs_available = false, and reports:
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:1 EAL: Cannot obtain physical addresses: Permission denied. Only vfio will function.
The “Permission denied” is a stale errno value. The actual test in the code that fails is inside routine rte_mem_virt2phy():
    /*
     * the pfn (page frame number) are bits 0-54 (see
     * pagemap.txt in linux Documentation)
     */
    if ((page & 0x7fffffffffffffULL) == 0) {
        return RTE_BAD_IOVA;
    }
The bottom 55 bits of the word that was read are all zeros. The actual value of the word is 0x8180000000000000.
Step 3: Routine rte_service_init() gets called. It eventually calls alloc_seg() which wants to convert a virtual address to a physical address, but phys_addrs_available is false, so it fails. It tries to allocate segments on both sockets but both attempts fail, as phys_addrs_available applies to both sockets.
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Trying to obtain current memory policy.
EAL: Setting policy MPOL_PREFERRED for socket 0
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: alloc_seg(): can't get IOVA addr
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Ask a virtual area of 0x200000 bytes
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Virtual area found at 0x200000200000 (size = 0x200000)
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: attempted to allocate 1 segments, but only 0 were allocated
EAL: Restoring previous memory policy: 0
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Trying to obtain current memory policy.
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Setting policy MPOL_PREFERRED for socket 1
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: alloc_seg(): can't get IOVA addr
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Ask a virtual area of 0x200000 bytes
EAL: Virtual area found at 0x201000a00000 (size = 0x200000)
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: attempted to allocate 1 segments, but only 0 were allocated
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 EAL: Restoring previous memory policy: 0
delphi-004.us.cray.com ERROR 2020/03/14 08:37:09 daos_io_server:0 EAL: FATAL: rte_service_init() failed
Failed to initialize DPDK
delphi-004.us.cray.com INFO 2020/03/14 08:37:09 daos_io_server:0 error allocating rte services array
EAL: rte_service_init() failed
I think the heart of the problem is, when DPDK reads /proc/self/pagemap, why is it getting a value where the bottom 55 bits are all zero?
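As a worked example, the reported word 0x8180000000000000 can be decoded field by field. The sketch below assumes the bit layout in the current upstream kernel pagemap documentation (bit 63 present, bit 62 swapped, bit 61 file-page or shared-anon, bit 56 exclusively mapped, bit 55 soft-dirty, bits 0-54 PFN); whether the CentOS 7.7 kernel uses exactly this layout is an assumption. The same upstream documentation states that, starting with Linux 4.2, the PFN field is zeroed for callers without CAP_SYS_ADMIN, which would be consistent with a present page whose PFN reads as zero.

```python
PFN_MASK = (1 << 55) - 1  # bits 0-54, the mask tested in rte_mem_virt2phy()


def decode_pagemap_entry(entry: int) -> dict:
    """Split a 64-bit pagemap entry into the fields of the upstream layout."""
    return {
        "present": bool((entry >> 63) & 1),
        "swapped": bool((entry >> 62) & 1),
        "file_or_shared_anon": bool((entry >> 61) & 1),
        "exclusively_mapped": bool((entry >> 56) & 1),
        "soft_dirty": bool((entry >> 55) & 1),
        "pfn": entry & PFN_MASK,
    }


if __name__ == "__main__":
    for name, value in decode_pagemap_entry(0x8180000000000000).items():
        print(f"{name}: {value}")
```

Under that layout, the page is present in RAM and only the PFN field is empty, so the mapping itself exists and it is the frame number that is being withheld from the reader.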
Kevan
|
|
Did you do a “git submodule update” to pull in the scons_local changes? I may have left that out of the instructions; there was a change to the configure script check on the spdk libs.
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
|
|