Re: dmg pool operation stuck


Allen
 

Hi, 

My server has 512GB of memory in total, and the hugepage size is 1GB.
$ free -h
              total        used        free      shared  buff/cache   available
Mem:          503Gi       130Gi       372Gi       132Mi       1.0Gi       370Gi
Swap:         8.0Gi          0B       8.0Gi
$ cat /proc/meminfo | grep Huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:     128
HugePages_Free:      124
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:    1048576 kB
Hugetlb:        134217728 kB
 
When I set nr_hugepages: 4096 and targets: 8, daos_server prints the following error on startup:
$ daos_server start -o ~/daos/build/etc/daos_server.yml
DAOS Server config loaded from /home/daos/daos/build/etc/daos_server.yml
daos_server logging to file /tmp/daos_server.log
DEBUG 01:52:49.731469 start.go:89: Switching control log level to DEBUG
DEBUG 01:52:49.831569 netdetect.go:279: 2 NUMA nodes detected with 28 cores per node
DEBUG 01:52:49.831823 netdetect.go:284: initDeviceScan completed.  Depth -6, numObj 27, systemDeviceNames [lo ens4f0 ens5f0 ens5f1 ens4f1 enx0a148ab58408], hwlocDeviceNames [dma0chan0 dma1chan0 dma2chan0 dma3chan0 dma4chan0 dma5chan0 dma6chan0 dma7chan0 enx0a148ab58408 card0 sda nvme1n1 dma8chan0 dma9chan0 dma10chan0 dma11chan0 dma12chan0 dma13chan0 dma14chan0 dma15chan0 nvme2n1 ens4f0 ens4f1 ens5f0 mlx5_0 ens5f1 mlx5_1]
DEBUG 01:52:49.831859 netdetect.go:913: Calling ValidateProviderConfig with ens5f0, ofi+verbs;ofi_rxm
DEBUG 01:52:49.831876 netdetect.go:964: Input provider string: ofi+verbs;ofi_rxm
DEBUG 01:52:49.832098 netdetect.go:995: There are 0 hfi1 devices in the system
DEBUG 01:52:49.832132 netdetect.go:572: There are 2 NUMA nodes.
DEBUG 01:52:49.832155 netdetect.go:928: Device ens5f0 supports provider: ofi+verbs;ofi_rxm
DEBUG 01:52:49.833024 server.go:401: Active config saved to /home/daos/daos/build/etc/.daos_server.active.yml (read-only)
DEBUG 01:52:49.833067 server.go:113: fault domain: /sw2
DEBUG 01:52:49.833306 server.go:163: automatic NVMe prepare req: {ForwardableRequest:{Forwarded:false} HugePageCount:4096 DisableCleanHugePages:false PCIWhitelist:0000:98:00.0 PCIBlacklist: TargetUser:daos_server ResetOnly:false DisableVFIO:false DisableVMD:true}
DEBUG 01:53:05.988214 main.go:70: server: code = 610 description = "requested 4096 hugepages; got 494"
ERROR: server: code = 610 description = "requested 4096 hugepages; got 494"
ERROR: server: code = 610 resolution = "reboot the system or manually clear /dev/hugepages as appropriate"
 
It looks like DAOS tries to allocate nr_hugepages * hugepagesize of memory, which with 1GB pages is far more than the server has:
$ numastat -mc | egrep "Node|Huge"
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
                 Node 0 Node 1  Total
AnonHugePages         0      0      0
HugePages_Total  253952 254976 508928
HugePages_Free   250880 254976 505856
HugePages_Surp        0      0      0
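
To make the size of that request concrete, here is a minimal sketch of the arithmetic. It assumes nr_hugepages is counted in pages of the system default hugepage size, which is what the "requested 4096 hugepages" message suggests. The function name hugepage_request_gib is made up for illustration and is not part of DAOS.

```shell
#!/bin/sh
# Hypothetical helper (not a DAOS tool): memory implied by a hugepage request.
#   $1 = number of hugepages requested
#   $2 = hugepage size in kB (the Hugepagesize value from /proc/meminfo)
# Prints the total in GiB, rounded down.
hugepage_request_gib() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

# Values from this post: nr_hugepages: 4096, Hugepagesize: 1048576 kB
hugepage_request_gib 4096 1048576   # prints 4096 (GiB), against 503Gi installed

# The same page count with 2048 kB pages, for comparison
hugepage_request_gib 4096 2048      # prints 8 (GiB)
```

With 1GB pages, a count of 4096 asks for roughly eight times the installed memory, so only part of the request can be satisfied.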
 


But SPDK's setup.sh does not run into this problem:

daos_server@sw2:~/daos/build/prereq/release/spdk/share/spdk/scripts$ sudo HUGEMEM=4096 ./setup.sh
0000:31:00.0 (8086 0a55): nvme -> vfio-pci
0000:4c:00.0 (8086 0a55): nvme -> vfio-pci
0000:98:00.0 (8086 0a55): nvme -> vfio-pci
0000:00:01.0 (8086 0b00): ioatdma -> vfio-pci
0000:00:01.1 (8086 0b00): ioatdma -> vfio-pci
0000:00:01.2 (8086 0b00): ioatdma -> vfio-pci
0000:00:01.3 (8086 0b00): ioatdma -> vfio-pci
0000:00:01.4 (8086 0b00): ioatdma -> vfio-pci
0000:00:01.5 (8086 0b00): ioatdma -> vfio-pci
0000:00:01.6 (8086 0b00): ioatdma -> vfio-pci
0000:00:01.7 (8086 0b00): ioatdma -> vfio-pci
0000:80:01.0 (8086 0b00): ioatdma -> vfio-pci
0000:80:01.1 (8086 0b00): ioatdma -> vfio-pci
0000:80:01.2 (8086 0b00): ioatdma -> vfio-pci
0000:80:01.3 (8086 0b00): ioatdma -> vfio-pci
0000:80:01.4 (8086 0b00): ioatdma -> vfio-pci
0000:80:01.5 (8086 0b00): ioatdma -> vfio-pci
0000:80:01.6 (8086 0b00): ioatdma -> vfio-pci
0000:80:01.7 (8086 0b00): ioatdma -> vfio-pci
daos_server@sw2:~/daos/build/prereq/release/spdk/share/spdk/scripts$ numastat -mc | egrep "Node|Huge"
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
                 Node 0 Node 1  Total
AnonHugePages         0      0      0
HugePages_Total    2048   2048   4096
HugePages_Free     2048   2048   4096
HugePages_Surp        0      0      0
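
One thing to keep in mind when comparing the two runs: SPDK's setup.sh takes HUGEMEM as a size in MB, and numastat -m also reports MB, while nr_hugepages is a page count. A rough sketch of the conversion, using a made-up helper name (mb_to_hugepages) and the 1GB page size from this system:

```shell
#!/bin/sh
# Hypothetical helper: number of whole hugepages that fit in a size given in MB.
#   $1 = size in MB (e.g. HUGEMEM, or a numastat -m figure)
#   $2 = hugepage size in kB (the Hugepagesize value from /proc/meminfo)
mb_to_hugepages() {
    echo $(( $1 * 1024 / $2 ))
}

# HUGEMEM=4096 with 1048576 kB pages
mb_to_hugepages 4096 1048576     # prints 4

# HugePages_Total of 508928 MB from the earlier numastat output
mb_to_hugepages 508928 1048576   # prints 497
```

So the setup.sh run above reserved about 4 pages of 1GB, whereas the daos_server run asked for 4096 of them. The 497 figure is close to the "got 494" in the error, though the numastat snapshot may not have been taken at exactly the same moment.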
 
