Re: dmg pool operation stuck
Hello,
The format is completing and the engine process is being spawned, so now we need to look at the engine log, which is specified in the server config file (consult the admin guide for more details: https://docs.daos.io/admin/deployment/). Could you try specifying an engine-specific log file with the log mask set to DEBUG, and paste your server config file here, please?
Regards, Tom
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of allen.zhuo@...
Hi Tom,

The same issue still exists after changing the hugepage size to 2 MB. When I run dmg pool create, daos_server does not print any messages, which I think is abnormal. Can we add some debugging code?

$ cat /proc/meminfo | grep Huge
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 4096
HugePages_Free: 3931
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 8388608 kB
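The hugepage figures above can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of DAOS: `parse_hugepages` is a hypothetical helper that reads `/proc/meminfo`-style text (here, the output pasted above) and reports whether the pool size matches `Hugetlb` and how many 2 MB pages are currently allocated. It assumes every size field is reported in kB, as in that output.

```python
import re


def parse_hugepages(meminfo_text: str) -> dict:
    """Parse "Key: value [kB]" lines of /proc/meminfo-style text into a dict.

    Size fields (e.g. "Hugepagesize: 2048 kB") are stored in kB;
    page counts (e.g. "HugePages_Total: 4096") are stored as plain ints.
    """
    fields = {}
    for key, value in re.findall(r"(\w+):\s+(\d+)(?:\s*kB)?", meminfo_text):
        fields[key] = int(value)
    return fields


# Output pasted in the thread (cat /proc/meminfo | grep Huge).
SAMPLE = """\
AnonHugePages: 0 kB
ShmemHugePages: 0 kB
FileHugePages: 0 kB
HugePages_Total: 4096
HugePages_Free: 3931
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
Hugetlb: 8388608 kB
"""

info = parse_hugepages(SAMPLE)
page_kb = info["Hugepagesize"]
total_kb = info["HugePages_Total"] * page_kb
used_pages = info["HugePages_Total"] - info["HugePages_Free"]

# With a single hugepage size configured, Hugetlb should equal Total x Hugepagesize.
print("pool matches Hugetlb:", total_kb == info["Hugetlb"])
print("pages in use:", used_pages, f"({used_pages * page_kb // 1024} MiB)")
```

For the numbers in this thread, 4096 pages × 2048 kB = 8388608 kB, which matches `Hugetlb`, and 165 pages (330 MiB) are allocated.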
daos_server@sw2:~/daos$ dmg -i storage scan
Hosts SCM Total       NVMe Total
----- ---------       ----------
sw2   0 B (0 modules) 4.0 TB (1 controller)

daos_server@sw2:~/daos$ dmg -i storage format
Format Summary:
Hosts SCM Devices NVMe Devices
----- ----------- ------------
sw2   1           1

daos_server@sw2:~/daos$ dmg -i pool create -z 100GB
Creating DAOS pool with automatic storage allocation: 100 GB NVMe + 6.00% SCM
ERROR: dmg: context deadline exceeded
The latest daos_server log:

daos_server@sw2:~/daos$ daos_server start -o ~/daos/build/etc/daos_server.yml
DAOS Server config loaded from /home/daos/daos/build/etc/daos_server.yml
daos_server logging to file /tmp/daos_server.log
DEBUG 01:58:17.438639 start.go:89: Switching control log level to DEBUG
DEBUG 01:58:17.537242 netdetect.go:279: 2 NUMA nodes detected with 28 cores per node
DEBUG 01:58:17.537639 netdetect.go:284: initDeviceScan completed. Depth -6, numObj 27, systemDeviceNames [lo ens4f0 ens5f0 ens5f1 ens4f1 enx0a148ab58408], hwlocDeviceNames [dma0chan0 dma1chan0 dma2chan0 dma3chan0 dma4chan0 dma5chan0 dma6chan0 dma7chan0 enx0a148ab58408 card0 sda nvme1n1 dma8chan0 dma9chan0 dma10chan0 dma11chan0 dma12chan0 dma13chan0 dma14chan0 dma15chan0 nvme2n1 ens4f0 ens4f1 ens5f0 mlx5_0 ens5f1 mlx5_1]
DEBUG 01:58:17.537780 netdetect.go:913: Calling ValidateProviderConfig with ens5f0, ofi+verbs;ofi_rxm
DEBUG 01:58:17.537805 netdetect.go:964: Input provider string: ofi+verbs;ofi_rxm
DEBUG 01:58:17.538059 netdetect.go:995: There are 0 hfi1 devices in the system
DEBUG 01:58:17.538100 netdetect.go:572: There are 2 NUMA nodes.
DEBUG 01:58:17.538121 netdetect.go:928: Device ens5f0 supports provider: ofi+verbs;ofi_rxm
DEBUG 01:58:17.539248 server.go:401: Active config saved to /home/daos/daos/build/etc/.daos_server.active.yml (read-only)
DEBUG 01:58:17.539297 server.go:113: fault domain: /sw2
DEBUG 01:58:17.539619 server.go:163: automatic NVMe prepare req: {ForwardableRequest:{Forwarded:false} HugePageCount:4096 DisableCleanHugePages:false PCIWhitelist:0000:98:00.0 PCIBlacklist: TargetUser:daos_server ResetOnly:false DisableVFIO:false DisableVMD:true}
DEBUG 01:58:32.790943 netdetect.go:279: 2 NUMA nodes detected with 28 cores per node
DEBUG 01:58:32.791323 netdetect.go:284: initDeviceScan completed. Depth -6, numObj 26, systemDeviceNames [lo ens4f0 ens5f0 ens5f1 ens4f1 enx0a148ab58408], hwlocDeviceNames [dma0chan0 dma1chan0 dma2chan0 dma3chan0 dma4chan0 dma5chan0 dma6chan0 dma7chan0 enx0a148ab58408 card0 sda nvme1n1 dma8chan0 dma9chan0 dma10chan0 dma11chan0 dma12chan0 dma13chan0 dma14chan0 dma15chan0 ens4f0 ens4f1 ens5f0 mlx5_0 ens5f1 mlx5_1]
DEBUG 01:58:32.791406 netdetect.go:669: Searching for a device alias for: ens5f0
DEBUG 01:58:32.791447 netdetect.go:693: Device alias for ens5f0 is mlx5_0
DEBUG 01:58:32.791495 class.go:209: output bdev conf file set to /mnt/daos/daos_nvme.conf
DEBUG 01:58:33.319087 provider.go:217: bdev scan: update cache (1 devices)
DAOS Control Server v1.2 (pid 3681) listening on 0.0.0.0:10001
DEBUG 01:58:33.320410 instance_exec.go:35: instance 0: checking if storage is formatted
Checking DAOS I/O Engine instance 0 storage ...
DEBUG 01:58:33.320503 instance_storage.go:74: /mnt/daos: checking formatting
DEBUG 01:58:33.346603 instance_storage.go:90: /mnt/daos (ram) needs format: true
SCM format required on instance 0
DEBUG 01:59:04.391268 ctl_storage_rpc.go:368: received StorageScan RPC
DEBUG 01:59:04.391386 provider.go:217: bdev scan: reuse cache (1 devices)
DEBUG 01:59:04.420740 ctl_storage_rpc.go:387: responding to StorageScan RPC
DEBUG 01:59:08.933555 ctl_storage_rpc.go:407: received StorageFormat RPC ; proceeding to instance storage format
Formatting scm storage for DAOS I/O Engine instance 0 (reformat: false)
DEBUG 01:59:08.933794 instance_storage.go:74: /mnt/daos: checking formatting
DEBUG 01:59:08.961338 instance_storage.go:90: /mnt/daos (ram) needs format: true
Instance 0: starting format of SCM (ram:/mnt/daos)
Instance 0: finished format of SCM (ram:/mnt/daos)
Formatting nvme storage for DAOS I/O Engine instance 0
DEBUG 01:59:09.018278 instance_superblock.go:90: /mnt/daos: checking superblock
DEBUG 01:59:09.018801 instance_superblock.go:94: /mnt/daos: needs superblock (doesn't exist)
Instance 0: starting format of nvme block devices [0000:98:00.0]
Instance 0: finished format of nvme block devices [0000:98:00.0]
DAOS I/O Engine instance 0 storage ready
DEBUG 01:59:13.503527 instance_superblock.go:90: /mnt/daos: checking superblock
DEBUG 01:59:13.504009 instance_superblock.go:94: /mnt/daos: needs superblock (doesn't exist)
DEBUG 01:59:13.504107 instance_superblock.go:119: idx 0 createSuperblock()
DEBUG 01:59:13.504432 instance_superblock.go:149: creating /mnt/daos/superblock: (rank: NilRank, uuid: 8dd7c6e2-8b2e-43b7-b180-f68ed64e8960)
DEBUG 01:59:13.504745 instance_exec.go:62: instance start()
DEBUG 01:59:13.505003 class.go:241: create /mnt/daos/daos_nvme.conf with [0000:98:00.0] bdevs
SCM @ /mnt/daos: 137 GB Total/137 GB Avail
DEBUG 01:59:13.505327 instance_exec.go:79: instance 0: awaiting DAOS I/O Engine init
DEBUG 01:59:13.506206 exec.go:69: daos_engine:0 args: [-t 8 -x 6 -g daos_server -d /var/run/daos_server -s /mnt/daos -n /mnt/daos/daos_nvme.conf -I 0]
DEBUG 01:59:13.506300 exec.go:70: daos_engine:0 env: [CRT_PHY_ADDR_STR=ofi+verbs;ofi_rxm CRT_TIMEOUT=1200 D_LOG_MASK=DEBUG D_LOG_FILE=/tmp/daos_engine.0.log CRT_CTX_SHARE_ADDR=0 OFI_DOMAIN=mlx5_0 VOS_BDEV_CLASS=NVME OFI_INTERFACE=ens5f0 OFI_PORT=20000]
Starting I/O Engine instance 0: /home/daos/daos/build/bin/daos_engine
daos_engine:0 Using legacy core allocation algorithm
daos_engine:0 Starting SPDK v20.01.2 git sha1 b2808069e / DPDK 19.11.6 initialization...
[ DPDK EAL parameters: daos --no-shconf -c 0x1 --pci-whitelist=0000:98:00.0 --log-level=lib.eal:6 --log-level=lib.cryptodev:5 --log-level=user1:6 --log-level=lib.eal:4 --base-virtaddr=0x200000000000 --match-allocations --file-prefix=spdk_pid4969 ]
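To act on Tom's request, it helps to confirm which log file and mask the engine actually received. daos_server prints the engine's environment in its `exec.go` `env: [...]` debug line, so a short script can pull those values out of the control-plane log. This is a minimal sketch: `engine_log_settings` is a hypothetical helper (not a DAOS tool), and it assumes the `env: [...]` line format shown in the log above, with whitespace-separated KEY=VALUE pairs.

```python
import re


def engine_log_settings(server_log: str) -> dict:
    """Return {engine name: {env var: value}} from daos_server "env: [...]" lines."""
    settings = {}
    for name, env in re.findall(r"(daos_engine:\d+) env: \[([^\]]*)\]", server_log):
        # Split "KEY=VALUE KEY=VALUE ..." on whitespace, then on the first "=".
        pairs = (item.split("=", 1) for item in env.split() if "=" in item)
        settings[name] = {key: value for key, value in pairs}
    return settings


# One line copied from the daos_server log in this thread.
LOG_LINE = (
    "DEBUG 01:59:13.506300 exec.go:70: daos_engine:0 env: "
    "[CRT_PHY_ADDR_STR=ofi+verbs;ofi_rxm CRT_TIMEOUT=1200 D_LOG_MASK=DEBUG "
    "D_LOG_FILE=/tmp/daos_engine.0.log CRT_CTX_SHARE_ADDR=0 OFI_DOMAIN=mlx5_0 "
    "VOS_BDEV_CLASS=NVME OFI_INTERFACE=ens5f0 OFI_PORT=20000]"
)

for engine, env in engine_log_settings(LOG_LINE).items():
    print(engine, env.get("D_LOG_FILE"), env.get("D_LOG_MASK"))
```

Run against the line above, this prints `daos_engine:0 /tmp/daos_engine.0.log DEBUG`, so in this setup the engine's own output should be in /tmp/daos_engine.0.log.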