Re: dmg pool operation stuck


Nabarro, Tom
 

The failure seems to occur when creating the blobstore. To reduce the number of variables, can you try with the following env_vars:

  env_vars:
  - CRT_TIMEOUT=300
  - CRT_CREDIT_EP_CTX=0

 

And simplify by running with:

provider: ofi+sockets

 

Regards,

Tom

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of allen.zhuo@...
Sent: Friday, December 3, 2021 1:17 PM
To: daos@daos.groups.io
Subject: Re: [daos] dmg pool operation stuck

 

Hi Tom,
I noticed an error in the engine log.
DAOS[11610/11614] bio  DBUG src/bio/bio_xstream.c:662 load_blobstore() load blobstore failed -1025
Is it because of this? And what does "-1025" mean?

Some parameters of calling spdk_bs_load are as follows:
bs_dev->blocklen = 512
bs_dev->blockcnt = 7814037168
bs_opts.max_md_ops = 32
bs_opts.max_channel_ops = 4096 
bs_opts.cluster_sz = 1073741824
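
As a quick sanity check on the parameters above, the device size and the number of blobstore clusters it can hold follow directly from the three values. The Python sketch below only does that arithmetic; the function name `device_geometry` is hypothetical and is not part of DAOS or SPDK, and it assumes the blobstore divides the device into whole clusters of `cluster_sz` bytes.

```python
# Values copied from the spdk_bs_load parameters quoted above.
BLOCKLEN = 512            # bs_dev->blocklen, bytes per block
BLOCKCNT = 7814037168     # bs_dev->blockcnt, number of blocks
CLUSTER_SZ = 1073741824   # bs_opts.cluster_sz, bytes per cluster (1 GiB)


def device_geometry(blocklen, blockcnt, cluster_sz):
    """Hypothetical helper: return (total bytes, whole clusters, leftover bytes)."""
    total_bytes = blocklen * blockcnt
    whole_clusters = total_bytes // cluster_sz
    leftover_bytes = total_bytes % cluster_sz
    return total_bytes, whole_clusters, leftover_bytes


total_bytes, clusters, leftover = device_geometry(BLOCKLEN, BLOCKCNT, CLUSTER_SZ)
print(f"device size:    {total_bytes} bytes ({total_bytes / 2**40:.2f} TiB)")
print(f"whole clusters: {clusters}")
print(f"leftover:       {leftover} bytes")
```

With these values the device is 4,000,787,030,016 bytes (about 3.64 TiB), which is 3726 whole 1 GiB clusters with roughly 24 MiB left over.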

And the memory information of the server is as follows:

daos_debug@sw2:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:          503Gi        11Gi       490Gi       131Mi       1.7Gi       489Gi
Swap:         8.0Gi          0B       8.0Gi

daos_debug@sw2:~$ numastat -mc | egrep "Node|Huge"
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
                 Node 0 Node 1  Total
AnonHugePages         0      0      0
HugePages_Total    4096   4096   8192
HugePages_Free     3766   4096   7862
HugePages_Surp        0      0      0
