Reply: Re: [daos] dmg pool operation stuck


Allen
 


Hi Tom,
Yes, please check the previous reply.

-------------- Original Message --------------
From: "Nabarro, Tom" <tom.nabarro@...>
Sent: Friday, December 3, 2021, 9:24 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] dmg pool operation stuck
-----------------------------------

Did you manage to get the engine log with DD_MASK=all? That will give us more information about why the engine is not completing start-up (and why no joined ranks are reported by "dmg system query").

The "load blobstore failed" message is expected. It just means the blobstores still need to be created.

 

Also, can you please confirm that you have tried the adjusted settings from my previous e-mail:

"- set engines->targets to 4 and engines->nr_xs_helpers to 0"

 

Regards,

Tom

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of allen.zhuo@...
Sent: Friday, December 3, 2021 1:17 PM
To: daos@daos.groups.io
Subject: Re: [daos] dmg pool operation stuck

 

Hi Tom,
I noticed an error in the engine log.
DAOS[11610/11614] bio  DBUG src/bio/bio_xstream.c:662 load_blobstore() load blobstore failed -1025
Is it because of this? And what does "-1025" mean?

Some parameters of calling spdk_bs_load are as follows:
bs_dev->blocklen = 512
bs_dev->blockcnt = 7814037168
bs_opts.max_md_ops = 32
bs_opts.max_channel_ops = 4096
bs_opts.cluster_sz = 1073741824
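
As a quick sanity check on the parameters above, the device size and the number of whole clusters it could hold can be worked out from the quoted values. This is only an arithmetic sketch: the function name is hypothetical, and it ignores whatever space the blobstore reserves for its own metadata, so the real usable cluster count would be somewhat lower.

```python
# Values copied from the spdk_bs_load parameters quoted above.
BLOCKLEN = 512              # bs_dev->blocklen, bytes per block
BLOCKCNT = 7814037168       # bs_dev->blockcnt, blocks on the device
CLUSTER_SZ = 1073741824     # bs_opts.cluster_sz, bytes per cluster (1 GiB)


def blobstore_geometry(blocklen, blockcnt, cluster_sz):
    """Hypothetical helper: raw device size and whole clusters that fit.

    Blobstore metadata overhead is NOT accounted for here.
    """
    dev_bytes = blocklen * blockcnt
    full_clusters, leftover_bytes = divmod(dev_bytes, cluster_sz)
    return dev_bytes, full_clusters, leftover_bytes


dev_bytes, full_clusters, leftover_bytes = blobstore_geometry(
    BLOCKLEN, BLOCKCNT, CLUSTER_SZ
)

print(f"device size   : {dev_bytes} bytes ({dev_bytes / 2**40:.2f} TiB)")
print(f"cluster size  : {CLUSTER_SZ / 2**30:.0f} GiB")
print(f"whole clusters: {full_clusters} (leftover {leftover_bytes} bytes)")
```

With these numbers the device is about 4 TB (3.64 TiB) and holds 3726 whole 1 GiB clusters before metadata is taken into account.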

And the memory information of the server is as follows:

daos_debug@sw2:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:          503Gi        11Gi       490Gi       131Mi       1.7Gi       489Gi
Swap:         8.0Gi          0B       8.0Gi

daos_debug@sw2:~$ numastat -mc | egrep "Node|Huge"
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
Token Node not in hash table.
                 Node 0 Node 1  Total
AnonHugePages         0      0      0
HugePages_Total    4096   4096   8192
HugePages_Free     3766   4096   7862
HugePages_Surp        0      0      0
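
To read the hugepage figures above more easily, the per-node usage can be derived by subtracting the free row from the total row. The sketch below does only that; the parsing helper is hypothetical, and the unit is an assumption (numastat -m normally reports megabytes rather than page counts, but the quoted output does not say).

```python
# The two rows of interest, copied from the numastat output above.
NUMASTAT_ROWS = """\
HugePages_Total    4096   4096   8192
HugePages_Free     3766   4096   7862
"""


def parse_rows(text):
    """Hypothetical helper: map row label -> [node0, node1, total]."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4:
            rows[parts[0]] = [int(value) for value in parts[1:]]
    return rows


rows = parse_rows(NUMASTAT_ROWS)

# In use = total - free, column by column (Node 0, Node 1, Total).
in_use = [
    total - free
    for total, free in zip(rows["HugePages_Total"], rows["HugePages_Free"])
]

for label, value in zip(("Node 0", "Node 1", "Total"), in_use):
    # Unit assumed to be MB, as numastat -m usually reports.
    print(f"{label}: {value} in use")
```

On these figures, all hugepage memory currently in use (330) is on Node 0 and none is on Node 1.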