[External] Re: [daos] failed to create pool: -1023 #chat


Shengyu SY19 Zhang
 

Hello,

 

Thank you for your help. Since ompi/pmix are dependency projects of DAOS, I didn’t modify them; I just followed the quick start on GitHub to build and run everything.

I went into _build.external/ompi and pmix and ran make && make install; everything finished successfully.

As for report-uri: since I’m running the DAOS server in /root/daos, I can see that the URI file was created and can read its contents, so I just specify the full path for the client. The connection between client and server seems OK (if I specify the wrong path, or the NIC is not started, I get a different error).

Also, if I run daosctl create-pool testpool, I get the same issue.

So I’m wondering where the problem is.

There is some information in the logs:

PMIx_Lookup group daos_server failed, rc: -46, value.type 0.
crt_pmix_attach group daos_server failed, rc: -1023.

 

 

Best Regards

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of xuezhao.liu@...
Sent: Thursday, June 20, 2019 4:08 PM
To: daos@daos.groups.io
Subject: [External] Re: [daos] failed to create pool: -1023 #chat

 

Hi,
-1023 is DER_PMIX, which commonly means that your ompi/pmix build or usage is incorrect.
You can run "which orterun" to check whether the orterun on your path is the one built by the DAOS package.
For the server command line, you need at least "--report-uri /root/daos/uuu" so that the dmg client can connect to it (via the --ompi-server option).
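
The "which orterun" check can also be scripted. This is a minimal sketch in Python, assuming the DAOS-built Open MPI binaries live under some install prefix. The prefix in the example call is hypothetical and the helper name is made up for illustration; substitute your own install directory.

```python
import os
import shutil


def orterun_from_prefix(prefix, path=None):
    """Return True if the first `orterun` found on PATH lives under `prefix`.

    `prefix` is an assumption: the directory where the DAOS build installed
    its own ompi/pmix binaries. `path` defaults to the current PATH.
    """
    found = shutil.which("orterun", path=path)
    if found is None:
        # No orterun on PATH at all.
        return False
    found = os.path.realpath(found)
    prefix = os.path.realpath(prefix)
    # The binary is "under" the prefix if the prefix is their common path.
    return os.path.commonpath([found, prefix]) == prefix


if __name__ == "__main__":
    # Hypothetical prefix, for illustration only.
    print(orterun_from_prefix("/root/daos/install"))
```

If this reports False, a system-wide Open MPI is probably shadowing the one built with DAOS, which matches the kind of mismatch described above.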


Lombardi, Johann
 

Hi,

 

First of all, please note that we are in the process of implementing our own wire-up protocol that will eventually allow us to start the DAOS server on each storage node independently. At that point, we won’t require ompi/pmix any longer and will be able to start the DAOS servers via systemd, Kubernetes or any parallel launcher (e.g. pdsh, …). This feature will be available this summer.

 

Meanwhile, it would be great to see the output of the "orterun … daos_server" command. I suspect that the backend storage hasn’t been formatted and the data plane hasn’t started yet. Could you please file a ticket on https://jira.hpdd.intel.com and attach the daos_server output? Thanks.

 

Cheers,

Johann

 

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Shengyu SY19 Zhang
 

Hello Johann,

 

Since there is no NVDIMM on my system, I’m using tmpfs mounted at /mnt/daos as described in the documentation. The NVMe is left unformatted; it is used via SPDK.

I can’t post files on Jira; I haven’t found a way to register there.

Here is the output on the server side:

 

2019/06/20 18:18:12 config.go:108: debug: DAOS config read from /usr/local/etc/daos_server.yml

2019/06/20 18:18:12 config.go:144: debug: Active config saved to /usr/local/etc/.daos_server.active.yml (read-only)

2019/06/20 18:18:12 config.go:353: debug: Switching control log level to DEBUG

2019/06/20 18:18:12 config.go:368: debug: daos_server logging to file /tmp/daos_control.log

Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...

[ DPDK EAL parameters: spdk -c 0x1 --file-prefix=spdk1929786431 --base-virtaddr=0x200000000000 --proc-type=auto ]

EAL: Detected 20 lcore(s)

EAL: Auto-detected process type: PRIMARY

EAL: No free hugepages reported in hugepages-1048576kB

EAL: Multi-process socket /var/run/.spdk1929786431_unix

EAL: Probing VFIO support...

EAL: VFIO support initialized

EAL: PCI device 0000:03:00.0 on NUMA socket 0

EAL:   probe driver: 8086:953 spdk_nvme

EAL:   using IOMMU type 1 (Type 1)

EAL: PCI device 0000:05:00.0 on NUMA socket 0

EAL:   probe driver: 8086:953 spdk_nvme

no NVDIMMs found!

waiting for storage format on server 0

 


Liu, Xuezhao
 

It looks like your daos_server did not start successfully.
Check the details in the config file /usr/local/etc/daos_server.yml and try changing some settings to see whether it works; for example, comment out (add "#" at the start of the line) all options that start with "bdev_".
If it still does not work, you can post your daos_server.yml and the DAOS log (its path is configured by the "log_file" option; you can also set "log_mask: DEBUG") to a Jira ticket, or here if Jira does not work for you.
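
Commenting out the "bdev_" options can be done by hand, or with a small script. The following is only a sketch in Python: the helper name and the sample config text are made up for illustration, and it handles only single-line options (a value spread over several lines would need manual editing).

```python
def comment_out_bdev(yaml_text):
    """Prefix every line whose key starts with "bdev_" with "#".

    Only single-line options are handled; other lines are left untouched.
    """
    out = []
    for line in yaml_text.splitlines(keepends=True):
        if line.lstrip().startswith("bdev_"):
            out.append("#" + line)
        else:
            out.append(line)
    return "".join(out)


if __name__ == "__main__":
    # Illustrative fragment only, not a complete daos_server.yml.
    sample = (
        "  bdev_class: nvme\n"
        '  bdev_list: ["0000:81:00.0"]\n'
        "  log_mask: DEBUG\n"
    )
    print(comment_out_bdev(sample), end="")
```

Work on a copy of the file and compare the result with the original before restarting the server.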


Lombardi, Johann
 

Right, the storage engine isn’t started since the backend storage hasn’t been formatted:

waiting for storage format on server 0

 

Even with no NVDIMMs in the system, we still need to wipe out the SSDs so that all blocks are marked as not allocated (i.e. for wear leveling). The following command should allow you to format & start the engine:

$ daos_shell storage format
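
To spot this state without reading the whole log, the server output can be scanned for the "waiting for storage format" message. This is a minimal sketch in Python; the function name is hypothetical and the pattern is taken only from the output quoted in this thread, so other DAOS versions may word the message differently.

```python
import re

# Pattern copied from the daos_server output quoted in this thread.
WAITING_RE = re.compile(r"waiting for storage format on server (\d+)")


def servers_waiting_for_format(log_text):
    """Return the sorted server indices that report waiting for a storage format."""
    return sorted({int(m.group(1)) for m in WAITING_RE.finditer(log_text)})


if __name__ == "__main__":
    output = "no NVDIMMs found!\nwaiting for storage format on server 0\n"
    print(servers_waiting_for_format(output))
```

An empty list means the message was not found, which does not by itself prove that the engine started.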

 

As suggested by Xuezhao, you should create your own daos_server.yml with the list of SSDs you want to use.

 

To list the SSDs available on the system, you can run the following commands:

$ daos_server storage prep-nvme

2019/06/20 19:06:24 storage_nvme.go:96: debug: spdk setup with _NRHUGE=1024

2019/06/20 19:06:24 storage_nvme.go:100: debug: spdk setup with _TARGET_USER=root

$ daos_server storage scan

[…]

NVMe:

- model: 'INTEL SSDPED1K375GA '

  serial: 'PHKS7335009W375AGN  '

  pciaddr: 0000:87:00.0

  fwrev: E2010324

  namespaces:

  - id: 1

    capacity: 375

- model: 'INTEL SSDPEDMD016T4 '

  serial: 'CVFT5226001C1P6DGN  '

  pciaddr: 0000:da:00.0

  fwrev: 8DV10171

  namespaces:

  - id: 1

    capacity: 1600

- model: 'INTEL SSDPEDMD016T4 '

  serial: 'CVFT5506004Z1P6DGN  '

  pciaddr: 0000:81:00.0

  fwrev: 8DV10171

  namespaces:

  - id: 1

    capacity: 1600

 

And then populate the yaml file with the devices that you want to use, for instance:

  bdev_class: nvme

  bdev_list: ["0000:81:00.0", "0000:da:00.0"]
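
Going from the scan output to the two config lines can be automated. The sketch below, in Python, assumes the scan output has the "pciaddr:" layout shown above; the function names are hypothetical and the script does not call any DAOS tool itself.

```python
import re

# Matches lines such as "  pciaddr: 0000:87:00.0" (domain:bus:device.function).
PCIADDR_RE = re.compile(
    r"^\s*pciaddr:\s*([0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7])\s*$",
    re.MULTILINE,
)


def pci_addresses(scan_output):
    """Return the PCI addresses found in the scan output, in order of appearance."""
    return PCIADDR_RE.findall(scan_output)


def bdev_config(addresses):
    """Format the chosen addresses as the two bdev_ lines of the yaml file."""
    quoted = ", ".join('"%s"' % a for a in addresses)
    return "  bdev_class: nvme\n  bdev_list: [%s]" % quoted


if __name__ == "__main__":
    scan = "  pciaddr: 0000:87:00.0\n  fwrev: E2010324\n  pciaddr: 0000:da:00.0\n"
    print(bdev_config(pci_addresses(scan)))
```

Which devices to keep is still a manual decision; the script only saves retyping the addresses.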

 

We are working on automatic storage configuration with CPU affinity detection, but this feature isn’t available yet.

 

HTH

 

Cheers,

Johann

 


Shengyu SY19 Zhang
 

Hello,

 

The first issue was resolved after running: $ daos_shell storage format

I think you could add this step to the quick start document.

Yes, I have already created daos_server.yml, based on the one at install/etc/daos_server.yml.

Now the storage server seems to be formatted, but a new issue has come up:

When I run dmg create, it fails with:

failed to create pool: -1007
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
orterun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[27363,1],0]
  Exit code:    1

 

 

If I execute: daos_shell pool create -s 1G

It says:

2019/06/21 17:29:19 config.go:122: debug: DAOS Client config read from /usr/local/etc/daos.yml

Active connections: [localhost:10001]

 

Creating DAOS pool with 1GB SCM and 0B NvMe storage (1.000 ratio)

parsing rank list: element 0: strconv.ParseUint: parsing "": invalid syntax

 

I’m trying to resolve this on my own; any hints would be appreciated.

 

Regards,

Shengyu


Lombardi, Johann
 

Hi Shengyu,

 

We are about to retire the quick start document in favor of the admin guide that has been integrated into the source code (https://github.com/daos-stack/daos/tree/master/doc/admin).

The documentation for format actually landed this morning: https://github.com/daos-stack/daos/blob/master/doc/admin/deployment.md#basic-workflow

 

As for the -1007 error, it means that you don’t have enough space available to allocate the pool (https://github.com/daos-stack/daos/blob/master/doc/admin/troubleshooting.md#daos-errors).

How much space have you allocated with tmpfs under /mnt/daos?
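
A quick way to answer that question before creating a pool is to compare the free space on the mount with the requested pool size. This is a rough sketch in Python using only the standard library; the function name is hypothetical, and the check is coarse because DAOS may need more than the raw pool size.

```python
import shutil


def has_room_for_pool(mount_point, pool_bytes):
    """Return True if the filesystem at `mount_point` has at least `pool_bytes` free.

    This is only a pre-check on free space; it does not account for any
    overhead DAOS itself may add on top of the requested pool size.
    """
    usage = shutil.disk_usage(mount_point)
    return usage.free >= pool_bytes


if __name__ == "__main__":
    # 1 GiB pool on the tmpfs mount used in this thread.
    print(has_room_for_pool("/mnt/daos", 1024 ** 3))
```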

 

Cheers,

Johann

 

 


Nabarro, Tom
 

I’m afraid the patch for this (regarding the handling of the request) has not landed yet; it’s going through a round of reviews: https://github.com/daos-stack/daos/pull/637

 

Please feel free to experiment with the patch, as it should work; otherwise, please use the "dmg" tool to create pools in the interim.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 


---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Shengyu SY19 Zhang
 

Hello Johann,

 

Great.

For /mnt/daos, the space should have been sufficient; here is the output of mount:

tmpfs on /mnt/daos type tmpfs (rw,nosuid,nodev,noexec,noatime,seclabel,size=6291456k)

However, I could see that 88% of its space was already in use, so I remounted a larger one (20G). Now I’m able to create a storage pool.
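
The numbers above are consistent with the -1007 error. As a worked example in Python, assuming the 88% figure is exact and a 1 GiB pool as in the earlier "pool create" attempt (both are assumptions, and the helper names are made up for illustration):

```python
import re


def tmpfs_size_bytes(mount_line):
    """Extract the size=<n>k option from a mount line and return it in bytes."""
    m = re.search(r"size=(\d+)k", mount_line)
    if m is None:
        raise ValueError("no size=<n>k option found")
    return int(m.group(1)) * 1024


def free_after_usage(size_bytes, used_percent):
    """Return the bytes left when `used_percent` of `size_bytes` is in use."""
    return size_bytes * (100 - used_percent) // 100


if __name__ == "__main__":
    line = "tmpfs on /mnt/daos type tmpfs (rw,nosuid,nodev,noexec,noatime,seclabel,size=6291456k)"
    size = tmpfs_size_bytes(line)        # 6291456 KiB, i.e. 6 GiB
    free = free_after_usage(size, 88)    # roughly 0.72 GiB left
    print(size, free, free >= 1024 ** 3)
```

With about 0.72 GiB free, a 1 GiB pool does not fit, while a 20G mount leaves ample room.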

 

Regards,

Shengyu.

 


Shengyu SY19 Zhang
 

Hello Tom,

 

Thank you for the info.

I’m now able to create a storage pool via the dmg tool. I’ll try the patch later when I need it.

 

Regards,

Shengyu.

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Nabarro, Tom
Sent: Friday, June 21, 2019 9:06 PM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] failed to create pool: -1023 #chat

 

I’m afraid the patch for this has not landed yet (regarding the handling of the request), it’s going through a round of reviews, https://github.com/daos-stack/daos/pull/637 .

 

Please feel free to experiment with the patch, as it should work, otherwise please use the "dmg" tool to create pools in the interim

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Shengyu SY19 Zhang
Sent: Friday, June 21, 2019 10:35 AM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] failed to create pool: -1023 #chat

 

Hello,

 

I got first issue resolved after run: $ daos_shell storage format

I think you could add this step into the quick start document.

Yes I have already created daos_server.yml, from the one at install/etc/daos_server.yml.

Now the storage server seems formatted, but there is new issue happen :

When I run dmg create, it encounter another issue :

failed to create pool: -1007
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
orterun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[27363,1],0]
  Exit code:    1

 

 

If I execute: daos_shell pool create –s 1G

It says:

2019/06/21 17:29:19 config.go:122: debug: DAOS Client config read from /usr/local/etc/daos.yml

Active connections: [localhost:10001]

 

Creating DAOS pool with 1GB SCM and 0B NvMe storage (1.000 ratio)

parsing rank list: element 0: strconv.ParseUint: parsing "": invalid syntax

 

I’m trying to resolve by my ways, any hints will be appreciated.

 

Regards,

Shengyu

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Lombardi, Johann
Sent: Friday, June 21, 2019 3:14 AM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] failed to create pool: -1023 #chat

 

Right, the storage engine isn’t started since the backend storage hasn’t been formatted:

waiting for storage format on server 0

 

Even with no NVDIMMs in the system, we still need to wipe out the SSDs so that all blocks are marked as not allocated (i.e. for wear leveling). The following command should allow you to format & start the engine:

$ daos_shell storage format

 

As suggested by Xuezhao, you should create your own daos_server.yml with the list of SSDs you want to use.

 

To list the SSDs available on the system, you can run the following commands:

$ daos_server storage prep-nvme
2019/06/20 19:06:24 storage_nvme.go:96: debug: spdk setup with _NRHUGE=1024
2019/06/20 19:06:24 storage_nvme.go:100: debug: spdk setup with _TARGET_USER=root
$ daos_server storage scan
[…]
NVMe:
- model: 'INTEL SSDPED1K375GA '
  serial: 'PHKS7335009W375AGN  '
  pciaddr: 0000:87:00.0
  fwrev: E2010324
  namespaces:
  - id: 1
    capacity: 375
- model: 'INTEL SSDPEDMD016T4 '
  serial: 'CVFT5226001C1P6DGN  '
  pciaddr: 0000:da:00.0
  fwrev: 8DV10171
  namespaces:
  - id: 1
    capacity: 1600
- model: 'INTEL SSDPEDMD016T4 '
  serial: 'CVFT5506004Z1P6DGN  '
  pciaddr: 0000:81:00.0
  fwrev: 8DV10171
  namespaces:
  - id: 1
    capacity: 1600

 

And then populate the yaml file with the devices that you want to use, for instance:

  bdev_class: nvme
  bdev_list: ["0000:81:00.0", "0000:da:00.0"]
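Going from the scan output to these two lines can be sketched as follows. This is an illustration under assumptions: it reads the pciaddr fields with a regular expression rather than a YAML parser, and pci_addresses and bdev_config are hypothetical helper names, not part of DAOS.

```python
import re

# Matches lines such as "  pciaddr: 0000:87:00.0" in the scan output.
PCIADDR_RE = re.compile(r"^[ \t]*pciaddr:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def pci_addresses(scan_output):
    """Return every PCI address listed in the scan output, in order."""
    return PCIADDR_RE.findall(scan_output)


def bdev_config(addresses):
    """Render the two bdev lines for the chosen PCI addresses."""
    quoted = ", ".join(f'"{addr}"' for addr in addresses)
    return f"  bdev_class: nvme\n  bdev_list: [{quoted}]\n"
```

You would typically pick a subset of the addresses returned by pci_addresses and pass only those to bdev_config, then paste the result into your daos_server.yml.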

 

We are working on automatic storage configuration with CPU affinity detection, but this feature isn’t available yet.

 

HTH

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "xuezhao.liu@..." <xuezhao.liu@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday 20 June 2019 at 13:24
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [External] Re: [daos] failed to create pool: -1023 #chat

 

It looks like your daos_server did not start successfully.
You may check the details in the config file /usr/local/etc/daos_server.yml and try changing some settings to see if that helps; for example, comment out (add "#" to the start of the line) all the options that start with "bdev_".
If it still does not work, you may post your daos_server.yml and the DAOS log (its path is configured by the "log_file" option; you can also set "log_mask: DEBUG") to a Jira ticket, or here if Jira does not work for you.
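The "comment out every bdev_ option" experiment can be sketched like this. It assumes each bdev_ option sits on a single line, as in the bdev_class / bdev_list example elsewhere in this thread; a list written in multi-line block style would need its item lines commented out as well. comment_out_bdev is a hypothetical helper, not a DAOS tool.

```python
import re

# An uncommented option whose key starts with "bdev_", with its indentation.
BDEV_RE = re.compile(r"^(\s*)(bdev_\w+\s*:.*)$")


def comment_out_bdev(yaml_text):
    """Return yaml_text with every single-line bdev_ option commented out."""
    out = []
    for line in yaml_text.splitlines():
        match = BDEV_RE.match(line)
        if match:
            # Keep the indentation and put "# " in front of the option.
            out.append(f"{match.group(1)}# {match.group(2)}")
        else:
            out.append(line)
    trailing = "\n" if yaml_text.endswith("\n") else ""
    return "\n".join(out) + trailing
```

Lines that are already commented are left untouched, so running it twice gives the same result.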

 

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Shengyu SY19 Zhang
 

Hello,

 

I hope to get additional help on how to make DAOS work within its ecosystem: running it with FUSE or HDFS, or testing for IOPS.

dmg query and FUSE do not work in my environment, and I noticed that the DAOS admin guide does not match the code for the FUSE part.

The pool is created successfully; however, dmg query and the FUSE mount always return the invalid-parameter error code (1003).

orterun … dmg query --pool 06c10125-c3ea-4040-a030-10a9e5f10004 --svc 1

For now, any guide to testing or running DAOS within its ecosystem would help me learn more about the project; any information will be appreciated.

 

Best Regards,

Shengyu

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Shengyu SY19 Zhang
Sent: Monday, June 24, 2019 4:52 PM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] failed to create pool: -1023 #chat

 

Hello Tom,

 

Thank you for the info.

Now I’m able to create a storage pool via the dmg tool. I’ll try the patch later, when I need it.

 

Regards,

Shengyu.

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Nabarro, Tom
Sent: Friday, June 21, 2019 9:06 PM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] failed to create pool: -1023 #chat

 

I’m afraid the patch for this (regarding the handling of the request) has not landed yet; it is going through a round of reviews: https://github.com/daos-stack/daos/pull/637

 

Please feel free to experiment with the patch, as it should work; otherwise, please use the "dmg" tool to create pools in the interim.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Shengyu SY19 Zhang
Sent: Friday, June 21, 2019 10:35 AM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] failed to create pool: -1023 #chat

 

Hello,

 

The first issue was resolved after I ran: $ daos_shell storage format

I think you could add this step to the quick start document.

Yes, I have already created daos_server.yml from the one at install/etc/daos_server.yml.

The storage server now seems to be formatted, but a new issue has appeared.

When I run dmg create, it fails with "failed to create pool: -1007".



Lombardi, Johann
 

Hi Shengyu,

 

I assume that you have followed the instructions to set up /var/run/daos_agent, correct?

If not, please check https://github.com/daos-stack/daos/blob/master/doc/admin/deployment.md#runtime-directory-setup

 

We also landed support for the daos_agent to v0.5 (David sent an email to the list), but the admin guide hasn’t been updated yet:

https://github.com/daos-stack/daos/blob/master/doc/admin/deployment.md#agent-configuration

On the compute nodes, you should just run “daos_agent &”. A systemd script to automate this should be available soon.
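Before starting the agent, it may help to confirm that the runtime directory is in place. The sketch below assumes the agent needs /var/run/daos_agent to exist and to be writable by the user that starts it (see the runtime directory setup link above for the authoritative steps); runtime_dir_problems is a hypothetical helper.

```python
import os


def runtime_dir_problems(path="/var/run/daos_agent"):
    """Return a list of human-readable problems found with the directory."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
    elif not os.access(path, os.W_OK | os.X_OK):
        # Assumption: the agent must be able to create files in this directory.
        problems.append(f"{path} is not writable by the current user")
    return problems
```

An empty list means the basic checks passed; it does not prove that the agent configuration itself is correct.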

 

David, could you please submit a PR to document in the admin guide how to set up & start the daos_agent? Thanks in advance.

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of Shengyu SY19 Zhang <zhangsy19@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Monday 1 July 2019 at 11:19
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [External] Re: [daos] failed to create pool: -1023 #chat

 



Nabarro, Tom
 

Hello Shengyu,

 

Also, the “svc” parameter to dmg query is a comma-separated list of 0-based indices, so you might want “--svc 0” (refer to the second item returned from the create call).
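A small sketch of validating the svc value before building the dmg part of the command line. The --pool and --svc flags are the ones used earlier in this thread; parse_svc and dmg_query_args are hypothetical helpers, and the orterun prefix is left out because it was elided in the original command.

```python
def parse_svc(svc):
    """Parse a comma-separated list of 0-based service indices."""
    indices = []
    for part in svc.split(","):
        part = part.strip()
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"invalid svc element: {part!r}")
        indices.append(int(part))
    return indices


def dmg_query_args(pool_uuid, svc):
    """Build the argument list for the dmg query part of the command."""
    parse_svc(svc)  # validate before building the command line
    return ["dmg", "query", "--pool", pool_uuid, "--svc", svc]
```

This only checks the syntax of the list. Whether a given index is a valid service replica for the pool still depends on what the create call returned.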

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Lombardi, Johann
Sent: Monday, July 1, 2019 1:07 PM
To: daos@daos.groups.io; Quigley, David <david.quigley@...>
Subject: Re: [External] Re: [daos] failed to create pool: -1023 #chat

 
