Re: [External] Re: [daos] failed to create pool: -1023
#chat
Liu, Xuezhao
It looks like your daos_server did not start successfully.
Check the settings in the config file /usr/local/etc/daos_server.yml and try changing some of them to see whether the server then starts. For example, comment out (add "#" to the start of the line) every option whose name starts with "bdev_". If it still does not work, you can post your daos_server.yml and the DAOS log (the path is configured by the "log_file" option; you can also set "log_mask: DEBUG") to a Jira ticket, or here if Jira does not work for you.
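The suggestion to comment out the "bdev_" options can be sketched in Python as below. This is only an illustration: the function name and the sample config text are made up, and it assumes each bdev_ option sits on a single line (a value spread over several lines would need more care).

```python
def comment_out_bdev(config_text):
    """Prefix every line whose option name starts with 'bdev_' with '#'."""
    out = []
    for line in config_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("bdev_"):
            # Keep the original indentation in front of the '#'.
            indent = line[: len(line) - len(stripped)]
            out.append(indent + "#" + stripped)
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical sample; a real daos_server.yml contains many more options.
sample = 'log_mask: DEBUG\nbdev_class: nvme\nbdev_list: ["0000:81:00.0"]'
print(comment_out_bdev(sample))
```

Running it on a copy of the file first, rather than on /usr/local/etc/daos_server.yml directly, keeps the original settings easy to restore.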
|
Re: [External] Re: [daos] failed to create pool: -1023
#chat
Lombardi, Johann
Right, the storage engine isn’t started since the backend storage hasn’t been formatted:
waiting for storage format on server 0
Even with no NVDIMMs in the system, we still need to wipe out the SSDs so that all blocks are marked as not allocated (i.e. for wear leveling). The following command should allow you to format & start the engine:
$ daos_shell storage format
As suggested by Xuezhao, you should create your own daos_server.yml with the list of SSDs you want to use.
To list the SSDs available on the system, you can run the following commands:
$ daos_server storage prep-nvme
2019/06/20 19:06:24 storage_nvme.go:96: debug: spdk setup with _NRHUGE=1024
2019/06/20 19:06:24 storage_nvme.go:100: debug: spdk setup with _TARGET_USER=root
$ daos_server storage scan
[…]
NVMe:
- model: 'INTEL SSDPED1K375GA '
  serial: 'PHKS7335009W375AGN '
  pciaddr: 0000:87:00.0
  fwrev: E2010324
  namespaces:
  - id: 1
    capacity: 375
- model: 'INTEL SSDPEDMD016T4 '
  serial: 'CVFT5226001C1P6DGN '
  pciaddr: 0000:da:00.0
  fwrev: 8DV10171
  namespaces:
  - id: 1
    capacity: 1600
- model: 'INTEL SSDPEDMD016T4 '
  serial: 'CVFT5506004Z1P6DGN '
  pciaddr: 0000:81:00.0
  fwrev: 8DV10171
  namespaces:
  - id: 1
    capacity: 1600
And then populate the yaml file with the devices that you want to use, for instance:
bdev_class: nvme
bdev_list: ["0000:81:00.0", "0000:da:00.0"]
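To illustrate the step from scan output to config, here is a small Python sketch that pulls the pciaddr values out of the scan text and renders the two yaml lines. The helper names are hypothetical, the scan text is abbreviated from the output quoted in this message, and choosing which devices to list remains a manual decision.

```python
import re

# Abbreviated copy of the scan output quoted in this message.
SCAN_OUTPUT = """\
NVMe:
- model: 'INTEL SSDPED1K375GA '
  pciaddr: 0000:87:00.0
- model: 'INTEL SSDPEDMD016T4 '
  pciaddr: 0000:da:00.0
- model: 'INTEL SSDPEDMD016T4 '
  pciaddr: 0000:81:00.0
"""


def pci_addresses(scan_text):
    """Return every PCI address that follows a 'pciaddr:' key, in order."""
    return re.findall(r"pciaddr:\s*([0-9a-fA-F:.]+)", scan_text)


def bdev_config(addresses):
    """Render the bdev_class / bdev_list lines for the chosen devices."""
    quoted = ", ".join('"' + addr + '"' for addr in addresses)
    return "bdev_class: nvme\nbdev_list: [" + quoted + "]"


found = pci_addresses(SCAN_OUTPUT)
# Pick the devices to use; here the two drives from the example above.
chosen = [addr for addr in found if addr in ("0000:81:00.0", "0000:da:00.0")]
print(bdev_config(chosen))
```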
We are working on automatic storage configuration with CPU affinity detection, but this feature isn’t available yet.
HTH
Cheers, Johann
--------------------------------------------------------------------- This e-mail and any attachments may contain confidential material for |
|
Re: [External] Re: [daos] failed to create pool: -1023
#chat
Shengyu SY19 Zhang
Hello,
The first issue was resolved after I ran:
$ daos_shell storage format
I think this step should be added to the quick start document. Yes, I had already created daos_server.yml from the one at install/etc/daos_server.yml. The storage server now seems to be formatted, but a new issue has appeared. When I run dmg create, it fails with:
failed to create pool: -1007
If I execute:
daos_shell pool create -s 1G
it says:
2019/06/21 17:29:19 config.go:122: debug: DAOS Client config read from /usr/local/etc/daos.yml
Active connections: [localhost:10001]
Creating DAOS pool with 1GB SCM and 0B NvMe storage (1.000 ratio)
parsing rank list: element 0: strconv.ParseUint: parsing "": invalid syntax
I’m trying to resolve this on my own, but any hints would be appreciated.
Regards, Shengyu
|
Re: [External] Re: [daos] failed to create pool: -1023
#chat
Lombardi, Johann
Hi Shengyu,
We are about to retire the quick start document in favor of the admin guide that has been integrated into the source code (https://github.com/daos-stack/daos/tree/master/doc/admin). The documentation for the format step actually landed this morning: https://github.com/daos-stack/daos/blob/master/doc/admin/deployment.md#basic-workflow
As for the -1007 error, it means that you don’t have enough space available to allocate the pool (https://github.com/daos-stack/daos/blob/master/doc/admin/troubleshooting.md#daos-errors). How much space have you allocated with tmpfs under /mnt/daos?
Cheers, Johann
|
Re: [External] Re: [daos] failed to create pool: -1023
#chat
I’m afraid the patch for this (regarding the handling of the request) has not landed yet; it is going through a round of reviews: https://github.com/daos-stack/daos/pull/637
Please feel free to experiment with the patch, as it should work. Otherwise, please use the "dmg" tool to create pools in the interim.
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
|
Re: [External] Re: [daos] failed to create pool: -1023
#chat
Shengyu SY19 Zhang
Hello Johann,
Great. I expected the space under /mnt/daos to be sufficient; here is the output of mount:
tmpfs on /mnt/daos type tmpfs (rw,nosuid,nodev,noexec,noatime,seclabel,size=6291456k)
However, I could see that 88% of its space was already in use, so I remounted a larger one (20G), and now I’m able to create a storage pool.
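The numbers above can be checked with a little arithmetic. The Python sketch below parses the size= option from the mount line and estimates the free space at 88% usage. It assumes the pool's SCM is carved out of this tmpfs and that "1G" means 1 GiB; both are assumptions made for illustration, not statements about how DAOS accounts for space.

```python
import re


def tmpfs_size_bytes(mount_options):
    """Extract the size=<n>k mount option (KiB) and return it in bytes."""
    match = re.search(r"size=(\d+)k", mount_options)
    if match is None:
        raise ValueError("no size=<n>k option found")
    return int(match.group(1)) * 1024


options = "rw,nosuid,nodev,noexec,noatime,seclabel,size=6291456k"
total = tmpfs_size_bytes(options)      # 6291456 KiB is exactly 6 GiB
free = total * (100 - 88) // 100       # 88% already used, as reported
requested = 1 * 1024**3                # assumption: "1G" means 1 GiB

print("total: %.2f GiB, free: %.2f GiB" % (total / 1024**3, free / 1024**3))
print("fits" if free >= requested else "does not fit")
```

Under these assumptions roughly 0.72 GiB was free, which is less than the 1G requested, so a no-space error is what one would expect before the remount.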
Regards, Shengyu.
|
Re: [External] Re: [daos] failed to create pool: -1023
#chat
Shengyu SY19 Zhang
Hello Tom,
Thank you for the info. I’m now able to create a storage pool with the dmg tool. I’ll try the patch later, when I need it.
Regards, Shengyu.
|
Introducing Certificate Support for DAOS
Quigley, David
Hello,
In the near future a series of patches will land that introduces secure communications support for all of the gRPC connections in DAOS. These channels are used for communications between the Go components in DAOS (daos_shell, daos_agent, and daos_server). By default certificates are required; however, it is easy to turn them off. The two ways of turning off certificate support are as follows:
1) In daos.yml, daos_agent.yml, and daos_server.yml you can add the line insecure:true. This tells each component not to attempt to load any certificates and keeps all of the channels insecure (plaintext HTTP/2).
2) When starting daos_agent, daos_server, and daos_shell, pass either -i or --insecure on the command line (this is the approach taken in the various tests in DAOS).
Regardless of which method you choose, make sure all 3 components are either running with certificates or without. Mixing the two modes will cause the system to fail. The error logs should tell you that it is a TLS failure, but it might not always be obvious.
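Because a mixed setup may fail without an obvious message, a quick consistency check can help. The Python sketch below is a rough illustration only: it looks for an insecure line in the text of each config and reports whether all components agree. The helper names and sample texts are hypothetical, it accepts the line with or without a space after the colon, and it does not account for the -i / --insecure command-line flags.

```python
import re


def is_insecure(config_text):
    """True if some line consists solely of 'insecure:true' (space optional)."""
    for line in config_text.splitlines():
        if re.fullmatch(r"insecure:\s*true\s*", line):
            return True
    return False


def modes_agree(configs):
    """configs maps a component name to its config text; all must match."""
    modes = {name: is_insecure(text) for name, text in configs.items()}
    return len(set(modes.values())) == 1


# Hypothetical config contents for the three components.
configs = {
    "daos.yml": "insecure:true",
    "daos_agent.yml": "insecure: true",
    "daos_server.yml": "#insecure: true",   # commented out, so certificates on
}
print("consistent" if modes_agree(configs) else "mixed: expect TLS failures")
```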
Once the patches are merged, I will present how to use the certificate support, if desired, for testing. There is already a script for generating a set of certificates, including a Certificate Authority, for the DAOS cluster. For now, though, it is best to either modify your configuration files or pass in the appropriate command-line flags once the patches are merged.
Dave Quigley
|
|
Re: [External] Re: [daos] failed to create pool: -1023
#chat
Shengyu SY19 Zhang
Hello,
I would like some additional help getting DAOS working within its ecosystem: running it with FUSE or HDFS, or testing IOPS. dmg query and FUSE do not work in my environment, and I noticed that the DAOS admin document does not match the code for the FUSE part. The pool was created successfully, but dmg query and the FUSE mount always return an invalid parameters error code (1003):
orterun … dmg query --pool 06c10125-c3ea-4040-a030-10a9e5f10004 --svc 1
For now, any guide to testing or running DAOS in its ecosystem would help me learn more about the project. Any information will be appreciated.
Best Regards, Shengyu
|
Re: [External] Re: [daos] failed to create pool: -1023
#chat
Lombardi, Johann
Hi Shengyu,
I assume that you have followed the instructions to set up /var/run/daos_agent, correct? If not, please check https://github.com/daos-stack/daos/blob/master/doc/admin/deployment.md#runtime-directory-setup
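A quick way to check the runtime directory before starting anything is sketched below in Python. It only tests that the path exists as a directory and that the current user can write to and enter it; that this is what the linked setup instructions require is an assumption, and the function name is made up.

```python
import os


def runtime_dir_ready(path):
    """True if path is an existing directory the current user can write to."""
    return os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


# Path taken from the message above.
print(runtime_dir_ready("/var/run/daos_agent"))
```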
We also landed support for the daos_agent in v0.5 (David sent an email to the list), but the admin guide hasn’t been updated yet: https://github.com/daos-stack/daos/blob/master/doc/admin/deployment.md#agent-configuration
On the compute nodes, you should just run “daos_agent &”. A systemd script to automate this should be available soon.
David, could you please submit a PR to document in the admin guide how to set up & start the daos_agent? Thanks in advance.
Cheers, Johann
|
Re: [External] Re: [daos] failed to create pool: -1023
#chat
Hello Shengyu,
Also, the “svc” parameter to dmg query is a comma-separated list of 0-based indices, so you might want “--svc 0” (refer to the second item returned from the create call).
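To make the 0-based point concrete, here is a minimal Python sketch of how such a value could be parsed and sanity-checked. It is not the dmg implementation; the function names are hypothetical, and the single-server count in the example is an assumption about the setup.

```python
def parse_svc(value):
    """Split a comma-separated list of 0-based indices into integers."""
    ranks = []
    for item in value.split(","):
        item = item.strip()
        if not item.isdigit():
            raise ValueError("not a non-negative integer: %r" % item)
        ranks.append(int(item))
    return ranks


def valid_for(ranks, server_count):
    """With 0-based indices, valid values run from 0 to server_count - 1."""
    return all(rank < server_count for rank in ranks)


# Assumption: a single server, so the only valid index is 0.
for value in ("1", "0"):
    verdict = "ok" if valid_for(parse_svc(value), 1) else "out of range"
    print("--svc " + value + ": " + verdict)
```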
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
|
Issues creating a DAOS pool
Rene Salmon <salmonr@...>
Hi Daos list,
I am trying to bring up DAOS using various docs on the github page. That said I am running into trouble while trying to create a DAOS Pool.
I have three DAOS servers and one client.
daos-1 = client
daos-[2-4] = servers
[user@daos-1 ~]$ orterun -np 1 --ompi-server file:/tmp/urifile.txt dmg create --size=2G
failed to create pool: -1005
-------------------------------------------------------
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
orterun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[40764,1],0] Exit code: 1
--------------------------------------------------------------------------
Any ideas where to look for a hint?
Thanks
Rene |
|
Re: Issues creating a DAOS pool
Chaarawi, Mohamad
On the issue below, it seems that the uri file that the server generates is not written to a place where the client can read: /tmp/urifile.txt
I had an offline chat with Rene, who will retry this after writing the URI file to a shared FS, but I wanted to update the mailing list on the issue.
Thanks, Mohamad
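The diagnosis above (the client cannot read the URI file the server wrote) can be checked before pool creation is attempted. The following is only a minimal sketch: the helper names `check_uri_file` and `create_pool` are hypothetical, and the shared path in the usage note is an assumption. The `orterun` command line itself is the one from the original report, with the file path made a parameter.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the URI file exists, is readable,
# and is non-empty on the node where this runs (i.e. the client).
check_uri_file() {
    uri_file=$1
    if [ ! -r "$uri_file" ]; then
        echo "URI file not readable on $(uname -n): $uri_file" >&2
        return 1
    fi
    if [ ! -s "$uri_file" ]; then
        echo "URI file is empty: $uri_file" >&2
        return 1
    fi
    return 0
}

# Hypothetical wrapper: run the pool creation from the original report
# only after the URI file check passes.
create_pool() {
    check_uri_file "$1" || return 1
    orterun -np 1 --ompi-server "file:$1" dmg create --size=2G
}
```

With the server writing its URI file to a directory mounted on every node (for example an assumed `/shared/daos/urifile.txt`), the client would call `create_pool /shared/daos/urifile.txt`. If the file is missing or empty, the check fails with a message instead of the opaque pool creation error.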
|
|
DAOS hardware requirements
BASDEN, ALASTAIR G.
Hi,
Having read https://github.com/daos-stack/daos/blob/master/doc/storage_model.md, is it the case that all storage nodes must have DCPMM (Apache Pass) memory?
I was hoping to set up a test installation using a metadata server with DCPMM memory and storage nodes with only NVMe drives (these are not Cascade Lake, so no DCPMM). However, I am now unclear whether this will be possible.
Thanks, Alastair. |
|
Re: DAOS hardware requirements
Carrier, John
DAOS metadata resides in persistent memory (DCPMM) on the same node as the block storage (NVMe drives). DAOS is an object store, not a file system. The metadata in the DCPMM are the data structures that each DAOS server maintains to access the application data stored in DAOS containers and objects on the SSDs.
--jc |
|
Re: DAOS hardware requirements
Chaarawi, Mohamad
Note that the DCPMM also captures small I/Os (< 4k) in addition to the DAOS metadata.
That being said, for a quick test setup, you can use DRAM instead of DCPMM (use a tmpfs mount). Again this is just for testing (if you reboot of course all data / metadata in your tmpfs is gone). Thanks, Mohamad |
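The tmpfs suggestion above can be sketched as follows. This is an illustration under stated assumptions: the helper name `tmpfs_setup_cmds`, the mount point `/mnt/daos`, and the `16g` size are placeholders that do not come from the thread. The function only prints the commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print (do not run) the commands for a DRAM-backed test mount.
# Mount point and size are placeholder defaults, not values from the thread.
tmpfs_setup_cmds() {
    mount_point=${1:-/mnt/daos}
    size=${2:-16g}
    echo "mkdir -p $mount_point"
    echo "mount -t tmpfs -o size=$size tmpfs $mount_point"
}

# Show the commands for the placeholder defaults.
tmpfs_setup_cmds /mnt/daos 16g
```

The mount point would presumably need to match whatever storage path your daos_server.yml points at. As noted above, everything in the tmpfs is lost on reboot, so this is suitable for testing only.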
|
Build using Docker
Colin Ngam
Hi,
I am trying to build using Docker on Mac. I am seeing the following error:
$ docker build -t daos -f Dockerfile.centos\:7 github.com/daos-stack/daos#:utils/docker
unable to prepare context: unable to 'git clone' to temporary context directory: error initializing submodules: Submodule 'scons_local' (https://github.com/daos-stack/scons_local.git) registered for path 'scons_local'
Submodule 'raft' (https://github.com/daos-stack/raft.git) registered for path 'src/rdb/raft'
Cloning into '/private/var/folders/b5/b52tjd4j71z15bj53sms7jrw006f5s/T/docker-build-git049590752/scons_local'...
Cloning into '/private/var/folders/b5/b52tjd4j71z15bj53sms7jrw006f5s/T/docker-build-git049590752/src/rdb/raft'...
error: Server does not allow request for unadvertised object 171c9b254d9c40463c77d7e7567fd93633a9d78a
Fetched in submodule path 'scons_local', but it did not contain 171c9b254d9c40463c77d7e7567fd93633a9d78a. Direct fetching of that commit failed.
: exit status 1
Thanks.
Colin
|
|
Re: Build using Docker
Hi Colin,
I’ve not seen that before, but after cloning DAOS the build should run git submodule update, which clones the scons_local and raft submodules using the URLs you see in the messages. The scons_local repository does contain that git hash, so I’m wondering whether something is going wrong at the initial clone step. Do you perhaps have an HTTPS proxy that needs to be set?
-Jeff
|
|
Re: Build using Docker
Colin Ngam
Hi,
I can do this manually:
cngam@cngam daos $ git clone https://github.com/daos-stack/daos.git
Cloning into 'daos'...
remote: Enumerating objects: 229, done.
remote: Counting objects: 100% (229/229), done.
remote: Compressing objects: 100% (201/201), done.
remote: Total 35637 (delta 103), reused 64 (delta 28), pack-reused 35408
Receiving objects: 100% (35637/35637), 24.55 MiB | 6.29 MiB/s, done.
Resolving deltas: 100% (27005/27005), done.
cngam@cngam daos $ cd daos
cngam@cngam daos $ git submodule init
Submodule 'scons_local' (https://github.com/daos-stack/scons_local.git) registered for path 'scons_local'
Submodule 'raft' (https://github.com/daos-stack/raft.git) registered for path 'src/rdb/raft'
cngam@cngam daos $ git submodule update
Cloning into '/Users/cngam/DAOS/daos/scons_local'...
Cloning into '/Users/cngam/DAOS/daos/src/rdb/raft'...
Submodule path 'scons_local': checked out '171c9b254d9c40463c77d7e7567fd93633a9d78a'
Submodule path 'src/rdb/raft': checked out 'e6c4369635d5ca8cbe71173eb4eb15fb3809510f'
So, I do not believe it is an HTTPS proxy issue.
Thanks.
Colin
|
|
Re: Build using Docker
Hi Colin,
Since you can do it manually, it may be quickest to do the checkout outside of the container and mount it into the container as a volume using -v, as described here: https://docs.docker.com/storage/volumes/
-Jeff
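The workaround suggested above (check out on the host, then mount the tree into a container with -v) might look like the sketch below. The helper names `clone_daos` and `docker_run_cmd`, the in-container path `/daos`, and the default `centos:7` image are assumptions for illustration. A bare base image will not include the build dependencies that the project's Dockerfile installs, so in practice you would substitute an image that has them.

```shell
#!/bin/sh
# Clone DAOS together with its submodules on the host, outside of Docker.
clone_daos() {
    dest=$1
    git clone --recurse-submodules https://github.com/daos-stack/daos.git "$dest"
}

# Print the docker command that bind-mounts the host checkout into a
# container. The image name and the /daos path are placeholders.
docker_run_cmd() {
    src=$1
    image=${2:-centos:7}
    echo "docker run -it -v $src:/daos -w /daos $image /bin/bash"
}
```

For example, after `clone_daos "$HOME/daos"` succeeds, `docker_run_cmd "$HOME/daos"` prints the command to start a shell inside the container with the checkout available at `/daos`. The host path given to -v should be absolute.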
|
|