Re: DAOS master error when formatting tmpfs


Nabarro, Tom
 

I haven't been developing or testing with IOAT devices; Niu might be able to help with where we are on DAOS and VFIO/IOAT SSDs. You can supply a PCI address whitelist to storage prepare, but currently not a blacklist.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Kevan Rehm
Sent: Wednesday, October 30, 2019 1:09 PM
To: daos@daos.groups.io
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Tom,

 

Sorry, while my symptoms were similar, my issue only relates to SSDs, not SCM.   A non-root daos daemon cannot open an IOAT device because of the /dev/sdX root permissions.  Can you provide more detail on what configurations are supported for IOAT?   

 

Suppose you have a system with a mix of NVMe and IOAT SSDs. For NVMe we would want to enable the IOMMU to get the vfio_pci driver, and we would want to run daos daemons as non-root. But that doesn’t work for IOAT SSDs; it appears the rule is that if there are any IOAT devices, then the daos daemons must run as root. What about the driver rebinding: does it interfere with SPDK’s bdev driver doing AIO? Do we have to blacklist IOAT drives during “storage prepare” to prevent the rebinding, or is the driver rebinding for IOAT devices harmless?
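
One way to see whether a rebinding actually happened to a given device is to read its driver link in sysfs. The following is a minimal sketch, assuming the standard Linux sysfs layout; the function name pci_driver and the example address are illustrative and not part of DAOS or SPDK.

```shell
#!/bin/sh
# Print the kernel driver currently bound to a PCI device, or "none".
# $1: PCI address, e.g. 0000:00:04.0 (hypothetical example address)
# $2: sysfs root, defaults to /sys (overridable so the function can be
#     exercised against a fake directory tree)
pci_driver() {
    addr=$1
    root=${2:-/sys}
    link="$root/bus/pci/devices/$addr/driver"
    if [ -L "$link" ]; then
        # The driver entry is a symlink ending in .../drivers/<name>
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

Running `pci_driver <address>` before and after "storage prepare" would show whether the device moved from its original kernel driver to vfio-pci.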

 

Thanks, Kevan

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, October 25, 2019 at 10:11 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Are you seeing that when running as non-root? If so, do you have an empty SCM mounted prior to starting? When running as non-root, the server will not wait for format: if SCM is mounted it will create the superblock and continue to start the IO server, otherwise it will bail.
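
The precondition described above (SCM directory mounted and empty before a non-root start) can be checked by hand. A minimal sketch, assuming the `mountpoint` utility from util-linux is installed; the function name is hypothetical and this is not something daos_server runs itself.

```shell
#!/bin/sh
# Return 0 if DIR is a mount point with no entries.
# Return 1 if it is not a mount point, 2 if it is mounted but not empty.
scm_mounted_and_empty() {
    dir=$1
    if ! mountpoint -q "$dir"; then
        echo "$dir is not a mount point" >&2
        return 1
    fi
    # ls -A lists every entry except . and ..; any output means not empty
    if [ -n "$(ls -A "$dir")" ]; then
        echo "$dir is mounted but not empty" >&2
        return 2
    fi
    return 0
}
```

For example, `scm_mounted_and_empty /mnt/daos && echo ready` before launching daos_server as a non-root user.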

 

Ignore the ASLR message, as it doesn't currently cause any practical problems as far as we know.

 

We are not currently testing/developing with VFIO/IOMMU/IOAT.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Kevan Rehm
Sent: Friday, October 25, 2019 10:51 PM
To: daos@daos.groups.io
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Tom,

 

When running as non-root I am seeing the same message that Jordan reports; the “daos_server start” command fails immediately and doesn’t wait for formatting:

 

no NVMe controllers found

DAOS control server listening on 0.0.0.0:10001

no NVDIMMs found!

ERROR: failed to read existing superblock: can't read superblock from unformatted storage

 

But perhaps my situation is different. I followed your instructions and got the output below. For storage I am using an IOAT device; I do not have NVMe devices, but I am using the vfio_pci driver and the IOMMU is enabled. Is non-root supported with IOAT devices? (I don’t see any code to chown /dev/sdX to user daos in setup.sh.) If not supported, sorry for the interruption, I’ll switch to root.

 

Thanks, Kevan

 

P.S. Should I be worried about the ASLR message?

 

 

-bash-4.2$ daos/install/bin/orterun -np 1 -H localhost --report-uri /tmp/urifile daos_server start -t 1 -d /tmp/ -o /home/users/daos/daos/utils/config/examples/daos_server_local.yml

daos_server logging to file /tmp/daos_control.log

Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...

[ DPDK EAL parameters: spdk -c 0x1 --file-prefix=spdk1327119562 --base-virtaddr=0x200000000000 --proc-type=auto ]

EAL: Detected 32 lcore(s)

EAL: Auto-detected process type: PRIMARY

EAL: No free hugepages reported in hugepages-1048576kB

EAL: Multi-process socket /home/users/daos/.spdk1327119562_unix

EAL: Probing VFIO support...

EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.

no NVMe controllers found

DAOS control server listening on 0.0.0.0:10001

no NVDIMMs found!

Starting I/O server instance 0: /home/users/daos/daos/install/bin/daos_io_server

daos_io_server:0 Using legacy core allocation algorithm

daos_io_server:0 Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...

[ DPDK EAL parameters: daos -c 0x1 --file-prefix=spdk1327119562 --base-virtaddr=0x200000000000 --proc-type=auto ]

ERROR: daos_io_server:0 EAL: Detected 32 lcore(s)

ERROR: daos_io_server:0 EAL: Auto-detected process type: SECONDARY

daos_io_server:0 EAL: Multi-process socket /home/users/daos/.spdk1327119562_unix_15634_10ea63cd2a562

daos_io_server:0 EAL: Probing VFIO support...

daos_io_server:0 EAL: WARNING: Address Space Layout Randomization (ASLR) is enabled in the kernel.

daos_io_server:0 EAL:    This may cause issues with mapping memory into secondary processes

daos_io_server:0 EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.

ERROR: daos_io_server:0 bdev_aio.c:  83:bdev_aio_open: *ERROR*: open() failed (file:/dev/sdb), errno 13: Permission denied

ERROR: daos_io_server:0 bdev_aio.c: 470:create_aio_disk: *ERROR*: Unable to open file /dev/sdb. fd: -1 errno: 13

bdev_aio.c: 599:bdev_aio_initialize: *ERROR*: Unable to create AIO bdev from file /dev/sdb

ERROR: DAOS I/O Server exited with error: /home/users/daos/daos/install/bin/daos_io_server (instance 0) exited: exit status 1

-------------------------------------------------------

Primary job  terminated normally, but 1 process returned

a non-zero exit code. Per user-direction, the job has been aborted.

-------------------------------------------------------

--------------------------------------------------------------------------

orterun detected that one or more processes exited with non-zero status, thus causing

the job to be terminated. The first process to do so was:

 

  Process name: [[21676,1],0]

  Exit code:    1

--------------------------------------------------------------------------
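
The errno 13 (EACCES) in the bdev_aio_open lines above means the open() on /dev/sdb was refused for the user running daos_io_server. A minimal sketch for checking this before starting the server, assuming the AIO bdev needs both read and write access to the device node; the function name can_open_rw is hypothetical.

```shell
#!/bin/sh
# Return 0 if the current user may read and write PATH,
# 1 if access is denied, 2 if PATH does not exist.
can_open_rw() {
    path=$1
    if [ ! -e "$path" ]; then
        echo "$path does not exist" >&2
        return 2
    fi
    if [ -r "$path" ] && [ -w "$path" ]; then
        return 0
    fi
    # Show owner, group and mode so the missing permission is visible
    echo "$(id -un) lacks read/write access to $path:" >&2
    ls -l "$path" >&2
    return 1
}
```

For the case above, `can_open_rw /dev/sdb` run as the daos user would be expected to fail and print the root-owned device node.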

 

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, October 25, 2019 at 12:37 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Let’s focus on the non-root case to start with. After you have manually created an empty /mnt/daos and mounted tmpfs, change permissions on the SCM directory to 777 (just for this experiment) and run daos_server with control_log_mask: DEBUG (please also paste the config file). The superblock should be created and the IO server should start.

 

Thanks

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Jordan Henderson
Sent: Friday, October 25, 2019 5:00 PM
To: daos@daos.groups.io
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Hi Tom,

 

In general I don't usually run as root, but in this case it did seem to be the only way that I could get the server to wait for storage formatting. However, even when I started from a clean slate for the tmpfs mount, as per these instructions, it didn't seem to matter whether I manually mounted the tmpfs myself or allowed the storage formatting to do so. When running as root, the storage format appeared to be successful in both cases, but the server still immediately returned an error for the storage not being formatted and then exited. When running as non-root, the server didn't wait for storage formatting in either case.

 

It might be worth noting that after each successful format command, my tmpfs mount still didn't contain a superblock file. I'm guessing that this is probably why the server is returning a formatting error right after the storage format command?
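
The check described here can be repeated after each format attempt. A minimal sketch; the file name "superblock" is an assumption about how the control plane names the file, so pass the real name as the second argument if it differs. The function name has_superblock is hypothetical.

```shell
#!/bin/sh
# Return 0 if a superblock file exists in the SCM mount directory.
# $1: SCM mount directory, e.g. /mnt/daos
# $2: superblock file name (assumed default: "superblock")
has_superblock() {
    dir=$1
    name=${2:-superblock}
    [ -f "$dir/$name" ]
}
```

For example, `has_superblock /mnt/daos || ls -la /mnt/daos` shows what the format step actually left behind when the file is missing.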

 


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Nabarro, Tom via Groups.Io <tom.nabarro@...>
Sent: Friday, October 25, 2019 8:58 AM
To: Chaarawi, Mohamad <mohamad.chaarawi@...>; Jordan Henderson <jhenderson@...>
Cc: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Hello Jordan

 

If running as root, could you please try “umount /mnt/daos; rm -rf /mnt/daos” then start server which should wait for format, then format from daos_shell.

 

$  umount /mnt/daos; rm -rf /mnt/daos

$  orterun -N 1 -H localhost --report-uri /tmp/urifile --allow-run-as-root daos_server start -t 1 -o ~tanabarr/projects/daos_m/utils/config/examples/daos_server_sockets.yml -i

$ daos_shell -i storage format

 

If running as non-root, the following works for me:

 

$  sudo umount /mnt/daos; sudo rm -rf /mnt/daos; sudo mkdir /mnt/daos; sudo mount -t tmpfs -o size=64G tmpfs /mnt/daos

$ orterun -np 1 -H localhost --report-uri /tmp/urifile daos_server start -t 1 -d /tmp/ -o ~tanabarr/projects/daos_m/utils/config/examples/daos_server_sockets.yml -i

 

Above tried on commit 15168685005843766c038afff45fd6681c07f341 .

86f730a37d0170fdb733c1b08308162a245e5aea did introduce changes to the formatting code, but I haven’t observed any subsequent regressions in my testing; it may just need a clean slate in your situation.

 

Thanks

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: Chaarawi, Mohamad
Sent: Friday, October 25, 2019 2:24 PM
To: Jordan Henderson <jhenderson@...>
Cc: Nabarro, Tom <tom.nabarro@...>
Subject: Re: DAOS master error when formatting tmpfs

 

Jordan, it would be great to send such emails to the DAOS user list where there are more people (from the control plane) who can answer. Please subscribe here:

https://daos.groups.io/g/daos

 

Did you clear your tmpfs beforehand?

I'm not sure why you would be getting this over tmpfs, but I never start the server as root.

 

Mohamad

 

 

From: Jordan Henderson <jhenderson@...>
Date: Thursday, October 24, 2019 at 10:22 PM
To: "Chaarawi, Mohamad" <mohamad.chaarawi@...>
Subject: DAOS master error when formatting tmpfs

 

Hi Mohamad,

 

are you aware of any bugs with the current DAOS master (commit 15168685005843766c038afff45fd6681c07f341) and trying to format storage when the SCM is emulated through a tmpfs? I tried updating my DAOS to the latest master but when I tried to start DAOS I got the error below. If I started the server as root it then waited for me to format the storage, but when I did so the format command returned success and then the server immediately returned the same error as below and exited. Once I switched to the v0.6 tag (which was also ahead of my current install), I was able to run the server as non-root, format the tmpfs and start the server fine.

 

DEBUG 21:03:05.903946 instance.go:199: /home/jhenderson/Work/DAOS_Workspace/daos_fs/ (ram) needs format: true

ERROR: failed to read existing superblock: can't read superblock from unformatted storage

DEBUG 21:03:05.904215 main.go:67: can't read superblock from unformatted storage

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).ReadSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:176

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).NeedsSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:110

github.com/daos-stack/daos/src/control/server.(*IOServerHarness).CreateSuperblocks

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/harness.go:118

github.com/daos-stack/daos/src/control/server.Start

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/server.go:166

main.(*startCmd).Execute

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/start.go:167

main.parseOpts.func1

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103

github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314

main.parseOpts

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111

main.main

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123

runtime.main

/usr/lib64/go1.11.9/go/src/runtime/proc.go:201

runtime.goexit

/usr/lib64/go1.11.9/go/src/runtime/asm_amd64.s:1333

failed to read existing superblock

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).NeedsSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:117

github.com/daos-stack/daos/src/control/server.(*IOServerHarness).CreateSuperblocks

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/harness.go:118

github.com/daos-stack/daos/src/control/server.Start

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/server.go:166

main.(*startCmd).Execute

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/start.go:167

main.parseOpts.func1

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103

github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314

main.parseOpts

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111

main.main

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123

runtime.main

/usr/lib64/go1.11.9/go/src/runtime/proc.go:201

runtime.goexit

/usr/lib64/go1.11.9/go/src/runtime/asm_amd64.s:1333

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

