Re: DAOS master error when formatting tmpfs
I haven't been developing or testing with IOAT devices; Niu might be able to help with where we are on DAOS and VFIO/IOAT SSDs. You can supply a PCI address whitelist to storage prepare, but a blacklist is not currently supported.
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Kevan Rehm
Sent: Wednesday, October 30, 2019 1:09 PM
To: daos@daos.groups.io
Subject: Re: [daos] DAOS master error when formatting tmpfs
Tom,
Sorry, while my symptoms were similar, my issue only relates to SSDs, not SCM. A non-root daos daemon cannot open an IOAT device because of the /dev/sdX root permissions. Can you provide more detail on what configurations are supported for IOAT?
Suppose you have a system with a mix of NVMe and IOAT SSDs. For NVMe we would want to enable the IOMMU to get the vfio_pci driver, and we would want to run daos daemons as non-root. But that doesn’t work for IOAT SSDs; it appears the rule is that if there are any IOAT devices, then the daos daemons must run as root. What about the driver rebinding: does it interfere with SPDK’s bdev driver doing AIO? Do we have to blacklist IOAT drives during “storage prepare” to prevent the rebinding, or is the driver rebinding for IOAT devices harmless?
Thanks, Kevan
From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Are you seeing that when running as non-root? If so, do you have an empty SCM mount available before starting? When running as non-root, the server will not wait for a format: if SCM is mounted it creates the superblock and continues on to start the IO server; otherwise it bails.
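For readers following along, the non-root behaviour described above can be pictured as a small decision function. This is only a sketch of the description in this message, not the DAOS control-plane code: the function name, the returned strings, and the injectable `is_mounted` hook are all hypothetical, and the root case is deliberately left out because it is not spelled out here.

```python
import os


def non_root_startup_action(scm_path, is_mounted=os.path.ismount):
    """Model the non-root startup decision described in this thread.

    Hypothetical helper: the name and return values are illustrative only.
    `is_mounted` defaults to os.path.ismount and can be swapped for a stub.
    """
    if is_mounted(scm_path):
        # SCM (here, a tmpfs) is mounted: no waiting for a format step,
        # the superblock is created and the IO server is started.
        return "create-superblock-and-start-io-server"
    # Nothing mounted at the SCM path: the server does not wait, it exits.
    return "bail"


if __name__ == "__main__":
    # /mnt/daos is the mount point used elsewhere in this thread.
    print(non_root_startup_action("/mnt/daos"))
```

Running it on the server host before starting daos_server as non-root gives a quick hint as to which branch to expect.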
Ignore the ASLR message; as far as we know it doesn't currently cause any practical problems.
We are not currently testing or developing with VFIO/IOMMU/IOAT.
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Kevan Rehm
Tom,
I am seeing the same message that Jordan reports when running as non-root. The “daos_server start” command fails immediately without waiting for formatting:
no NVMe controllers found
DAOS control server listening on 0.0.0.0:10001
no NVDIMMs found!
ERROR: failed to read existing superblock: can't read superblock from unformatted storage
But perhaps my situation is different. I followed your instructions and got the output below. For storage I am using an IOAT device; I do not have NVMe devices, but I am using the vfio_pci driver and the IOMMU is enabled. Is non-root supported with IOAT devices? (I don’t see any code in setup.sh to chown /dev/sdX to user daos.) If it is not supported, sorry for the interruption; I’ll switch to root.
Thanks, Kevan
P.S. Should I be worried about the ASLR message?
-bash-4.2$ daos/install/bin/orterun -np 1 -H localhost --report-uri /tmp/urifile daos_server start -t 1 -d /tmp/ -o /home/users/daos/daos/utils/config/examples/daos_server_local.yml
daos_server logging to file /tmp/daos_control.log
Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...
[ DPDK EAL parameters: spdk -c 0x1 --file-prefix=spdk1327119562 --base-virtaddr=0x200000000000 --proc-type=auto ]
EAL: Detected 32 lcore(s)
EAL: Auto-detected process type: PRIMARY
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Multi-process socket /home/users/daos/.spdk1327119562_unix
EAL: Probing VFIO support...
EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.
no NVMe controllers found
DAOS control server listening on 0.0.0.0:10001
no NVDIMMs found!
Starting I/O server instance 0: /home/users/daos/daos/install/bin/daos_io_server
daos_io_server:0 Using legacy core allocation algorithm
daos_io_server:0 Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...
[ DPDK EAL parameters: daos -c 0x1 --file-prefix=spdk1327119562 --base-virtaddr=0x200000000000 --proc-type=auto ]
ERROR: daos_io_server:0 EAL: Detected 32 lcore(s)
ERROR: daos_io_server:0 EAL: Auto-detected process type: SECONDARY
daos_io_server:0 EAL: Multi-process socket /home/users/daos/.spdk1327119562_unix_15634_10ea63cd2a562
daos_io_server:0 EAL: Probing VFIO support...
daos_io_server:0 EAL: WARNING: Address Space Layout Randomization (ASLR) is enabled in the kernel.
daos_io_server:0 EAL: This may cause issues with mapping memory into secondary processes
daos_io_server:0 EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.
ERROR: daos_io_server:0 bdev_aio.c: 83:bdev_aio_open: *ERROR*: open() failed (file:/dev/sdb), errno 13: Permission denied
ERROR: daos_io_server:0 bdev_aio.c: 470:create_aio_disk: *ERROR*: Unable to open file /dev/sdb. fd: -1 errno: 13
bdev_aio.c: 599:bdev_aio_initialize: *ERROR*: Unable to create AIO bdev from file /dev/sdb
ERROR: DAOS I/O Server exited with error: /home/users/daos/daos/install/bin/daos_io_server (instance 0) exited: exit status 1
-------------------------------------------------------
Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
orterun detected that one or more processes exited with non-zero status, thus causing the job to be terminated. The first process to do so was:
Process name: [[21676,1],0] Exit code: 1
--------------------------------------------------------------------------
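The errno 13 in the log above is the kernel refusing open() on /dev/sdb for the non-root user. A quick way to confirm that outside of DAOS is to look at the device node's mode and ownership and ask whether the current user has read/write access. The sketch below uses only the Python standard library; `describe_device_access` is a hypothetical helper name, and /dev/sdb is simply the path taken from the log.

```python
import os
import stat


def describe_device_access(path):
    """Summarise ownership, mode, and read/write access for a device node.

    Hypothetical diagnostic helper; it does not open the device, it only
    inspects metadata and asks the kernel about access for the current user.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return f"{path}: does not exist"
    mode = stat.filemode(st.st_mode)  # e.g. 'brw-rw----' for a block device
    can_rw = os.access(path, os.R_OK | os.W_OK)
    return (
        f"{path}: mode={mode} uid={st.st_uid} gid={st.st_gid} "
        f"rw_for_current_user={can_rw}"
    )


if __name__ == "__main__":
    # /dev/sdb is the device named in the bdev_aio errors above.
    print(describe_device_access("/dev/sdb"))
```

If this reports `rw_for_current_user=False` for the user running daos_server, the bdev_aio open failure is expected regardless of how the rest of the storage is configured.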
From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Let’s focus on the non-root case to start with. After you have manually created an empty /mnt/daos and mounted tmpfs on it, change the permissions on the SCM directory to 777 (just for this experiment) and run daos_server with control_log_mask: DEBUG (please also paste your config file). The superblock should be created and the IO server should start.
Thanks
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Jordan Henderson
Hi Tom,
in general I don't usually run as root, but in this case it did seem to be the only way that I could get the server to wait for storage formatting. However, even when I started from a clean slate for the tmpfs mount, as per these instructions, it didn't seem to matter whether I manually mounted the tmpfs myself or allowed the storage formatting to do so. When running as root, the storage format appeared to be successful in both cases, but the server still immediately returned an error for the storage not being formatted and then exited. When running as non-root, the server didn't wait for storage formatting in either case.
It might be worth noting that after each successful format command, my tmpfs mount still didn't contain a superblock file. I'm guessing that this is probably why the server is returning a formatting error right after the storage format command?
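One way to check the observation above after each format attempt is to classify the SCM directory by whether it exists, whether it is a mount point, and whether it holds a superblock file. This is a rough sketch only: the file name `superblock` is an assumption (the thread never names the file), and `scm_state` and its `is_mounted` hook are hypothetical.

```python
import os


def scm_state(mount_path, superblock_name="superblock", is_mounted=os.path.ismount):
    """Classify an SCM directory after a format attempt.

    Assumption: the superblock is a regular file called `superblock_name`
    directly under the mount point; adjust the name to match your build.
    """
    if not os.path.isdir(mount_path):
        return "missing"
    if not is_mounted(mount_path):
        return "not-mounted"
    if os.path.isfile(os.path.join(mount_path, superblock_name)):
        return "formatted"
    return "mounted-no-superblock"


if __name__ == "__main__":
    # /mnt/daos is the tmpfs mount point used in this thread.
    print(scm_state("/mnt/daos"))
```

A result of `mounted-no-superblock` right after a format that reported success would match the symptom described here.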
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Nabarro, Tom via Groups.Io <tom.nabarro@...>
Hello Jordan
If running as root, could you please try “umount /mnt/daos; rm -rf /mnt/daos”, then start the server, which should wait for format, and then format from daos_shell.
$ umount /mnt/daos; rm -rf /mnt/daos
$ orterun -N 1 -H localhost --report-uri /tmp/urifile --allow-run-as-root daos_server start -t 1 -o ~tanabarr/projects/daos_m/utils/config/examples/daos_server_sockets.yml -i
$ daos_shell -i storage format
If running as non-root, the following works for me:
$ sudo umount /mnt/daos; sudo rm -rf /mnt/daos; sudo mkdir /mnt/daos; sudo mount -t tmpfs -o size=64G tmpfs /mnt/daos
$ orterun -np 1 -H localhost --report-uri /tmp/urifile daos_server start -t 1 -d /tmp/ -o ~tanabarr/projects/daos_m/utils/config/examples/daos_server_sockets.yml -i
The above was tried on commit 15168685005843766c038afff45fd6681c07f341. Commit 86f730a37d0170fdb733c1b08308162a245e5aea did introduce changes to the formatting code, but I haven’t observed any subsequent regressions in my testing; it may just need a clean slate in your situation.
Thanks
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From: Chaarawi, Mohamad
Jordan, it would be great to send such emails to the DAOS user list where there are more people (from the control plane) who can answer. Please subscribe here:
Did you clear your tmpfs beforehand? I'm not sure why you would be getting this over tmpfs, but I never start the server as root.
Mohamad
From: Jordan Henderson <jhenderson@...>
Hi Mohamad,
are you aware of any bugs with the current DAOS master (commit 15168685005843766c038afff45fd6681c07f341) and trying to format storage when the SCM is emulated through a tmpfs? I tried updating my DAOS to the latest master but when I tried to start DAOS I got the error below. If I started the server as root it then waited for me to format the storage, but when I did so the format command returned success and then the server immediately returned the same error as below and exited. Once I switched to the v0.6 tag (which was also ahead of my current install), I was able to run the server as non-root, format the tmpfs and start the server fine.
DEBUG 21:03:05.903946 instance.go:199: /home/jhenderson/Work/DAOS_Workspace/daos_fs/ (ram) needs format: true
ERROR: failed to read existing superblock: can't read superblock from unformatted storage
DEBUG 21:03:05.904215 main.go:67: can't read superblock from unformatted storage
github.com/daos-stack/daos/src/control/server.(*IOServerInstance).ReadSuperblock
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:176
github.com/daos-stack/daos/src/control/server.(*IOServerInstance).NeedsSuperblock
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:110
github.com/daos-stack/daos/src/control/server.(*IOServerHarness).CreateSuperblocks
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/harness.go:118
github.com/daos-stack/daos/src/control/server.Start
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/server.go:166
main.(*startCmd).Execute
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/start.go:167
main.parseOpts.func1
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103
github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314
main.parseOpts
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111
main.main
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123
runtime.main
    /usr/lib64/go1.11.9/go/src/runtime/proc.go:201
runtime.goexit
    /usr/lib64/go1.11.9/go/src/runtime/asm_amd64.s:1333
failed to read existing superblock
github.com/daos-stack/daos/src/control/server.(*IOServerInstance).NeedsSuperblock
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:117
github.com/daos-stack/daos/src/control/server.(*IOServerHarness).CreateSuperblocks
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/harness.go:118
github.com/daos-stack/daos/src/control/server.Start
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/server.go:166
main.(*startCmd).Execute
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/start.go:167
main.parseOpts.func1
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103
github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314
main.parseOpts
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111
main.main
    /home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123
runtime.main
    /usr/lib64/go1.11.9/go/src/runtime/proc.go:201
runtime.goexit
    /usr/lib64/go1.11.9/go/src/runtime/asm_amd64.s:1333
---------------------------------------------------------------------
This e-mail and any attachments may contain confidential material for