Re: Meaning and functionality of "target" in DAOS

Lombardi, Johann
 

Hi Patrick,

 

Your understanding is correct. The granularity of allocation when creating (or extending) a pool is a whole DAOS server (i.e. effectively all targets exported by the server). That is mostly for simplicity. The membership (i.e. the pool map) has an individual entry for each target, with a status (see struct pool_comp_state). When a storage server fails, all targets hosted by this server are gone and will be rebuilt on the surviving servers. If one SCM DIMM fails, we lose the whole DAX filesystem and thus the whole server. If one SSD fails, only a subset of the targets is impacted and rebuilt. We are currently working on SSD hotplug to allow a failed SSD to be replaced and the impacted targets to be reintegrated into the pool map.
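
To make the distinction concrete, here is a rough sketch in Go of a pool map that is allocated per server but tracked per target. It is only an illustration: the names (PoolMap, TargetEntry, AddServer, FailServer, FailSSD), the two status values and the round-robin mapping of targets to SSDs are all assumptions made for this example, and none of it reproduces the actual struct pool_comp_state layout.

```go
package main

import "fmt"

// Status is a hypothetical per-target state. The real states are defined
// alongside struct pool_comp_state and are not reproduced here.
type Status int

const (
	Up Status = iota
	Down
)

// TargetEntry is one pool map entry (hypothetical layout).
type TargetEntry struct {
	Rank   int // server that exports the target
	Index  int // target index on that server
	SSD    int // SSD backing the target, -1 if none
	Status Status
}

// PoolMap holds one entry per target.
type PoolMap struct {
	Targets []TargetEntry
}

// AddServer adds every target exported by a rank. There is deliberately no
// AddTarget: allocation granularity is the whole server.
// Assumption: targets are spread round-robin over the server's SSDs.
func (m *PoolMap) AddServer(rank, ntargets, nssds int) {
	for i := 0; i < ntargets; i++ {
		ssd := -1
		if nssds > 0 {
			ssd = i % nssds
		}
		m.Targets = append(m.Targets, TargetEntry{Rank: rank, Index: i, SSD: ssd, Status: Up})
	}
}

// markDown sets matching Up targets to Down and returns how many changed.
func (m *PoolMap) markDown(match func(TargetEntry) bool) int {
	n := 0
	for i := range m.Targets {
		if m.Targets[i].Status == Up && match(m.Targets[i]) {
			m.Targets[i].Status = Down
			n++
		}
	}
	return n
}

// FailServer covers a server failure or an SCM DIMM failure: every target
// on the rank is lost.
func (m *PoolMap) FailServer(rank int) int {
	return m.markDown(func(t TargetEntry) bool { return t.Rank == rank })
}

// FailSSD covers a single SSD failure: only the targets backed by that SSD
// are lost.
func (m *PoolMap) FailSSD(rank, ssd int) int {
	return m.markDown(func(t TargetEntry) bool { return t.Rank == rank && t.SSD == ssd })
}

func main() {
	var m PoolMap
	m.AddServer(0, 8, 2) // rank 0: 8 targets over 2 SSDs
	m.AddServer(1, 8, 2) // rank 1: 8 targets over 2 SSDs
	fmt.Println("targets to rebuild after SSD failure:", m.FailSSD(0, 1))
	fmt.Println("targets to rebuild after server failure:", m.FailServer(1))
}
```

With 8 targets spread over 2 SSDs, losing one SSD affects half of that server's targets, while losing the server (or an SCM DIMM) affects all of them.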

 

Historically, DAOS first implemented SCM support (2015-2017) and, at that time, the pool map had one entry per server. The pool map was then modified to have one entry per target when NVMe SSD support was added in early 2018.

 

HTH

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of Patrick Farrell <paf@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Saturday 2 November 2019 at 16:03
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Meaning and functionality of "target" in DAOS

 

Clarification: I'm talking about the high-level documentation describing the storage model, not any specific utility or API documentation (i.e. https://daos-stack.github.io/overview/storage/).


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Patrick Farrell <paf@...>
Sent: Saturday, November 2, 2019 10:01 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: [daos] Meaning and functionality of "target" in DAOS

 

I'm struggling a bit with some terminology. The documentation talks about adding and removing targets from a pool, the pool create API takes a target list, and so on. However, the contents of this target list are actually a list of ranks, and in the help for pool creation with the command line tool, the argument is "ranks".

After looking at the code and talking this over with folks here, I think I now understand that while a target is the unit of fault, etc., it is only possible to manipulate pool membership at the level of *ranks*, i.e. full nodes. So if a node has, say, 8 targets on it, it is *not* possible to add only 1 of those to a pool.

 

Is this correct? Is it possible to get some background on this? Was there a switch from working with targets individually to working with all the targets on a node? (I certainly understand it's possible both for the target to be the base unit of fault, performance, etc., and for it to also not be possible to address only *some* targets on a node in pool management.)

 

Regards,

Patrick

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.



Re: DAOS master error when formatting tmpfs

Nabarro, Tom
 

I tried the following with a modified scm_mount on current master (f6a34514a45eb3fb38dc43771d11c9f61e510516):

 

With non-root + manual mount

 

diff --git a/utils/config/examples/daos_server_sockets.yml b/utils/config/examples/daos_server_sockets.yml

-  scm_mount: /mnt/daos # map to -s /mnt/daos

+  scm_mount: /mnt/daos0        # map to -s /mnt/daos

 

$  sudo umount /mnt/daos; sudo rm -rf /mnt/daos; sudo mkdir /mnt/daos0; sudo mount -t tmpfs -o size=64G tmpfs /mnt/daos0

$  orterun -np 1 -H boro-45 --report-uri /tmp/urifile daos_server start -t 1 -d /tmp/ -o $(pwd)/utils/config/examples/daos_server_sockets.yml -i

 

With root + format

 

tty no. 1:

 

root@boro-45:/home/tanabarr/projects/daos_m$ umount /mnt/daos0; rm -rf /mnt/daos0

root@boro-45:/home/tanabarr/projects/daos_m$ orterun -N 1 -H localhost --report-uri /tmp/urifile --allow-run-as-root daos_server start -t 1 -o ~tanabarr/projects/daos_m/utils/config/examples/daos_server_sockets.yml -i -a /tmp/

 

tty no. 2:

 

source scons_local/utils/setup_local.sh

dmg -i storage format

 

mkdir /var/run/daos_agent/

daos_agent -i &

export OFI_INTERFACE=eth0; export CRT_PHY_ADDR_STR=ofi+sockets; export DAOS_SINGLETON_CLI=1; export CRT_ATTACH_INFO_PATH=/tmp;

 

root@boro-45:/home/tanabarr/projects/daos_m$ dmg --debug -i pool create -s 1G -n 0G

DEBUG 15:54:19.265895 main.go:139: debug output enabled

DEBUG 15:54:19.266281 config.go:129: DAOS Client config read from /home/tanabarr/projects/daos_m/install/etc/daos.yml

Active connections: [localhost:10001]

Creating DAOS pool with 1GB SCM and 0B NvMe storage (1.000 ratio)

DEBUG 15:54:19.367167 pool.go:76: Create DAOS pool request: scmbytes:1073741824 numsvcreps:1 user:"root@" usergroup:"root@" uuid:"855f5015-7d57-4b6c-9bf1-e658239a5bb8" sys:"daos_server"

DEBUG 15:54:19.828667 pool.go:83: Create DAOS pool response: svcreps:"0"

Pool-create command SUCCEEDED: UUID: 855f5015-7d57-4b6c-9bf1-e658239a5bb8, Service replicas: 0

 

root@boro-45:/home/tanabarr/projects/daos_m$ orterun -np 1 --allow-run-as-root --ompi-server file:/tmp/urifile daos pool query --pool=925d742e-dc0c-4128-9c4e-3f7e5c0cf9da --svc=0

Pool 925d742e-dc0c-4128-9c4e-3f7e5c0cf9da, ntarget=1, disabled=0

Pool space info:

- Target(VOS) count:1

- SCM:

  Total size: 1073741824

  Free: 1073741504, min:1073741504, max:1073741504, mean:1073741504

- NVMe:

  Total size: 0

  Free: 0, min:0, max:0, mean:0

Rebuild idle, 0 objs, 0 recs

 

root@boro-45:/home/tanabarr/projects/daos_m$ orterun -np 1 --allow-run-as-root --ompi-server file:/tmp/urifile daos cont create --pool=925d742e-dc0c-4128-9c4e-3f7e5c0cf9da --svc='uuidgen'

Successfully created container 62f09a35-8a82-4f87-8a15-e78a357c9e31

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Jordan Henderson
Sent: Tuesday, October 29, 2019 9:42 PM
To: daos@daos.groups.io
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Hi Tom,

 

I had a little more time to play around with this and it looks like I only encounter the issue if the 'scm_mount_path' or 'scm_mount' that I use in my server configuration file differs from '/mnt/daos'. When it is set to the default '/mnt/daos' I don't have any problems, but if I set it to something else I encounter the error about unformatted storage.


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Nabarro, Tom via Groups.Io <tom.nabarro@...>
Sent: Friday, October 25, 2019 12:37 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Let’s focus on the non-root case to start with. After you have manually created an empty /mnt/daos and mounted tmpfs on it, change the permissions on the SCM directory to 777 (just for this experiment) and run daos_server with control_log_mask: DEBUG (please also paste your config file). The superblock should be created and the IO server should start.
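
Before starting the server in the non-root case, it can help to confirm that the directory really is a tmpfs mount point. The following is a minimal sketch in Go, assuming a Linux /proc/mounts listing (device, mount point, filesystem type, ...). The helper name isTmpfsMount is made up for illustration and is not part of DAOS.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// isTmpfsMount reports whether path appears as a tmpfs mount point in a
// /proc/mounts-style listing. Hypothetical helper, not DAOS code.
func isTmpfsMount(mounts, path string) bool {
	path = filepath.Clean(path) // drop any trailing slash
	sc := bufio.NewScanner(strings.NewReader(mounts))
	for sc.Scan() {
		// Fields: device, mount point, filesystem type, options, ...
		f := strings.Fields(sc.Text())
		if len(f) >= 3 && f[1] == path && f[2] == "tmpfs" {
			return true
		}
	}
	return false
}

func main() {
	data, err := os.ReadFile("/proc/mounts")
	if err != nil {
		fmt.Println("cannot read /proc/mounts:", err)
		return
	}
	fmt.Println("/mnt/daos is tmpfs:", isTmpfsMount(string(data), "/mnt/daos"))
}
```

The path must match whatever scm_mount is set to in the server config file; /mnt/daos is just the default used in this thread.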

 

Thanks

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Jordan Henderson
Sent: Friday, October 25, 2019 5:00 PM
To: daos@daos.groups.io
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Hi Tom,

 

In general I don't usually run as root, but in this case it did seem to be the only way that I could get the server to wait for storage formatting. However, even when I started from a clean slate for the tmpfs mount, as per these instructions, it didn't seem to matter whether I manually mounted the tmpfs myself or allowed the storage formatting to do so. When running as root, the storage format appeared to be successful in both cases, but the server still immediately returned an error for the storage not being formatted and then exited. When running as non-root, the server didn't wait for storage formatting in either case.

 

It might be worth noting that after each successful format command, my tmpfs mount still didn't contain a superblock file. I'm guessing that this is probably why the server is returning a formatting error right after the storage format command?

 


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Nabarro, Tom via Groups.Io <tom.nabarro@...>
Sent: Friday, October 25, 2019 8:58 AM
To: Chaarawi, Mohamad <mohamad.chaarawi@...>; Jordan Henderson <jhenderson@...>
Cc: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Hello Jordan

 

If running as root, could you please try “umount /mnt/daos; rm -rf /mnt/daos”, then start the server, which should wait for format, and then format from daos_shell.

 

$  umount /mnt/daos; rm -rf /mnt/daos

$  orterun -N 1 -H localhost --report-uri /tmp/urifile --allow-run-as-root daos_server start -t 1 -o ~tanabarr/projects/daos_m/utils/config/examples/daos_server_sockets.yml -i

$ daos_shell -i storage format

 

If running as non-root, the following works for me:

 

$  sudo umount /mnt/daos; sudo rm -rf /mnt/daos; sudo mkdir /mnt/daos; sudo mount -t tmpfs -o size=64G tmpfs /mnt/daos

$ orterun -np 1 -H localhost --report-uri /tmp/urifile daos_server start -t 1 -d /tmp/ -o ~tanabarr/projects/daos_m/utils/config/examples/daos_server_sockets.yml -i

 

The above was tried on commit 15168685005843766c038afff45fd6681c07f341.

86f730a37d0170fdb733c1b08308162a245e5aea did introduce changes to the formatting code, but I haven’t observed any subsequent regressions in my testing; it may just need a clean slate in your situation.

 

Thanks

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: Chaarawi, Mohamad
Sent: Friday, October 25, 2019 2:24 PM
To: Jordan Henderson <jhenderson@...>
Cc: Nabarro, Tom <tom.nabarro@...>
Subject: Re: DAOS master error when formatting tmpfs

 

Jordan, it would be great to send such emails to the DAOS user list where there are more people (from the control plane) who can answer. Please subscribe here:

https://daos.groups.io/g/daos

 

Did you clear your tmpfs beforehand?

I'm not sure why you would be getting this over tmpfs, but I never start the server as root.

 

Mohamad

 

 

From: Jordan Henderson <jhenderson@...>
Date: Thursday, October 24, 2019 at 10:22 PM
To: "Chaarawi, Mohamad" <mohamad.chaarawi@...>
Subject: DAOS master error when formatting tmpfs

 

Hi Mohamad,

 

Are you aware of any bugs with the current DAOS master (commit 15168685005843766c038afff45fd6681c07f341) when trying to format storage with the SCM emulated through a tmpfs? I tried updating my DAOS to the latest master, but when I tried to start DAOS I got the error below. If I started the server as root, it then waited for me to format the storage, but when I did so the format command returned success and then the server immediately returned the same error as below and exited. Once I switched to the v0.6 tag (which was also ahead of my current install), I was able to run the server as non-root, format the tmpfs and start the server fine.

 

DEBUG 21:03:05.903946 instance.go:199: /home/jhenderson/Work/DAOS_Workspace/daos_fs/ (ram) needs format: true

ERROR: failed to read existing superblock: can't read superblock from unformatted storage

DEBUG 21:03:05.904215 main.go:67: can't read superblock from unformatted storage

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).ReadSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:176

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).NeedsSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:110

github.com/daos-stack/daos/src/control/server.(*IOServerHarness).CreateSuperblocks

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/harness.go:118

github.com/daos-stack/daos/src/control/server.Start

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/server.go:166

main.(*startCmd).Execute

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/start.go:167

main.parseOpts.func1

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103

github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314

main.parseOpts

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111

main.main

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123

runtime.main

/usr/lib64/go1.11.9/go/src/runtime/proc.go:201

runtime.goexit

/usr/lib64/go1.11.9/go/src/runtime/asm_amd64.s:1333

failed to read existing superblock

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).NeedsSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:117

github.com/daos-stack/daos/src/control/server.(*IOServerHarness).CreateSuperblocks

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/harness.go:118

github.com/daos-stack/daos/src/control/server.Start

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/server.go:166

main.(*startCmd).Execute

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/start.go:167

main.parseOpts.func1

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103

github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314

main.parseOpts

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111

main.main

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123

runtime.main

/usr/lib64/go1.11.9/go/src/runtime/proc.go:201

runtime.goexit

/usr/lib64/go1.11.9/go/src/runtime/asm_amd64.s:1333

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.



Re: DAOS master error when formatting tmpfs

Nabarro, Tom
 

I haven't been developing or testing with IOAT devices. Niu might be able to help with where we are on DAOS and VFIO/IOAT SSDs. You can supply a PCI address whitelist to storage prepare, but there is currently no blacklist.
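
Until a blacklist exists, one way to get the same effect is to build the whitelist as everything except the devices that should be left alone. A minimal sketch in Go follows. The helper name whitelistExcluding and the PCI addresses are made up for illustration, and how the resulting list is passed to storage prepare is deliberately not shown here.

```go
package main

import (
	"fmt"
	"sort"
)

// whitelistExcluding returns every address in all that is not in exclude,
// sorted. Hypothetical helper, not part of the DAOS tools.
func whitelistExcluding(all, exclude []string) []string {
	skip := make(map[string]bool, len(exclude))
	for _, a := range exclude {
		skip[a] = true
	}
	var keep []string
	for _, a := range all {
		if !skip[a] {
			keep = append(keep, a)
		}
	}
	sort.Strings(keep)
	return keep
}

func main() {
	// Example addresses only; substitute the devices found on the host.
	all := []string{"0000:81:00.0", "0000:00:04.0", "0000:82:00.0"}
	exclude := []string{"0000:00:04.0"}
	fmt.Println(whitelistExcluding(all, exclude))
}
```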

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Kevan Rehm
Sent: Wednesday, October 30, 2019 1:09 PM
To: daos@daos.groups.io
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Tom,

 

Sorry, while my symptoms were similar, my issue only relates to SSDs, not SCM.   A non-root daos daemon cannot open an IOAT device because of the /dev/sdX root permissions.  Can you provide more detail on what configurations are supported for IOAT?   

 

Suppose you have a system with a mix of NVMe and IOAT SSDs. For NVMe we would want to enable the IOMMU to get the vfio_pci driver, and we would want to run daos daemons as non-root. But that doesn’t work for IOAT SSDs; it appears the rule is that if there are any IOAT devices, then the daos daemons must run as root. What about the driver rebinding: does that interfere with SPDK’s bdev driver doing AIO? Do we have to blacklist IOAT drives during “storage prepare” to prevent the rebinding, or is the driver rebinding for IOAT devices harmless?

 

Thanks, Kevan

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, October 25, 2019 at 10:11 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Are you seeing that when running as non-root? If so, do you have a mounted, empty SCM available prior to starting? When running as non-root, the server will not wait for format: if SCM is mounted it will create the superblock and continue to start the IO server, otherwise it will bail.
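
The startup behaviour described here can be summarised as a small decision function. This is a sketch in Go, not the daos_server code: the Action type and the startupAction name are invented for this example, the root branch is taken from the root instructions earlier in this thread, and the root-with-SCM-already-mounted case is left unspecified because the thread does not describe it.

```go
package main

import "fmt"

// Action names what the server does at startup in this sketch.
type Action string

const (
	WaitForFormat  Action = "wait for storage format"
	CreateAndStart Action = "create superblock and start IO server"
	Bail           Action = "exit with error"
	Unspecified    Action = "not described in this thread"
)

// startupAction is a hypothetical summary of the behaviour discussed in
// this thread; it is not taken from the DAOS sources.
func startupAction(isRoot, scmMounted bool) Action {
	switch {
	case !isRoot && scmMounted:
		// Non-root with SCM mounted: no waiting, superblock is created.
		return CreateAndStart
	case !isRoot:
		// Non-root without a mounted SCM: the server gives up.
		return Bail
	case !scmMounted:
		// Root with nothing mounted: the server waits for "storage format".
		return WaitForFormat
	default:
		return Unspecified
	}
}

func main() {
	fmt.Println("non-root, mounted:  ", startupAction(false, true))
	fmt.Println("non-root, unmounted:", startupAction(false, false))
	fmt.Println("root, unmounted:    ", startupAction(true, false))
}
```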

 

Ignore the ASLR message, as it doesn't currently cause any practical problems as far as we know.

 

We are not currently testing or developing with VFIO/IOMMU/IOAT.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Kevan Rehm
Sent: Friday, October 25, 2019 10:51 PM
To: daos@daos.groups.io
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Tom,

 

When running as non-root, I am seeing the same message that Jordan reports. The “daos_server start” command fails immediately and doesn’t wait for formatting:

 

no NVMe controllers found

DAOS control server listening on 0.0.0.0:10001

no NVDIMMs found!

ERROR: failed to read existing superblock: can't read superblock from unformatted storage

 

But perhaps my situation is different. I followed your instructions and got the output below. For storage I am using an IOAT device; I do not have NVMe devices, but I am using the vfio_pci driver and the IOMMU is enabled. Is non-root supported with IOAT devices? (I don’t see any code to chown /dev/sdX to user daos in setup.sh.) If it is not supported, sorry for the interruption; I’ll switch to root.

 

Thanks, Kevan

 

P.S. Should I be worried about the ASLR message?

 

 

-bash-4.2$ daos/install/bin/orterun -np 1 -H localhost --report-uri /tmp/urifile daos_server start -t 1 -d /tmp/ -o /home/users/daos/daos/utils/config/examples/daos_server_local.yml

daos_server logging to file /tmp/daos_control.log

Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...

[ DPDK EAL parameters: spdk -c 0x1 --file-prefix=spdk1327119562 --base-virtaddr=0x200000000000 --proc-type=auto ]

EAL: Detected 32 lcore(s)

EAL: Auto-detected process type: PRIMARY

EAL: No free hugepages reported in hugepages-1048576kB

EAL: Multi-process socket /home/users/daos/.spdk1327119562_unix

EAL: Probing VFIO support...

EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.

no NVMe controllers found

DAOS control server listening on 0.0.0.0:10001

no NVDIMMs found!

Starting I/O server instance 0: /home/users/daos/daos/install/bin/daos_io_server

daos_io_server:0 Using legacy core allocation algorithm

daos_io_server:0 Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...

[ DPDK EAL parameters: daos -c 0x1 --file-prefix=spdk1327119562 --base-virtaddr=0x200000000000 --proc-type=auto ]

ERROR: daos_io_server:0 EAL: Detected 32 lcore(s)

ERROR: daos_io_server:0 EAL: Auto-detected process type: SECONDARY

daos_io_server:0 EAL: Multi-process socket /home/users/daos/.spdk1327119562_unix_15634_10ea63cd2a562

daos_io_server:0 EAL: Probing VFIO support...

daos_io_server:0 EAL: WARNING: Address Space Layout Randomization (ASLR) is enabled in the kernel.

daos_io_server:0 EAL:    This may cause issues with mapping memory into secondary processes

daos_io_server:0 EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.

ERROR: daos_io_server:0 bdev_aio.c:  83:bdev_aio_open: *ERROR*: open() failed (file:/dev/sdb), errno 13: Permission denied

ERROR: daos_io_server:0 bdev_aio.c: 470:create_aio_disk: *ERROR*: Unable to open file /dev/sdb. fd: -1 errno: 13

bdev_aio.c: 599:bdev_aio_initialize: *ERROR*: Unable to create AIO bdev from file /dev/sdb

ERROR: DAOS I/O Server exited with error: /home/users/daos/daos/install/bin/daos_io_server (instance 0) exited: exit status 1

-------------------------------------------------------

Primary job  terminated normally, but 1 process returned

a non-zero exit code. Per user-direction, the job has been aborted.

-------------------------------------------------------

--------------------------------------------------------------------------

orterun detected that one or more processes exited with non-zero status, thus causing

the job to be terminated. The first process to do so was:

 

  Process name: [[21676,1],0]

  Exit code:    1

--------------------------------------------------------------------------

 

 

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.



Re: Registration

Patrick Farrell <paf@...>
 

Gabriel,

You can sign up for the list via https://daos.groups.io/g/daos - The sign up process there should be largely self-explanatory (once you know to look there).

Regards,
-Patrick


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Gabriel.Pirastru@... <Gabriel.Pirastru@...>
Sent: Wednesday, October 30, 2019 9:01 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: [daos] Registration
 

Dell Customer Communication - Confidential


Hello,

 

Would it be possible to get registered to this mailing list?

I have seen there is a 2h meeting on DAOS at SC19; I would like to attend if possible.

 

Thank you & Kind regards

 

Gabriel

 

Gabriel Pirastru

EMEA Business Development Manager

Dell EMC | HPC & AI team

Office   (+31) 206 744 607

Mobile (+31) 612 459 767

gabriel_pirastru@...

 

 

Dell BV is registered in Netherlands.  Company Registration Number : 33226753.
Registered Seat: Transformatorweg 38, 1014 AK Amsterdam, Netherlands
Vat Number: NL 009120774 B01 (Netherlands) / LU 17816980 (Luxembourg)




Re: DAOS master error when formatting tmpfs

Kevan Rehm
 

Tom,

 

Sorry, while my symptoms were similar, my issue only relates to SSDs, not SCM.   A non-root daos daemon cannot open an IOAT device because of the /dev/sdX root permissions.  Can you provide more detail on what configurations are supported for IOAT?   

 

Suppose you have a system with a mix of NVMe and IOAT SSDs. For NVMe we would want to enable the IOMMU to get the vfio_pci driver, and we would want to run daos daemons as non-root. But that doesn’t work for IOAT SSDs; it appears the rule is that if there are any IOAT devices, then the daos daemons must run as root. What about the driver rebinding? Does it interfere with SPDK’s bdev driver doing AIO? Do we have to blacklist IOAT drives during “storage prepare” to prevent the rebinding, or is the driver rebinding for IOAT devices harmless?

 

Thanks, Kevan

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, October 25, 2019 at 10:11 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Are you seeing that when running as non-root? If so, do you have an empty SCM mounted and available prior to starting? When running as non-root, the server will not wait for format: if SCM is mounted, it will create the superblock and continue to start the IO server; otherwise it will bail.

 

Ignore the ASLR message, as it doesn't currently cause any practical problems as far as we know.

 

We are not currently testing/developing with VFIO/IOMMU/IOAT.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Kevan Rehm
Sent: Friday, October 25, 2019 10:51 PM
To: daos@daos.groups.io
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Tom,

 

I am seeing the same message that Jordan reports when running as non-root. The “daos_server start” command fails immediately; it doesn’t wait for formatting:

 

no NVMe controllers found

DAOS control server listening on 0.0.0.0:10001

no NVDIMMs found!

ERROR: failed to read existing superblock: can't read superblock from unformatted storage

 

But perhaps my situation is different. I followed your instructions and got the output below. For storage I am using an IOAT device. I do not have NVMe devices, but I am using the vfio_pci driver and the IOMMU is enabled. Is non-root supported with IOAT devices? (I don’t see any code to chown /dev/sdX to user daos in setup.sh.) If not supported, sorry for the interruption, I’ll switch to root.

 

Thanks, Kevan

 

P.S. Should I be worried about the ASLR message?

 

 

-bash-4.2$ daos/install/bin/orterun -np 1 -H localhost --report-uri /tmp/urifile daos_server start -t 1 -d /tmp/ -o /home/users/daos/daos/utils/config/examples/daos_server_local.yml

daos_server logging to file /tmp/daos_control.log

Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...

[ DPDK EAL parameters: spdk -c 0x1 --file-prefix=spdk1327119562 --base-virtaddr=0x200000000000 --proc-type=auto ]

EAL: Detected 32 lcore(s)

EAL: Auto-detected process type: PRIMARY

EAL: No free hugepages reported in hugepages-1048576kB

EAL: Multi-process socket /home/users/daos/.spdk1327119562_unix

EAL: Probing VFIO support...

EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.

no NVMe controllers found

DAOS control server listening on 0.0.0.0:10001

no NVDIMMs found!

Starting I/O server instance 0: /home/users/daos/daos/install/bin/daos_io_server

daos_io_server:0 Using legacy core allocation algorithm

daos_io_server:0 Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...

[ DPDK EAL parameters: daos -c 0x1 --file-prefix=spdk1327119562 --base-virtaddr=0x200000000000 --proc-type=auto ]

ERROR: daos_io_server:0 EAL: Detected 32 lcore(s)

ERROR: daos_io_server:0 EAL: Auto-detected process type: SECONDARY

daos_io_server:0 EAL: Multi-process socket /home/users/daos/.spdk1327119562_unix_15634_10ea63cd2a562

daos_io_server:0 EAL: Probing VFIO support...

daos_io_server:0 EAL: WARNING: Address Space Layout Randomization (ASLR) is enabled in the kernel.

daos_io_server:0 EAL:    This may cause issues with mapping memory into secondary processes

daos_io_server:0 EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.

ERROR: daos_io_server:0 bdev_aio.c:  83:bdev_aio_open: *ERROR*: open() failed (file:/dev/sdb), errno 13: Permission denied

ERROR: daos_io_server:0 bdev_aio.c: 470:create_aio_disk: *ERROR*: Unable to open file /dev/sdb. fd: -1 errno: 13

bdev_aio.c: 599:bdev_aio_initialize: *ERROR*: Unable to create AIO bdev from file /dev/sdb

ERROR: DAOS I/O Server exited with error: /home/users/daos/daos/install/bin/daos_io_server (instance 0) exited: exit status 1

-------------------------------------------------------

Primary job  terminated normally, but 1 process returned

a non-zero exit code. Per user-direction, the job has been aborted.

-------------------------------------------------------

--------------------------------------------------------------------------

orterun detected that one or more processes exited with non-zero status, thus causing

the job to be terminated. The first process to do so was:

 

  Process name: [[21676,1],0]

  Exit code:    1

--------------------------------------------------------------------------
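
The errno 13 lines in the log above name the device that could not be opened. A minimal sketch, assuming a POSIX shell and console output captured to a file, that pulls those paths out of the log and checks whether the current user can open a given path read-write. The function names denied_devices and can_open_rw are hypothetical helpers, not part of DAOS or SPDK:

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of DAOS or SPDK.

# Print each distinct path that bdev_aio_open failed to open with errno 13.
# Arguments: one or more log files (reads stdin when none are given).
denied_devices() {
    sed -n 's/.*open() failed (file:\([^)]*\)), errno 13.*/\1/p' "$@" | sort -u
}

# Succeed only if the current user has both read and write access to $1.
can_open_rw() {
    [ -r "$1" ] && [ -w "$1" ]
}
```

For example, with the output saved to a hypothetical server_output.log, "denied_devices server_output.log" lists the affected devices, and "can_open_rw /dev/sdb || ls -l /dev/sdb" shows the ownership that blocks a non-root user.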

 

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, October 25, 2019 at 12:37 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Let’s focus on the non-root case to start with. After you have manually created an empty /mnt/daos and mounted tmpfs, change permissions to 777 (just for this experiment) on the SCM directory and run daos_server with control_log_mask: DEBUG (please also paste the config file). The superblock should be created and the IO server should start.
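
The preconditions described here (an existing, writable, empty SCM directory) can be checked before starting the server. A minimal sketch, assuming a POSIX shell; scm_dir_ready is a hypothetical helper and not a DAOS command, and it does not verify that the directory is actually a tmpfs mount:

```shell
#!/bin/sh
# Hypothetical pre-flight check for illustration; not part of DAOS.
# Succeeds when the directory in $1 exists, is writable by the current
# user, and contains no entries. Prints a one-line verdict either way.
scm_dir_ready() {
    dir=$1
    [ -d "$dir" ] || { echo "missing: $dir"; return 1; }
    [ -w "$dir" ] || { echo "not writable: $dir"; return 1; }
    if [ -n "$(ls -A "$dir")" ]; then
        echo "not empty: $dir"
        return 1
    fi
    echo "ready: $dir"
}
```

Running "scm_dir_ready /mnt/daos" before daos_server start gives a quick indication of which precondition, if any, is not met.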

 

Thanks

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Jordan Henderson
Sent: Friday, October 25, 2019 5:00 PM
To: daos@daos.groups.io
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Hi Tom,

 

In general I don't usually run as root, but in this case it did seem to be the only way that I could get the server to wait for storage formatting. However, even when I started from a clean slate for the tmpfs mount, as per these instructions, it didn't seem to matter whether I manually mounted the tmpfs myself or allowed the storage formatting to do so. When running as root, the storage format appeared to be successful in both cases, but the server still immediately returned an error for the storage not being formatted and then exited. When running as non-root, the server didn't wait for storage formatting in either case.

 

It might be worth noting that after each successful format command, my tmpfs mount still didn't contain a superblock file. I'm guessing that this is probably why the server is returning a formatting error right after the storage format command?

 


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Nabarro, Tom via Groups.Io <tom.nabarro@...>
Sent: Friday, October 25, 2019 8:58 AM
To: Chaarawi, Mohamad <mohamad.chaarawi@...>; Jordan Henderson <jhenderson@...>
Cc: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Hello Jordan

 

If running as root, could you please try “umount /mnt/daos; rm -rf /mnt/daos”, then start the server, which should wait for format, and then format from daos_shell.

 

$  umount /mnt/daos; rm -rf /mnt/daos

$  orterun -N 1 -H localhost --report-uri /tmp/urifile --allow-run-as-root daos_server start -t 1 -o ~tanabarr/projects/daos_m/utils/config/examples/daos_server_sockets.yml -i

$ daos_shell -i storage format

 

If running as non-root, the following works for me:

 

$  sudo umount /mnt/daos; sudo rm -rf /mnt/daos; sudo mkdir /mnt/daos; sudo mount -t tmpfs -o size=64G tmpfs /mnt/daos

$ orterun -np 1 -H localhost --report-uri /tmp/urifile daos_server start -t 1 -d /tmp/ -o ~tanabarr/projects/daos_m/utils/config/examples/daos_server_sockets.yml -i

 

The above was tried on commit 15168685005843766c038afff45fd6681c07f341.

86f730a37d0170fdb733c1b08308162a245e5aea did introduce changes to the formatting code, but I haven’t observed any subsequent regressions in my testing; it may just need a clean slate in your situation.

 

Thanks

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: Chaarawi, Mohamad
Sent: Friday, October 25, 2019 2:24 PM
To: Jordan Henderson <jhenderson@...>
Cc: Nabarro, Tom <tom.nabarro@...>
Subject: Re: DAOS master error when formatting tmpfs

 

Jordan, it would be great to send such emails to the DAOS user list where there are more people (from the control plane) who can answer. Please subscribe here:

https://daos.groups.io/g/daos

 

Did you clear your tmpfs beforehand?

I'm not sure why you would be getting this over tmpfs, but I never start the server as root.

 

Mohamad

 

 

From: Jordan Henderson <jhenderson@...>
Date: Thursday, October 24, 2019 at 10:22 PM
To: "Chaarawi, Mohamad" <mohamad.chaarawi@...>
Subject: DAOS master error when formatting tmpfs

 

Hi Mohamad,

 

Are you aware of any bugs with the current DAOS master (commit 15168685005843766c038afff45fd6681c07f341) when trying to format storage while the SCM is emulated through a tmpfs? I tried updating my DAOS to the latest master, but when I tried to start DAOS I got the error below. If I started the server as root, it then waited for me to format the storage, but when I did so the format command returned success and then the server immediately returned the same error as below and exited. Once I switched to the v0.6 tag (which was also ahead of my current install), I was able to run the server as non-root, format the tmpfs and start the server fine.

 

DEBUG 21:03:05.903946 instance.go:199: /home/jhenderson/Work/DAOS_Workspace/daos_fs/ (ram) needs format: true

ERROR: failed to read existing superblock: can't read superblock from unformatted storage

DEBUG 21:03:05.904215 main.go:67: can't read superblock from unformatted storage

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).ReadSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:176

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).NeedsSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:110

github.com/daos-stack/daos/src/control/server.(*IOServerHarness).CreateSuperblocks

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/harness.go:118

github.com/daos-stack/daos/src/control/server.Start

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/server.go:166

main.(*startCmd).Execute

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/start.go:167

main.parseOpts.func1

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103

github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314

main.parseOpts

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111

main.main

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123

runtime.main

/usr/lib64/go1.11.9/go/src/runtime/proc.go:201

runtime.goexit

/usr/lib64/go1.11.9/go/src/runtime/asm_amd64.s:1333

failed to read existing superblock

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).NeedsSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:117

github.com/daos-stack/daos/src/control/server.(*IOServerHarness).CreateSuperblocks

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/harness.go:118

github.com/daos-stack/daos/src/control/server.Start

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/server.go:166

main.(*startCmd).Execute

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/start.go:167

main.parseOpts.func1

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103

github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314

main.parseOpts

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111

main.main

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123

runtime.main

/usr/lib64/go1.11.9/go/src/runtime/proc.go:201

runtime.goexit

/usr/lib64/go1.11.9/go/src/runtime/asm_amd64.s:1333



Re: New dmg tool

Nabarro, Tom
 

Indeed, documentation has been updated as such.

 

Tom

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Patrick Farrell
Sent: Tuesday, October 29, 2019 7:34 PM
To: daos@daos.groups.io
Subject: Re: [daos] New dmg tool

 

It may be worth noting that this tool (the new Go tool) was previously named daos_shell, and any commands which referred to “daos_shell” should now use “dmg”.

 

-Patrick
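
Existing scripts or notes that call daos_shell can be updated mechanically. A minimal sketch, assuming GNU sed (the \< and \> word-boundary anchors are a GNU extension); rename_daos_shell is a hypothetical helper name, and whether every old subcommand and flag is accepted unchanged by dmg should be checked against "dmg --help":

```shell
#!/bin/sh
# Hypothetical helper for illustration; assumes GNU sed for \< and \>.
# Reads text on stdin and rewrites whole-word "daos_shell" to "dmg".
# Names that merely contain the string (e.g. my_daos_shell) are left alone.
rename_daos_shell() {
    sed 's/\<daos_shell\>/dmg/g'
}
```

For instance, piping the line "daos_shell -i storage format" from earlier in this thread through rename_daos_shell yields "dmg -i storage format".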

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday, October 29, 2019 at 9:03 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] New dmg tool

 

https://github.com/daos-stack/daos/pull/1302 has just been merged with master (https://github.com/daos-stack/daos/commit/084356e91d660334dc3efbbe9d0186e204450579).

 

The install/bin/dmg binary is now the control plane management/administration tool, written in Go, which replaces the previous C tool of the same name. Usage can be printed with "dmg --help". If you need the old tool for whatever reason, it can be found at install/bin/dmg_old for a limited period of time. See the documentation at https://github.com/daos-stack/daos/blob/master/doc/admin/deployment.md#basic-workflow and https://daos.io/admin/deployment/ (which will shortly be rebuilt to reflect the changes) for details and examples on how to use dmg.

 

Best regards,

Tom Nabarro BEng (hons)

Extreme Storage Architecture & Development

Intel Corporation

E: tom.nabarro@...

M: +44 (0)7786 260986

Skype: tom.nabarro

 



Re: DAOS master error when formatting tmpfs

Jordan Henderson
 

Hi Tom,

I had a little more time to play around with this and it looks like I only encounter the issue if the 'scm_mount_path' or 'scm_mount' that I use in my server configuration file differs from '/mnt/daos'. When it is set to the default '/mnt/daos' I don't have any problems, but if I set it to something else I encounter the error about unformatted storage.
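
To see quickly whether a given server configuration falls into the case described here, the configured mount point can be compared with the default. A minimal sketch, assuming a POSIX shell and a YAML file in which the key appears on its own line as "scm_mount: <path>" or "scm_mount_path: <path>" with an unquoted value; scm_mount_of and is_default_scm_mount are hypothetical helpers, not DAOS tools:

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of DAOS.

# Print the value of the first scm_mount or scm_mount_path key in file $1.
# Assumes an unquoted value on the same line as the key.
scm_mount_of() {
    sed -n 's/^[[:space:]-]*scm_mount\(_path\)\{0,1\}:[[:space:]]*\([^[:space:]#]*\).*/\2/p' "$1" | head -n 1
}

# Succeed only if the configured mount point is the default /mnt/daos.
is_default_scm_mount() {
    [ "$(scm_mount_of "$1")" = "/mnt/daos" ]
}
```

Calling "is_default_scm_mount /path/to/daos_server.yml || echo non-default SCM mount" would flag a configuration like the one that triggers the unformatted-storage error reported here.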




Re: New dmg tool

Patrick Farrell <paf@...>
 

It may be worth noting that this tool (the new Go tool) was previously named daos_shell, and any commands which referred to “daos_shell” should now use “dmg”.

 

-Patrick

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday, October 29, 2019 at 9:03 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] New dmg tool

 

https://github.com/daos-stack/daos/pull/1302 has just been merged with master (https://github.com/daos-stack/daos/commit/084356e91d660334dc3efbbe9d0186e204450579).

 

The install/bin/dmg binary is now the control plane management/administration tool written in Go which replaces the previous C tool of the same name. Usage can be printed with "dmg --help" and if you need the old tool for whatever reason, it can be found at install/bin/dmg_old for a limited period of time. See the documentation at https://github.com/daos-stack/daos/blob/master/doc/admin/deployment.md#basic-workflow and https://daos.io/admin/deployment/ (which will shortly be rebuilt to reflect the changes) for details and examples on how to use dmg.

 

Best regards,

Tom Nabarro BEng (hons)

Extreme Storage Architecture & Development

Intel Corporation

E: tom.nabarro@...

M: +44 (0)7786 260986

Skype: tom.nabarro

 



New dmg tool

Nabarro, Tom
 

https://github.com/daos-stack/daos/pull/1302 has just been merged with master (https://github.com/daos-stack/daos/commit/084356e91d660334dc3efbbe9d0186e204450579).

 

The install/bin/dmg binary is now the control plane management/administration tool written in Go which replaces the previous C tool of the same name. Usage can be printed with "dmg --help" and if you need the old tool for whatever reason, it can be found at install/bin/dmg_old for a limited period of time. See the documentation at https://github.com/daos-stack/daos/blob/master/doc/admin/deployment.md#basic-workflow and https://daos.io/admin/deployment/ (which will shortly be rebuilt to reflect the changes) for details and examples on how to use dmg.

 

Best regards,

Tom Nabarro BEng (hons)

Extreme Storage Architecture & Development

Intel Corporation

E: tom.nabarro@...

M: +44 (0)7786 260986

Skype: tom.nabarro

 



3rd annual DAOS User Group meeting

 

Will this be webcast?

On Oct 28, 2019, at 3:56 PM, Prantis, Kelsey <kelsey.prantis@...> wrote:

Greetings, 
 
Please join us at the 3rd annual DAOS User Group meeting to be held at SC19 on Wednesday, November 20 from 3:00 to 5:00 pm MT.  This will be held in the Mt. Columbia Room at the Grand Hyatt Denver (1750 Welton Street, Denver).
 
We are in the process of developing the agenda, which so far includes a development and roadmap update, middleware support, performance, and an open discussion. If you would like to present a specific topic, or see an additional topic added, please connect with us to be added to the agenda.
 
Please forward this invitation to anyone we may have missed, or anyone you think might want to attend.  This meeting is open to everyone. If you are not on our mailing list (daos@daos.groups.io) and would like further updates on this event, please reply (not reply-all) to this email, and we will make sure you are kept on future communications, such as the agenda.
We’re looking forward to seeing you there!
 
Best regards,
 
Kelsey Prantis
Software Engineering Manager, Extreme Storage Architecture and Design Division (ESAD)
 
Johann Lombardi 
Principal Engineer ESAD


Re: daos_server storage prepare failures

Nabarro, Tom
 

PR landed to master, Commit 0396c0a218c7e63fe54ddb9e2dbb04345a99807d

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Nabarro, Tom
Sent: Saturday, October 26, 2019 4:00 AM
To: daos@daos.groups.io
Subject: Re: [daos] daos_server storage prepare failures

 

Apologies that this regression slipped in

PR for the fix https://github.com/daos-stack/daos/pull/1323

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Kevan Rehm
Sent: Friday, October 25, 2019 9:00 PM
To: daos@daos.groups.io
Subject: [daos] daos_server storage prepare failures

 

Greetings,

 

I’ve been tracking down a problem where “daos_server storage prepare” fails every time.  I am running daos master, up to date as of a few minutes ago.  The command looks like this:

 

/home/users/daos/daos/install/bin/daos_server storage prepare --debug --nvme-only --target-user=daos -p 1024 -o /home/users/daos/daos/utils/config/examples/daos_server_local.yml

 

Output is this:

 

Preparing locally-attached NVMe storage...

DEBUG 14:26:03.463504 storage_nvme.go:104: spdk setup with _NRHUGE=1024

DEBUG 14:26:03.463959 storage_nvme.go:108: spdk setup with _TARGET_USER=daos

DEBUG 14:26:13.369224 main.go:67: scan error(s):

  SPDK setup: spdk setup failed (_NRHUGE=1024, _TARGET_USER=daos, , chown: cannot access ‘libvirt’: No such file or directory

): exit status 123

 

main.concatErrors

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/storage.go:94

main.(*storagePrepareCmd).Execute

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/storage.go:160

main.parseOpts.func1

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103

github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314

main.parseOpts

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111

main.main

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123

runtime.main

       /usr/lib/golang/src/runtime/proc.go:203

runtime.goexit

       /usr/lib/golang/src/runtime/asm_amd64.s:1357

ERROR: scan error(s):

  SPDK setup: spdk setup failed (_NRHUGE=1024, _TARGET_USER=daos, , chown: cannot access ‘libvirt’: No such file or directory

): exit status 123

 

The problem is in file src/control/server/init/setup_spdk.sh, recently changed.   Code section is this:

 

        # build arglist manually to filter missing directories/files

        # so we don't error on non-existent entities

        for glob in '/dev/hugepages' '/dev/uio*'                \

                '/sys/class/uio/uio*/device/config'     \

                '/sys/class/uio/uio*/device/resource*'; do

 

                if list=$(ls $glob); then

                        echo "RUN: ls $glob | xargs -r chown -R $_TARGET_USER"

                        echo "$list" | xargs -r chown -R "$_TARGET_USER"

                fi

        done

 

On my machine, /dev/hugepages looks like this:

 

[root@hl-d109 ~]# ls -lR /dev/hugepages

/dev/hugepages:

total 0

drwxr-xr-x 3 root root 0 Oct 25 13:26 libvirt

 

/dev/hugepages/libvirt:

total 0

drwxr-xr-x 2 root root 0 Oct 25 13:26 qemu

 

/dev/hugepages/libvirt/qemu:

total 0

 

I think the intent was for the pathnames in the glob variable to be absolute pathnames, but the above code yields relative pathnames, i.e. libvirt instead of /dev/hugepages/libvirt, so the chown in the xargs statement fails because the command is not being executed inside /dev/hugepages.
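
The behaviour described above can be reproduced without SPDK. The following is only a minimal sketch, assuming a scratch directory stands in for /dev/hugepages and printing the paths instead of running chown. The function names list_with_ls and list_with_glob and the scratch layout are hypothetical, and this is not necessarily how the actual fix in setup_spdk.sh is written.

```shell
#!/bin/bash
# Scratch tree standing in for /dev/hugepages (hypothetical layout).
scratch=$(mktemp -d)
trap 'rm -rf "$scratch"' EXIT
mkdir -p "$scratch/hugepages/libvirt/qemu"

# What the current loop does: when the argument is a directory, 'ls'
# prints the entries inside it, without the leading directory. When a
# pattern matches files, 'ls' prints them as given, which is why only
# the directory case misbehaves.
list_with_ls() {
    ls $1
}

# One possible alternative: let the shell expand the pattern itself, so
# every result keeps its full path. A pattern that matches nothing stays
# literal and is dropped by the -e test.
list_with_glob() {
    local path
    for path in $1; do
        [ -e "$path" ] && printf '%s\n' "$path"
    done
    return 0
}

list_with_ls "$scratch/hugepages"      # prints: libvirt
list_with_glob "$scratch/hugepages"    # prints the full path of the directory
list_with_glob "$scratch/uio*"         # prints nothing and reports no error
```

With full paths in hand, the chown -R could be applied to each path directly, which would also cover the /dev/hugepages directory itself rather than only its contents.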

 

If you need more information, let me know.

 

Thanks, Kevan



DAOS User Group Meeting Announcement for SC19

Prantis, Kelsey
 

Greetings,

 

Please join us at the 3rd annual DAOS User Group meeting to be held at SC19 on Wednesday, November 20 from 3:00 to 5:00 pm MT.  This will be held in the Mt. Columbia Room at the Grand Hyatt Denver (1750 Welton Street, Denver).

 

We are in the process of developing the agenda, which so far includes a development and roadmap update, middleware support, performance, and an open discussion. If you would like to present a specific topic, or see an additional topic added, please connect with us to be added to the agenda.

 

Please forward this invitation to anyone we may have missed, or anyone you think might want to attend.  This meeting is open to everyone. If you are not on our mailing list (daos@daos.groups.io) and would like further updates on this event, please reply (not reply-all) to this email, and we will make sure you are kept on future communications, such as the agenda.

We’re looking forward to seeing you there!

 

Best regards,

 

Kelsey Prantis

Software Engineering Manager, Extreme Storage Architecture and Design Division (ESAD)

kelsey.prantis@...

 

Johann Lombardi

Principal Engineer ESAD

johann.lombardi@...


Re: Multiple targets

Colin Ngam
 

We solved this issue by setting the target in the config file to 2.

 

From: Colin Ngam <cngam@...>
Date: Wednesday, October 23, 2019 at 4:11 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Multiple targets

 

Hi,

 

We defined multiple targets.

 

  bdev_class: kdev

  bdev_list: [/dev/sdl1, /dev/sdb1]

 

orterun -allow-run-as-root -H localhost -np 1 /home/users/daos/daos/install/bin/daos_server start -a /tmp/ -o /home/users/daos/daos/utils/config/examples/daos_server_local.yml

 

Management Service access point started (bootstrapped)

daos_io_server:0 DAOS I/O server (v0.6.0) process 10886 started on rank 0 (out of 1) with 1 target, 0 helper XS per target, firstcore 0, host hl-d108.

 

Tests:

 

orterun --allow-run-as-root -np 1 ./install/bin/daos_test

 

Questions:

 

  1. Should I see 2 targets on the line above (marked in red)?
  2. I see activity on only 1 device.

 

Thanks.

 

Colin


Re: DAOS master error when formatting tmpfs

Nabarro, Tom
 

Are you seeing that when running as non-root? If so, do you have an empty SCM mount available before starting the server? When running as non-root, the server does not wait for a format: if SCM is mounted, it creates the superblock and continues on to start the IO server; otherwise it bails out.
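
A pre-flight check along those lines could look like the sketch below. It is only an illustration: scm_ready is a hypothetical helper that is not part of DAOS, it assumes the util-linux mountpoint command is available, and /mnt/daos is simply the SCM mount path used elsewhere in this thread.

```shell
#!/bin/bash
# Hypothetical pre-flight helper, not part of DAOS.
# Succeeds only if the directory is a mount point and has no entries.
scm_ready() {
    local dir=$1
    # 'mountpoint -q' (util-linux) exits 0 only for a mount point.
    mountpoint -q "$dir" || { echo "$dir is not mounted"; return 1; }
    # 'ls -A' lists everything except . and .. so empty output means empty dir.
    [ -z "$(ls -A "$dir")" ] || { echo "$dir is not empty"; return 2; }
    echo "$dir is mounted and empty"
}

# /mnt/daos is the SCM mount path used in this thread.
scm_ready /mnt/daos || echo "mount an empty tmpfs on /mnt/daos before starting daos_server"
```

Running this as the non-root user before "daos_server start" would show up front whether the server is likely to bail out for lack of a mounted SCM directory.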

 

You can ignore the ASLR message; as far as we know, it doesn't currently cause any practical problems.

 

We are not currently testing/developing with VFIO/IOMMU/IOAT

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Kevan Rehm
Sent: Friday, October 25, 2019 10:51 PM
To: daos@daos.groups.io
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Tom,

 

I am seeing the same message that Jordan reports. When running as non-root, the “daos_server start” command fails immediately and doesn’t wait for formatting:

 

no NVMe controllers found

DAOS control server listening on 0.0.0.0:10001

no NVDIMMs found!

ERROR: failed to read existing superblock: can't read superblock from unformatted storage

 

But perhaps my situation is different. I followed your instructions and got the output below. For storage I am using an IOAT device; I do not have NVMe devices, but I am using the vfio_pci driver and the IOMMU is enabled. Is non-root supported with IOAT devices? (I don’t see any code to chown /dev/sdX to user daos in setup.sh.) If it is not supported, sorry for the interruption; I’ll switch to root.

 

Thanks, Kevan

 

P.S. Should I be worried about the ASLR message?

 

 

-bash-4.2$ daos/install/bin/orterun -np 1 -H localhost --report-uri /tmp/urifile daos_server start -t 1 -d /tmp/ -o /home/users/daos/daos/utils/config/examples/daos_server_local.yml

daos_server logging to file /tmp/daos_control.log

Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...

[ DPDK EAL parameters: spdk -c 0x1 --file-prefix=spdk1327119562 --base-virtaddr=0x200000000000 --proc-type=auto ]

EAL: Detected 32 lcore(s)

EAL: Auto-detected process type: PRIMARY

EAL: No free hugepages reported in hugepages-1048576kB

EAL: Multi-process socket /home/users/daos/.spdk1327119562_unix

EAL: Probing VFIO support...

EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.

no NVMe controllers found

DAOS control server listening on 0.0.0.0:10001

no NVDIMMs found!

Starting I/O server instance 0: /home/users/daos/daos/install/bin/daos_io_server

daos_io_server:0 Using legacy core allocation algorithm

daos_io_server:0 Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...

[ DPDK EAL parameters: daos -c 0x1 --file-prefix=spdk1327119562 --base-virtaddr=0x200000000000 --proc-type=auto ]

ERROR: daos_io_server:0 EAL: Detected 32 lcore(s)

ERROR: daos_io_server:0 EAL: Auto-detected process type: SECONDARY

daos_io_server:0 EAL: Multi-process socket /home/users/daos/.spdk1327119562_unix_15634_10ea63cd2a562

daos_io_server:0 EAL: Probing VFIO support...

daos_io_server:0 EAL: WARNING: Address Space Layout Randomization (ASLR) is enabled in the kernel.

daos_io_server:0 EAL:    This may cause issues with mapping memory into secondary processes

daos_io_server:0 EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.

ERROR: daos_io_server:0 bdev_aio.c:  83:bdev_aio_open: *ERROR*: open() failed (file:/dev/sdb), errno 13: Permission denied

ERROR: daos_io_server:0 bdev_aio.c: 470:create_aio_disk: *ERROR*: Unable to open file /dev/sdb. fd: -1 errno: 13

bdev_aio.c: 599:bdev_aio_initialize: *ERROR*: Unable to create AIO bdev from file /dev/sdb

ERROR: DAOS I/O Server exited with error: /home/users/daos/daos/install/bin/daos_io_server (instance 0) exited: exit status 1

-------------------------------------------------------

Primary job  terminated normally, but 1 process returned

a non-zero exit code. Per user-direction, the job has been aborted.

-------------------------------------------------------

--------------------------------------------------------------------------

orterun detected that one or more processes exited with non-zero status, thus causing

the job to be terminated. The first process to do so was:

 

  Process name: [[21676,1],0]

  Exit code:    1

--------------------------------------------------------------------------

 

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, October 25, 2019 at 12:37 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Let’s focus on the non-root case to start with. After you have manually created an empty /mnt/daos and mounted tmpfs on it, change the permissions on the SCM directory to 777 (just for this experiment) and run daos_server with control_log_mask: DEBUG (please also paste your config file). The superblock should be created and the IO server should start.

 

Thanks

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Jordan Henderson
Sent: Friday, October 25, 2019 5:00 PM
To: daos@daos.groups.io
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Hi Tom,

 

in general I don't usually run as root, but in this case it did seem to be the only way that I could get the server to wait for storage formatting. However, even when I started from a clean slate for the tmpfs mount, as per these instructions, it didn't seem to matter whether I manually mounted the tmpfs myself or allowed the storage formatting to do so. When running as root, the storage format appeared to be successful in both cases, but the server still immediately returned an error for the storage not being formatted and then exited. When running as non-root, the server didn't wait for storage formatting in either case.

 

It might be worth noting that after each successful format command, my tmpfs mount still didn't contain a superblock file. I'm guessing that this is probably why the server is returning a formatting error right after the storage format command?

 


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Nabarro, Tom via Groups.Io <tom.nabarro@...>
Sent: Friday, October 25, 2019 8:58 AM
To: Chaarawi, Mohamad <mohamad.chaarawi@...>; Jordan Henderson <jhenderson@...>
Cc: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Hello Jordan

 

If running as root, could you please try “umount /mnt/daos; rm -rf /mnt/daos” then start server which should wait for format, then format from daos_shell.

 

$  umount /mnt/daos; rm -rf /mnt/daos

$  orterun -N 1 -H localhost --report-uri /tmp/urifile --allow-run-as-root daos_server start -t 1 -o ~tanabarr/projects/daos_m/utils/config/examples/daos_server_sockets.yml -i

$ daos_shell -i storage format

 

If running as non-root, the following works for me:

 

$  sudo umount /mnt/daos; sudo rm -rf /mnt/daos; sudo mkdir /mnt/daos; sudo mount -t tmpfs -o size=64G tmpfs /mnt/daos

$ orterun -np 1 -H localhost --report-uri /tmp/urifile daos_server start -t 1 -d /tmp/ -o ~tanabarr/projects/daos_m/utils/config/examples/daos_server_sockets.yml -i

 

Above tried on commit 15168685005843766c038afff45fd6681c07f341 .

86f730a37d0170fdb733c1b08308162a245e5aea did introduce changes to the formatting code, but I haven’t observed any subsequent regressions in my testing; it may just need a clean slate in your situation.

 

Thanks

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: Chaarawi, Mohamad
Sent: Friday, October 25, 2019 2:24 PM
To: Jordan Henderson <jhenderson@...>
Cc: Nabarro, Tom <tom.nabarro@...>
Subject: Re: DAOS master error when formatting tmpfs

 

Jordan, it would be great to send such emails to the DAOS user list where there are more people (from the control plane) who can answer. Please subscribe here:

https://daos.groups.io/g/daos

 

Did you clear your tmpfs beforehand?

I'm not sure why you would be getting this over tmpfs, but I never start the server as root.

 

Mohamad

 

 

From: Jordan Henderson <jhenderson@...>
Date: Thursday, October 24, 2019 at 10:22 PM
To: "Chaarawi, Mohamad" <mohamad.chaarawi@...>
Subject: DAOS master error when formatting tmpfs

 

Hi Mohamad,

 

are you aware of any bugs with the current DAOS master (commit 15168685005843766c038afff45fd6681c07f341) and trying to format storage when the SCM is emulated through a tmpfs? I tried updating my DAOS to the latest master but when I tried to start DAOS I got the error below. If I started the server as root it then waited for me to format the storage, but when I did so the format command returned success and then the server immediately returned the same error as below and exited. Once I switched to the v0.6 tag (which was also ahead of my current install), I was able to run the server as non-root, format the tmpfs and start the server fine.

 

DEBUG 21:03:05.903946 instance.go:199: /home/jhenderson/Work/DAOS_Workspace/daos_fs/ (ram) needs format: true

ERROR: failed to read existing superblock: can't read superblock from unformatted storage

DEBUG 21:03:05.904215 main.go:67: can't read superblock from unformatted storage

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).ReadSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:176

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).NeedsSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:110

github.com/daos-stack/daos/src/control/server.(*IOServerHarness).CreateSuperblocks

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/harness.go:118

github.com/daos-stack/daos/src/control/server.Start

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/server.go:166

main.(*startCmd).Execute

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/start.go:167

main.parseOpts.func1

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103

github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314

main.parseOpts

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111

main.main

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123

runtime.main

/usr/lib64/go1.11.9/go/src/runtime/proc.go:201

runtime.goexit

/usr/lib64/go1.11.9/go/src/runtime/asm_amd64.s:1333

failed to read existing superblock

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).NeedsSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:117

github.com/daos-stack/daos/src/control/server.(*IOServerHarness).CreateSuperblocks

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/harness.go:118

github.com/daos-stack/daos/src/control/server.Start

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/server.go:166

main.(*startCmd).Execute

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/start.go:167

main.parseOpts.func1

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103

github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314

main.parseOpts

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111

main.main

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123

runtime.main

/usr/lib64/go1.11.9/go/src/runtime/proc.go:201

runtime.goexit

/usr/lib64/go1.11.9/go/src/runtime/asm_amd64.s:1333



Re: daos_server storage prepare failures

Nabarro, Tom
 

When the fix lands you should be able to run storage prepare -n -u <non-root username> to change ownership back when desired.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Kevan Rehm
Sent: Friday, October 25, 2019 9:38 PM
To: daos@daos.groups.io
Subject: Re: [daos] daos_server storage prepare failures

 

A side effect of the code problem mentioned below is that /dev/hugepages is left owned by root, mode 0755, so if you try to start daos_server as user daos, it fails with:

 

EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.

EAL: Failed to create shared memory!

EAL: FATAL: Cannot init memory

 

EAL: Cannot init memory

 

because the daos user doesn’t have permission to create files in /dev/hugepages.   As root I chowned /dev/hugepages manually to daos and now “daos_server start”  is able to create files in the directory as user daos.
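
The permission problem can be checked before starting the server. This is a minimal sketch only: can_create_files is a hypothetical helper that is not part of DAOS, and it is meant to be run as the account that will start daos_server (daos in my case).

```shell
#!/bin/bash
# Hypothetical helper, not part of DAOS. Run it as the account that will
# start daos_server: creating a file needs both write and search
# permission on the directory.
can_create_files() {
    [ -d "$1" ] && [ -w "$1" ] && [ -x "$1" ]
}

if can_create_files /dev/hugepages; then
    echo "/dev/hugepages is writable by the current user"
else
    echo "/dev/hugepages is not writable by the current user; as root: chown daos /dev/hugepages"
fi
```

Note that root passes the write test on any directory, so the check only says something useful when it is run as the non-root user.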

 

If I should be filing an issue rather than reporting here, please point me to where I can do so.

 

Thanks, Kevan

 

From: <daos@daos.groups.io> on behalf of Kevan Rehm <krehm@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, October 25, 2019 at 3:00 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] daos_server storage prepare failures

 

Greetings,

 

I’ve been tracking down a problem where “daos_server storage prepare” fails every time.  I am running daos master, up to date as of a few minutes ago.  The command looks like this:

 

/home/users/daos/daos/install/bin/daos_server storage prepare --debug --nvme-only --target-user=daos -p 1024 -o /home/users/daos/daos/utils/config/examples/daos_server_local.yml

 

Output is this:

 

Preparing locally-attached NVMe storage...

DEBUG 14:26:03.463504 storage_nvme.go:104: spdk setup with _NRHUGE=1024

DEBUG 14:26:03.463959 storage_nvme.go:108: spdk setup with _TARGET_USER=daos

DEBUG 14:26:13.369224 main.go:67: scan error(s):

  SPDK setup: spdk setup failed (_NRHUGE=1024, _TARGET_USER=daos, , chown: cannot access ‘libvirt’: No such file or directory

): exit status 123

 

main.concatErrors

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/storage.go:94

main.(*storagePrepareCmd).Execute

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/storage.go:160

main.parseOpts.func1

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103

github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314

main.parseOpts

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111

main.main

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123

runtime.main

       /usr/lib/golang/src/runtime/proc.go:203

runtime.goexit

       /usr/lib/golang/src/runtime/asm_amd64.s:1357

ERROR: scan error(s):

  SPDK setup: spdk setup failed (_NRHUGE=1024, _TARGET_USER=daos, , chown: cannot access ‘libvirt’: No such file or directory

): exit status 123

 

The problem is in file src/control/server/init/setup_spdk.sh, recently changed.   Code section is this:

 

        # build arglist manually to filter missing directories/files

        # so we don't error on non-existent entities

        for glob in '/dev/hugepages' '/dev/uio*'                \

                '/sys/class/uio/uio*/device/config'     \

                '/sys/class/uio/uio*/device/resource*'; do

 

                if list=$(ls $glob); then

                        echo "RUN: ls $glob | xargs -r chown -R $_TARGET_USER"

                        echo "$list" | xargs -r chown -R "$_TARGET_USER"

                fi

        done

 

On my machine, /dev/hugepages looks like this:

 

[root@hl-d109 ~]# ls -lR /dev/hugepages

/dev/hugepages:

total 0

drwxr-xr-x 3 root root 0 Oct 25 13:26 libvirt

 

/dev/hugepages/libvirt:

total 0

drwxr-xr-x 2 root root 0 Oct 25 13:26 qemu

 

/dev/hugepages/libvirt/qemu:

total 0

 

I think the intent was for the pathnames in the glob variable to be absolute pathnames, but the above code yields relative pathnames, i.e. libvirt instead of /dev/hugepages/libvirt, so the chown in the xargs statement fails because the command is not being executed inside /dev/hugepages.

 

If you need more information, let me know.

 

Thanks, Kevan



Re: daos_server storage prepare failures

Nabarro, Tom
 

Apologies that this regression slipped in

PR for the fix https://github.com/daos-stack/daos/pull/1323

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Kevan Rehm
Sent: Friday, October 25, 2019 9:00 PM
To: daos@daos.groups.io
Subject: [daos] daos_server storage prepare failures

 

Greetings,

 

I’ve been tracking down a problem where “daos_server storage prepare” fails every time.  I am running daos master, up to date as of a few minutes ago.  The command looks like this:

 

/home/users/daos/daos/install/bin/daos_server storage prepare --debug --nvme-only --target-user=daos -p 1024 -o /home/users/daos/daos/utils/config/examples/daos_server_local.yml

 

Output is this:

 

Preparing locally-attached NVMe storage...

DEBUG 14:26:03.463504 storage_nvme.go:104: spdk setup with _NRHUGE=1024

DEBUG 14:26:03.463959 storage_nvme.go:108: spdk setup with _TARGET_USER=daos

DEBUG 14:26:13.369224 main.go:67: scan error(s):

  SPDK setup: spdk setup failed (_NRHUGE=1024, _TARGET_USER=daos, , chown: cannot access ‘libvirt’: No such file or directory

): exit status 123

 

main.concatErrors

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/storage.go:94

main.(*storagePrepareCmd).Execute

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/storage.go:160

main.parseOpts.func1

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103

github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314

main.parseOpts

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111

main.main

       /home/users/daos/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123

runtime.main

       /usr/lib/golang/src/runtime/proc.go:203

runtime.goexit

       /usr/lib/golang/src/runtime/asm_amd64.s:1357

ERROR: scan error(s):

  SPDK setup: spdk setup failed (_NRHUGE=1024, _TARGET_USER=daos, , chown: cannot access ‘libvirt’: No such file or directory

): exit status 123

 

The problem is in the recently changed file src/control/server/init/setup_spdk.sh. The relevant code section is this:

 

        # build arglist manually to filter missing directories/files
        # so we don't error on non-existent entities
        for glob in '/dev/hugepages' '/dev/uio*'                \
                '/sys/class/uio/uio*/device/config'     \
                '/sys/class/uio/uio*/device/resource*'; do

                if list=$(ls $glob); then
                        echo "RUN: ls $glob | xargs -r chown -R $_TARGET_USER"
                        echo "$list" | xargs -r chown -R "$_TARGET_USER"
                fi
        done

 

On my machine, /dev/hugepages looks like this:

 

[root@hl-d109 ~]# ls -lR /dev/hugepages

/dev/hugepages:

total 0

drwxr-xr-x 3 root root 0 Oct 25 13:26 libvirt

 

/dev/hugepages/libvirt:

total 0

drwxr-xr-x 2 root root 0 Oct 25 13:26 qemu

 

/dev/hugepages/libvirt/qemu:

total 0

 

I think the intent was for the pathnames in the glob variable to be absolute pathnames, but the above code yields relative pathnames, i.e. libvirt instead of /dev/hugepages/libvirt, so the chown in the xargs statement fails because the command is not being executed inside /dev/hugepages.
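
To illustrate the point, here is a minimal sketch of the failure mode and of one possible way around it. It is not the change made in the PR, and it runs in a scratch directory rather than against /dev/hugepages. The helper name list_abs is hypothetical. The idea is that `ls` on a directory prints the bare names of its entries, whereas the shell's own pathname expansion keeps the full path that was given.

```shell
#!/bin/sh
# Hypothetical helper (not part of setup_spdk.sh): print every existing
# path that matches the glob pattern in $1, one per line, with the same
# prefix the pattern was given (absolute in, absolute out).
list_abs() {
    for path in $1; do          # unquoted on purpose so the glob expands
        # A pattern with no match is left as the literal string; skip it.
        [ -e "$path" ] && printf '%s\n' "$path"
    done
    return 0
}

# Scratch directory standing in for /dev, so no root access is needed.
demo=$(mktemp -d)
mkdir -p "$demo/hugepages/libvirt/qemu"

ls "$demo/hugepages"            # prints "libvirt": a relative name
list_abs "$demo/hugepages"      # prints the full path of the directory itself
list_abs "$demo/uio*"           # prints nothing: no match, and no error

rm -rf "$demo"
```

With paths in this form, a recursive chown on the directory itself would cover libvirt and everything below it without depending on the current working directory.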

 

If you need more information, let me know.

 

Thanks, Kevan



Re: DAOS master error when formatting tmpfs

Patrick Farrell <paf@...>
 

Kevan,

The ASLR message was discussed in an earlier message; I’ll let you find it, but in short, no.  It’s fine, and will be gone soon once some planned changes are done.

-Patrick


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Kevan Rehm <krehm@...>
Sent: Friday, October 25, 2019 4:50 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DAOS master error when formatting tmpfs
 

Tom,

 

I am seeing the same message that Jordan reports. When running as non-root, the “daos_server start” command fails immediately without waiting for formatting:

 

no NVMe controllers found

DAOS control server listening on 0.0.0.0:10001

no NVDIMMs found!

ERROR: failed to read existing superblock: can't read superblock from unformatted storage

 

But perhaps my situation is different. I followed your instructions and got the output below. For storage I am using an IOAT device; I do not have NVMe devices, but I am using the vfio_pci driver and the IOMMU is enabled. Is non-root supported with IOAT devices? (I don’t see any code to chown /dev/sdX to user daos in setup.sh.) If it is not supported, sorry for the interruption; I’ll switch to root.

 

Thanks, Kevan

 

P.S. Should I be worried about the ASLR message?

 

 

-bash-4.2$ daos/install/bin/orterun -np 1 -H localhost --report-uri /tmp/urifile daos_server start -t 1 -d /tmp/ -o /home/users/daos/daos/utils/config/examples/daos_server_local.yml

daos_server logging to file /tmp/daos_control.log

Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...

[ DPDK EAL parameters: spdk -c 0x1 --file-prefix=spdk1327119562 --base-virtaddr=0x200000000000 --proc-type=auto ]

EAL: Detected 32 lcore(s)

EAL: Auto-detected process type: PRIMARY

EAL: No free hugepages reported in hugepages-1048576kB

EAL: Multi-process socket /home/users/daos/.spdk1327119562_unix

EAL: Probing VFIO support...

EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.

no NVMe controllers found

DAOS control server listening on 0.0.0.0:10001

no NVDIMMs found!

Starting I/O server instance 0: /home/users/daos/daos/install/bin/daos_io_server

daos_io_server:0 Using legacy core allocation algorithm

daos_io_server:0 Starting SPDK v18.07-pre / DPDK 18.02.0 initialization...

[ DPDK EAL parameters: daos -c 0x1 --file-prefix=spdk1327119562 --base-virtaddr=0x200000000000 --proc-type=auto ]

ERROR: daos_io_server:0 EAL: Detected 32 lcore(s)

ERROR: daos_io_server:0 EAL: Auto-detected process type: SECONDARY

daos_io_server:0 EAL: Multi-process socket /home/users/daos/.spdk1327119562_unix_15634_10ea63cd2a562

daos_io_server:0 EAL: Probing VFIO support...

daos_io_server:0 EAL: WARNING: Address Space Layout Randomization (ASLR) is enabled in the kernel.

daos_io_server:0 EAL:    This may cause issues with mapping memory into secondary processes

daos_io_server:0 EAL: Cannot obtain physical addresses: No such file or directory. Only vfio will function.

ERROR: daos_io_server:0 bdev_aio.c:  83:bdev_aio_open: *ERROR*: open() failed (file:/dev/sdb), errno 13: Permission denied

ERROR: daos_io_server:0 bdev_aio.c: 470:create_aio_disk: *ERROR*: Unable to open file /dev/sdb. fd: -1 errno: 13

bdev_aio.c: 599:bdev_aio_initialize: *ERROR*: Unable to create AIO bdev from file /dev/sdb

ERROR: DAOS I/O Server exited with error: /home/users/daos/daos/install/bin/daos_io_server (instance 0) exited: exit status 1

-------------------------------------------------------

Primary job  terminated normally, but 1 process returned

a non-zero exit code. Per user-direction, the job has been aborted.

-------------------------------------------------------

--------------------------------------------------------------------------

orterun detected that one or more processes exited with non-zero status, thus causing

the job to be terminated. The first process to do so was:

 

  Process name: [[21676,1],0]

  Exit code:    1

--------------------------------------------------------------------------

 

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, October 25, 2019 at 12:37 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Let’s focus on the non-root case to start with. After you have manually created an empty /mnt/daos and mounted tmpfs, change permissions to 777 (just for this experiment) on the SCM directory and run daos_server with control_log_mask: DEBUG (please also paste the config file). The superblock should be created and the IO server should start.

 

Thanks

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Jordan Henderson
Sent: Friday, October 25, 2019 5:00 PM
To: daos@daos.groups.io
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Hi Tom,

 

in general I don't usually run as root, but in this case it did seem to be the only way that I could get the server to wait for storage formatting. However, even when I started from a clean slate for the tmpfs mount, as per these instructions, it didn't seem to matter whether I manually mounted the tmpfs myself or allowed the storage formatting to do so. When running as root, the storage format appeared to be successful in both cases, but the server still immediately returned an error for the storage not being formatted and then exited. When running as non-root, the server didn't wait for storage formatting in either case.

 

It might be worth noting that after each successful format command, my tmpfs mount still didn't contain a superblock file. I'm guessing that this is probably why the server is returning a formatting error right after the storage format command?

 


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Nabarro, Tom via Groups.Io <tom.nabarro@...>
Sent: Friday, October 25, 2019 8:58 AM
To: Chaarawi, Mohamad <mohamad.chaarawi@...>; Jordan Henderson <jhenderson@...>
Cc: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DAOS master error when formatting tmpfs

 

Hello Jordan

 

If running as root, could you please try “umount /mnt/daos; rm -rf /mnt/daos”, then start the server, which should wait for the format, and then format from daos_shell.

 

$  umount /mnt/daos; rm -rf /mnt/daos

$  orterun -N 1 -H localhost --report-uri /tmp/urifile --allow-run-as-root daos_server start -t 1 -o ~tanabarr/projects/daos_m/utils/config/examples/daos_server_sockets.yml -i

$ daos_shell -i storage format

 

If running as non-root, the following works for me:

 

$  sudo umount /mnt/daos; sudo rm -rf /mnt/daos; sudo mkdir /mnt/daos; sudo mount -t tmpfs -o size=64G tmpfs /mnt/daos

$ orterun -np 1 -H localhost --report-uri /tmp/urifile daos_server start -t 1 -d /tmp/ -o ~tanabarr/projects/daos_m/utils/config/examples/daos_server_sockets.yml -i

 

Above tried on commit 15168685005843766c038afff45fd6681c07f341 .

86f730a37d0170fdb733c1b08308162a245e5aea did introduce changes to the formatting code, but I haven’t observed any subsequent regressions in my testing; it may just need a clean slate in your situation.

 

Thanks

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: Chaarawi, Mohamad
Sent: Friday, October 25, 2019 2:24 PM
To: Jordan Henderson <jhenderson@...>
Cc: Nabarro, Tom <tom.nabarro@...>
Subject: Re: DAOS master error when formatting tmpfs

 

Jordan, it would be great to send such emails to the DAOS user list where there are more people (from the control plane) who can answer. Please subscribe here:

https://daos.groups.io/g/daos

 

Did you clear your tmpfs beforehand?

I'm not sure why you would be getting this over tmpfs, but I never start the server as root.

 

Mohamad

 

 

From: Jordan Henderson <jhenderson@...>
Date: Thursday, October 24, 2019 at 10:22 PM
To: "Chaarawi, Mohamad" <mohamad.chaarawi@...>
Subject: DAOS master error when formatting tmpfs

 

Hi Mohamad,

 

are you aware of any bugs with the current DAOS master (commit 15168685005843766c038afff45fd6681c07f341) and trying to format storage when the SCM is emulated through a tmpfs? I tried updating my DAOS to the latest master but when I tried to start DAOS I got the error below. If I started the server as root it then waited for me to format the storage, but when I did so the format command returned success and then the server immediately returned the same error as below and exited. Once I switched to the v0.6 tag (which was also ahead of my current install), I was able to run the server as non-root, format the tmpfs and start the server fine.

 

DEBUG 21:03:05.903946 instance.go:199: /home/jhenderson/Work/DAOS_Workspace/daos_fs/ (ram) needs format: true

ERROR: failed to read existing superblock: can't read superblock from unformatted storage

DEBUG 21:03:05.904215 main.go:67: can't read superblock from unformatted storage

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).ReadSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:176

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).NeedsSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:110

github.com/daos-stack/daos/src/control/server.(*IOServerHarness).CreateSuperblocks

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/harness.go:118

github.com/daos-stack/daos/src/control/server.Start

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/server.go:166

main.(*startCmd).Execute

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/start.go:167

main.parseOpts.func1

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103

github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314

main.parseOpts

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111

main.main

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123

runtime.main

/usr/lib64/go1.11.9/go/src/runtime/proc.go:201

runtime.goexit

/usr/lib64/go1.11.9/go/src/runtime/asm_amd64.s:1333

failed to read existing superblock

github.com/daos-stack/daos/src/control/server.(*IOServerInstance).NeedsSuperblock

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/superblock.go:117

github.com/daos-stack/daos/src/control/server.(*IOServerHarness).CreateSuperblocks

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/harness.go:118

github.com/daos-stack/daos/src/control/server.Start

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/server/server.go:166

main.(*startCmd).Execute

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/start.go:167

main.parseOpts.func1

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:103

github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags.(*Parser).ParseArgs

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/vendor/github.com/jessevdk/go-flags/parser.go:314

main.parseOpts

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:111

main.main

/home/jhenderson/git/daos/build/src/control/src/github.com/daos-stack/daos/src/control/cmd/daos_server/main.go:123

runtime.main

/usr/lib64/go1.11.9/go/src/runtime/proc.go:201

runtime.goexit

/usr/lib64/go1.11.9/go/src/runtime/asm_amd64.s:1333

