
Re: FIO Results & Running IO500

Lombardi, Johann
 

Hi Peter,

 

A few things to try/explore:

  • I don’t think that we have ever tested with 2x pmem DIMMs per socket. Maybe you could try with DRAM instead of pmem to see whether the performance increases.
  • 16x targets might be too many for 2x pmem DIMMs. You could try reducing it to 8x targets and setting “nr_xs_helpers” to 0.
  • It sounds like you run the benchmark (fio, IO500) and the DAOS engine on the same node. There might be interference between the two. You could try changing the affinity of the benchmark to run on CPU cores not used by the DAOS engine, e.g. with taskset(1) or mpirun arguments (see the sketch below).
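
To illustrate that last point, the same pinning can also be done programmatically inside a benchmark harness. This is only a sketch: the core range 16-31 is an assumption (use whichever cores your engine is not pinned to), and taskset(1) or mpirun binding options achieve the same thing with less effort.

#define _GNU_SOURCE
#include <sched.h>

/* Illustrative only: pin the calling benchmark process to cores 16-31,
 * assuming the DAOS engine owns cores 0-15 (adjust to your topology). */
static int
pin_to_free_cores(void)
{
        cpu_set_t set;
        int core;

        CPU_ZERO(&set);
        for (core = 16; core < 32; core++)
                CPU_SET(core, &set);
        /* 0 = apply to the calling process */
        return sched_setaffinity(0, sizeof(set), &set);
}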

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of Peter <magpiesaresoawesome@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday 13 July 2021 at 03:16
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] FIO Results & Running IO500

 

Hello again,

I've tried some more things to improve these results: different DAOS versions (including the YUM repo), different MPI versions, different DAOS configurations, etc.

I'm still unable to diagnose this issue; IOPS performance remains low for both FIO and IO500.

Does anyone have any input on how I can try to debug or resolve this issue?

Thanks for your help.



Re: Creating a POSIX container first and opening it inside my code

Lombardi, Johann
 

Hi Lipeng,

 

Assuming that you are using 1.2, you can create a POSIX container with the daos utility:

$ daos cont create --pool <POOL_UUID> --type POSIX

 

You get back a container UUID that you can then pass to daos_cont_open() to open the container and then mount with dfs_mount() (see https://github.com/daos-stack/daos/blob/master/src/include/daos_fs.h#L121).
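
For illustration, here is a minimal C sketch of that flow, assuming daos_init() has been called and the pool is already connected (the pool handle "poh" and the container UUID string are stand-ins for whatever your code obtains); error handling is abbreviated.

#include <fcntl.h>
#include <uuid/uuid.h>
#include <daos.h>
#include <daos_fs.h>

/* Sketch: open an existing POSIX container by UUID and mount it with libdfs.
 * "poh" is assumed to come from a prior daos_pool_connect() call and
 * "cont_uuid_str" is the UUID printed by "daos cont create". */
static int
open_posix_cont(daos_handle_t poh, const char *cont_uuid_str, dfs_t **dfs)
{
        uuid_t          cont_uuid;
        daos_handle_t   coh;
        int             rc;

        rc = uuid_parse(cont_uuid_str, cont_uuid);
        if (rc != 0)
                return rc;

        /* Open the container created with "daos cont create --type POSIX" */
        rc = daos_cont_open(poh, cont_uuid, DAOS_COO_RW, &coh, NULL, NULL);
        if (rc != 0)
                return rc;

        /* Mount the DFS namespace on top of the open container */
        rc = dfs_mount(poh, coh, O_RDWR, dfs);
        if (rc != 0)
                daos_cont_close(coh, NULL);
        return rc;
}

On shutdown, the reverse order applies: dfs_umount(), daos_cont_close(), daos_pool_disconnect(), daos_fini(). The pool and container UUIDs can simply be passed as arguments to your program rather than through environment variables.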

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "wanl via groups.io" <wanl@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday 6 July 2021 at 17:00
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Creating a POSIX container first and opening it inside my code

 

Hi,

How do I create a POSIX container first by calling "daos cont create" and then open it inside my code using "daos_cont_open()"? Should I pass some environment variables to my code?

Thanks,
Lipeng

 



Re: Update to privileged helper

Nabarro, Tom
 

Try running the utils/setup_daos_admin.sh script as sudo.

 

The daos_admin binary should be moved to /usr/bin/ with the setuid bit set, and the executable bit removed from install/bin/daos_admin.

Both of these steps are performed by the script.

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Ethan Mallove
Sent: Tuesday, July 13, 2021 9:24 PM
To: daos@daos.groups.io
Subject: Re: [daos] Update to privileged helper

 

If you are running DAOS from source, and you want to run as a non-root user, then you will need to perform some manual setup steps on every server in order to ensure that the privileged helper has the correct permissions in order to perform privileged tasks

What are the manual steps? I tried setting the setuid bit and checking the immutable bit, but I still get "the privileged helper (daos_admin) does not have root permissions", e.g.:

# chmod u+s install/bin/daos_admin

$ ls -ltrd install/bin/daos_admin

-rwsrwxr-x 1 emallovx emallovx 21838880 Jul 13 18:54 install/bin/daos_admin

# chown -R root:root install/bin/daos_admin

 

chown: changing ownership of ‘install/bin/daos_admin’: Operation not permitted

$ lsattr daos_admin

lsattr: Inappropriate ioctl for device While reading flags on daos_admin


Regards,
Ethan


Re: Update to privileged helper

Ethan Mallove
 

If you are running DAOS from source, and you want to run as a non-root user, then you will need to perform some manual setup steps on every server in order to ensure that the privileged helper has the correct permissions in order to perform privileged tasks

What are the manual steps? I tried setting the setuid bit and checking the immutable bit, but I still get "the privileged helper (daos_admin) does not have root permissions", e.g.:

# chmod u+s install/bin/daos_admin

$ ls -ltrd install/bin/daos_admin

-rwsrwxr-x 1 emallovx emallovx 21838880 Jul 13 18:54 install/bin/daos_admin

# chown -R root:root install/bin/daos_admin

 

chown: changing ownership of ‘install/bin/daos_admin’: Operation not permitted

$ lsattr daos_admin

lsattr: Inappropriate ioctl for device While reading flags on daos_admin


Regards,
Ethan


Re: FIO Results & Running IO500

Peter
 

Thank you for the reply,

Yes, I have mounted the Optane modules as an ext4 filesystem; a quick FIO test is able to achieve > 1 MIOPS.

My thought was that it would be network related; however, I've run some MPI benchmarks and the scores line up with EDR InfiniBand.
Also, the 12.4 GB/s speed we get in the FIO Seq Read test shows we can get better than local-only performance.

The documentation mentions daos_test and daos_perf; are these still supported?


Re: FIO Results & Running IO500

JACKSON Adrian
 

Hi,

Have you tried benchmarking the hardware directly, rather than through
DAOS? I.e. running some benchmarks on an ext4 filesystem mounted on
the Optane on a single node, just to check that it gives you the expected
performance.

cheers

adrianj


Re: FIO Results & Running IO500

Peter
 

Hello again,

I've tried some more things to improve these results: different DAOS versions (including the YUM repo), different MPI versions, different DAOS configurations, etc.

I'm still unable to diagnose this issue; IOPS performance remains low for both FIO and IO500.

Does anyone have any input on how I can try to debug or resolve this issue?

Thanks for your help.


Creating a POSIX container first and opening it inside my code

wanl@...
 

Hi,

How do I create a POSIX container first by calling "daos cont create" and then open it inside my code using "daos_cont_open()"? Should I pass some environment variables to my code?

Thanks,
Lipeng

 


Re: Questions about DFS

Chaarawi, Mohamad
 

Hi,

 

  1. Permissions checks when using the DFS API are only done on the pool and container ACLs during dfs_mount().
  2. The DAOS/DFS client does not cache any data on the client side, so if you look up a full path every time, you are going to do the path traversal for each lookup. In DAOS master, we have added a new API (dfs_sys) where a DFS mount can be created with a caching option to cache the looked-up parent directories and avoid that overhead:
    1. https://github.com/daos-stack/daos/blob/master/src/include/daos_fs_sys.h#L49
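
As a rough illustration of the traversal cost with the regular DFS API (a sketch based on my reading of daos_fs.h, not a dfs_sys example; treat the exact signatures as assumptions), a full-path dfs_lookup() walks every component on each call, whereas holding the parent directory open and using dfs_lookup_rel() avoids the repeated walk:

#include <fcntl.h>
#include <daos_fs.h>

/* Assumes "dfs" was obtained via dfs_mount() on an open POSIX container. */
static int
lookup_examples(dfs_t *dfs)
{
        dfs_obj_t       *dir = NULL, *file = NULL;
        int              rc;

        /* Full-path lookup: every call re-traverses /a/b/c. */
        rc = dfs_lookup(dfs, "/a/b/c/file1", O_RDONLY, &file, NULL, NULL);
        if (rc)
                return rc;
        dfs_release(file);

        /* Cheaper pattern: resolve the parent once, then do relative lookups. */
        rc = dfs_lookup(dfs, "/a/b/c", O_RDONLY, &dir, NULL, NULL);
        if (rc)
                return rc;
        rc = dfs_lookup_rel(dfs, dir, "file1", O_RDONLY, &file, NULL, NULL);
        if (rc == 0)
                dfs_release(file);
        dfs_release(dir);
        return rc;
}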

 

Thanks,

Mohamad

 

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of 段世博 <duanshibo.d@...>
Date: Saturday, July 3, 2021 at 3:44 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: [daos] Questions about DFS

Hi~, I have two questions about DFS metadata:

   1. When the client opens a file or directory, will it check the permissions for each directory on the path, or will it only check at dfs_mount()?

   2. Will the client cache the metadata of the directory to speed up path traversal? Or will the metadata of this directory be kept on the client from when the directory is opened until it is closed?

 

thanks.


Questions about DFS

段世博
 

Hi~, I have two questions about DFS metadata:
   1. When the client opens a file or directory, will it check the permissions for each directory on the path, or will it only check at dfs_mount()?
   2. Will the client cache the metadata of the directory to speed up path traversal? Or will the metadata of this directory be kept on the client from when the directory is opened until it is closed?
 
thanks.


Re: FIO Results & Running IO500

JACKSON Adrian
 

Actually, it's been pointed out to me I was confusing engines and
targets. So ignore me. :)

On 29/06/2021 10:09, JACKSON Adrian wrote:
It would be sensible to increase the number of engines per node. For our
system, where we have 48 cores per node, we're running 12 engines per
socket, 24 per node. This might be too many, but I think 1 engine per
node is too few.

cheers

adrianj

On 29/06/2021 08:32, Peter wrote:
Johann, my configuration is as follows:


4 nodes, 4 engines in total (1 per node)
2 x 128 GB Optane per socket, 2 x sockets per node (DAOS is currently
only using 1 socket per node)
(We also have 1x 1.5 TB NVMe drive / node, that we plan to eventually
configure DAOS to use)
Nodes are using Mellanox EDR InfiniBand
CentOS 7.9; I have tried various MPI distributions.

The YAML file (generated via the auto config):

***********
port: 10001
transport_config:
  allow_insecure: true
  server_name: server
  client_cert_dir: /etc/daos/certs/clients
  ca_cert: /etc/daos/certs/daosCA.crt
  cert: /etc/daos/certs/server.crt
  key: /etc/daos/certs/server.key
servers: []
engines:
- targets: 16
  nr_xs_helpers: 3
  first_core: 0
  name: daos_server
  socket_dir: /var/run/daos_server
  log_file: /tmp/daos_engine.0.log
  scm_mount: /mnt/daos0
  scm_class: dcpm
  scm_list:
  - /dev/pmem0
  bdev_class: nvme
  provider: ofi+verbs;ofi_rxm
  fabric_iface: ib0
  fabric_iface_port: 31416
  pinned_numa_node: 0
disable_vfio: false
disable_vmd: true
nr_hugepages: 0
set_hugepages: false
control_log_mask: INFO
control_log_file: /tmp/daos_server.log
helper_log_file: ""
firmware_helper_log_file: ""
recreate_superblocks: false
fault_path: ""
name: daos_server
socket_dir: /var/run/daos_server
provider: ofi+verbs;ofi_rxm
modules: ""
access_points:
- 172.23.7.3:10001
fault_cb: ""
hyperthreads: false
path: ../etc/daos_server.yml
*******

And yes, I am running FIO from one of the nodes.

Is there anything you see that I should modify or investigate? Thank you
very much for the help!




--
Tel: +44 131 6506470 skype: remoteadrianj


Re: FIO Results & Running IO500

JACKSON Adrian
 

It would be sensible to increase the number of engines per node. For our
system, where we have 48 cores per node, we're running 12 engines per
socket, 24 per node. This might be too many, but I think 1 engine per
node is too few.

cheers

adrianj

On 29/06/2021 08:32, Peter wrote:
Johann, my configuration is as follows:


4 nodes, 4 engines in total (1 per node)
2 x 128 GB Optane per socket, 2 x sockets per node (DAOS is currently
only using 1 socket per node)
(We also have 1x 1.5 TB NVMe drive / node, that we plan to eventually
configure DAOS to use)
Nodes are using Mellanox EDR InfiniBand
CentOS 7.9; I have tried various MPI distributions.

The YAML file (generated via the auto config):

***********
port: 10001
transport_config:
  allow_insecure: true
  server_name: server
  client_cert_dir: /etc/daos/certs/clients
  ca_cert: /etc/daos/certs/daosCA.crt
  cert: /etc/daos/certs/server.crt
  key: /etc/daos/certs/server.key
servers: []
engines:
- targets: 16
  nr_xs_helpers: 3
  first_core: 0
  name: daos_server
  socket_dir: /var/run/daos_server
  log_file: /tmp/daos_engine.0.log
  scm_mount: /mnt/daos0
  scm_class: dcpm
  scm_list:
  - /dev/pmem0
  bdev_class: nvme
  provider: ofi+verbs;ofi_rxm
  fabric_iface: ib0
  fabric_iface_port: 31416
  pinned_numa_node: 0
disable_vfio: false
disable_vmd: true
nr_hugepages: 0
set_hugepages: false
control_log_mask: INFO
control_log_file: /tmp/daos_server.log
helper_log_file: ""
firmware_helper_log_file: ""
recreate_superblocks: false
fault_path: ""
name: daos_server
socket_dir: /var/run/daos_server
provider: ofi+verbs;ofi_rxm
modules: ""
access_points:
- 172.23.7.3:10001
fault_cb: ""
hyperthreads: false
path: ../etc/daos_server.yml
*******

And yes, I am running FIO from one of the nodes.

Is there anything you see that I should modify or investigate? Thank you
very much for the help!
--
Tel: +44 131 6506470 skype: remoteadrianj


Re: FIO Results & Running IO500

Peter
 

Johann, my configuration is as follows:


4 nodes, 4 engines in total (1 per node)
2 x 128 GB Optane per socket, 2 x sockets per node (DAOS is currently only using 1 socket per node)
(We also have 1x 1.5 TB NVMe drive / node, that we plan to eventually configure DAOS to use)
Nodes are using Mellanox EDR InfiniBand
CentOS 7.9; I have tried various MPI distributions.

The YAML file (generated via the auto config):

***********
port: 10001
transport_config:
  allow_insecure: true
  server_name: server
  client_cert_dir: /etc/daos/certs/clients
  ca_cert: /etc/daos/certs/daosCA.crt
  cert: /etc/daos/certs/server.crt
  key: /etc/daos/certs/server.key
servers: []
engines:
- targets: 16
  nr_xs_helpers: 3
  first_core: 0
  name: daos_server
  socket_dir: /var/run/daos_server
  log_file: /tmp/daos_engine.0.log
  scm_mount: /mnt/daos0
  scm_class: dcpm
  scm_list:
  - /dev/pmem0
  bdev_class: nvme
  provider: ofi+verbs;ofi_rxm
  fabric_iface: ib0
  fabric_iface_port: 31416
  pinned_numa_node: 0
disable_vfio: false
disable_vmd: true
nr_hugepages: 0
set_hugepages: false
control_log_mask: INFO
control_log_file: /tmp/daos_server.log
helper_log_file: ""
firmware_helper_log_file: ""
recreate_superblocks: false
fault_path: ""
name: daos_server
socket_dir: /var/run/daos_server
provider: ofi+verbs;ofi_rxm
modules: ""
access_points:
- 172.23.7.3:10001
fault_cb: ""
hyperthreads: false
path: ../etc/daos_server.yml
*******

And yes, I am running FIO from one of the nodes.

Is there anything you see that I should modify or investigate? Thank you very much for the help!


Re: DAOS on AWS?

Lombardi, Johann
 

Hi there,

 

We do run DAOS on a regular basis on Google cloud (GCP), but I am not aware of anyone who tried it on AWS yet. If that is something that you want to pursue, we would be happy to learn about your experience with DAOS on AWS on this mailing list and help you as much as we can.

 

Cheers,
Johann

 

From: <daos@daos.groups.io> on behalf of "bpetterson_08@..." <bpetterson_08@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Monday 21 June 2021 at 16:49
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] DAOS on AWS?

 

Good Day,

I'm new to DAOS and this group, but I didn't see a "rules and regulations" section, so I hope this is okay. I have been reading all the docs, and while everything looks straightforward, I'd like to do a POC to wrap my head around the content.
I'm looking for some feedback on a theoretical test of DAOS on AWS. I am aware that they have dedicated NVMe volumes that can be set up on EC2 instances.
Has anyone done something like this?
If so, a follow-up question would be: were there any gotchas?

I am looking to run a test implementation of DAOS to vet out some possible HPC/BigData implementations but don't really have the resources for a physical hardware test.

Thanks,



Re: FIO Results & Running IO500

Lombardi, Johann
 

Hi there,

 

The fio numbers indeed look pretty low. Could you please tell us more about the configuration? It sounds like you have Optane pmem on all the nodes, right? How many DIMMs per node? How many engines do you run in total? Are you running fio from a node that is also a DAOS server? Could you please also share your yaml config file?

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "Harms, Kevin via groups.io" <harms@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday 23 June 2021 at 17:04
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] FIO Results & Running IO500

 

 

  I'm not sure about what to expect from your nodes, but for IO-500:

 

  The first set of complaints is about the runtime being too short. You need to adjust the parameters to make the run longer.

  The second part, MPI_Comm_split_type failing, is complaining about the arguments... Can you try with an MPICH derivative? Maybe there is some issue between OpenMPI and MPICH with regard to valid split_type values.

 

kevin

 

________________________________________

Sent: Tuesday, June 22, 2021 9:40 PM

Subject: [daos] FIO Results & Running IO500

 

Hello all!

 

I have a cluster of 4 DAOS nodes. The nodes use CentOS 7.9, Optane SCM (no SSD), and are connected over EDR InfiniBand.

These nodes are able to run FIO as shown here: https://daos-stack.github.io/admin/performance_tuning/#fio

The scores I am able to achieve running /examples/dfs.fio are:

 

Seq Read        12.4 GB/s       283 us / 21408 us (latency min/average)

Seq Write       4.0 GB/s        673 us / 66585 us (latency min/average)

Random Read     187 KIOPS       83 us / 1335 us (latency min/average)

Random Write    180 KIOPS       93 us / 1409 us (latency min/average)

 

Are these numbers reasonable? The random scores seem low. I'm not 100% sure about my recorded latency numbers, but they also seem slow (for Optane); perhaps this is due to various DFUSE or other overheads.

 

 

IO500 runs with the following output (I'm not concerned about the stonewall time errors for the moment):

 

IO500 version io500-isc21 (standard)

ERROR INVALID (src/phase_ior.c:24) Write phase needed 103.465106s instead of stonewall 300s. Stonewall was hit at 103.5s

ERROR INVALID (src/main.c:396) Runtime of phase (104.060211) is below stonewall time. This shouldn't happen!

ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime

[RESULT]       ior-easy-write        3.092830 GiB/s : time 104.060 seconds [INVALID]

ERROR INVALID (src/main.c:396) Runtime of phase (2.191084) is below stonewall time. This shouldn't happen!

ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime

[RESULT]    mdtest-easy-write      178.031067 kIOPS : time 2.191 seconds [INVALID]

[      ]            timestamp        0.000000 kIOPS : time 0.003 seconds

ERROR INVALID (src/phase_ior.c:24) Write phase needed 6.626582s instead of stonewall 300s. Stonewall was hit at 6.3s

ERROR INVALID (src/main.c:396) Runtime of phase (6.666027) is below stonewall time. This shouldn't happen!

ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime

[RESULT]       ior-hard-write        2.114133 GiB/s : time 6.666 seconds [INVALID]

ERROR INVALID (src/main.c:396) Runtime of phase (5.672756) is below stonewall time. This shouldn't happen!

ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime

[RESULT]    mdtest-hard-write       59.615140 kIOPS : time 5.673 seconds [INVALID]

 

[swat7-02:06130] *** An error occurred in MPI_Comm_split_type

[swat7-02:06130] *** reported by process [1960837121,9]

[swat7-02:06130] *** on communicator MPI_COMM_WORLD

[swat7-02:06130] *** MPI_ERR_ARG: invalid argument of some other kind

[swat7-02:06130] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,

[swat7-02:06130] ***    and potentially your MPI job)

....(repeated)

And IO500 terminates. This is using OpenMPI 4; with OpenMPI 3.1.6, IO500 simply hangs at the same spot.

 

Would anyone have insight into what is going on here, and how I can fix it?

 

Thank you for your help.

 

 

 

 

 

 

 



Re: FIO Results & Running IO500

Harms, Kevin
 

I'm not sure about what to expect from your nodes, but for IO-500:

The first set of complaints is about the runtime being too short. You need to adjust the parameters to make the run longer.
The second part, MPI_Comm_split_type failing, is complaining about the arguments... Can you try with an MPICH derivative? Maybe there is some issue between OpenMPI and MPICH with regard to valid split_type values.

kevin

________________________________________
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Peter <magpiesaresoawesome@gmail.com>
Sent: Tuesday, June 22, 2021 9:40 PM
To: daos@daos.groups.io
Subject: [daos] FIO Results & Running IO500

Hello all!

I have a cluster of 4 DAOS nodes. The nodes use CentOS 7.9, Optane SCM (no SSD), and are connected over EDR InfiniBand.
These nodes are able to run FIO as shown here: https://daos-stack.github.io/admin/performance_tuning/#fio
The scores I am able to achieve running /examples/dfs.fio are:

Seq Read      12.4 GB/s   283 us / 21408 us (latency min/average)
Seq Write     4.0 GB/s    673 us / 66585 us (latency min/average)
Random Read   187 KIOPS   83 us / 1335 us (latency min/average)
Random Write  180 KIOPS   93 us / 1409 us (latency min/average)

Are these numbers reasonable? The random scores seem low. I'm not 100% sure about my recorded latency numbers, but they also seem slow (for Optane); perhaps this is due to various DFUSE or other overheads.


I have since attempted to run IO-500, configured according to: https://wiki.hpdd.intel.com/display/DC/IO-500+ISC21
IO500 runs with the following output (I'm not concerned about the stonewall time errors for the moment):

IO500 version io500-isc21 (standard)
ERROR INVALID (src/phase_ior.c:24) Write phase needed 103.465106s instead of stonewall 300s. Stonewall was hit at 103.5s
ERROR INVALID (src/main.c:396) Runtime of phase (104.060211) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime
[RESULT] ior-easy-write 3.092830 GiB/s : time 104.060 seconds [INVALID]
ERROR INVALID (src/main.c:396) Runtime of phase (2.191084) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime
[RESULT] mdtest-easy-write 178.031067 kIOPS : time 2.191 seconds [INVALID]
[ ] timestamp 0.000000 kIOPS : time 0.003 seconds
ERROR INVALID (src/phase_ior.c:24) Write phase needed 6.626582s instead of stonewall 300s. Stonewall was hit at 6.3s
ERROR INVALID (src/main.c:396) Runtime of phase (6.666027) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime
[RESULT] ior-hard-write 2.114133 GiB/s : time 6.666 seconds [INVALID]
ERROR INVALID (src/main.c:396) Runtime of phase (5.672756) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime
[RESULT] mdtest-hard-write 59.615140 kIOPS : time 5.673 seconds [INVALID]

[swat7-02:06130] *** An error occurred in MPI_Comm_split_type
[swat7-02:06130] *** reported by process [1960837121,9]
[swat7-02:06130] *** on communicator MPI_COMM_WORLD
[swat7-02:06130] *** MPI_ERR_ARG: invalid argument of some other kind
[swat7-02:06130] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[swat7-02:06130] *** and potentially your MPI job)
....(repeated)
And IO500 terminates. This is using OpenMPI 4; with OpenMPI 3.1.6, IO500 simply hangs at the same spot.

Would anyone have insight into what is going on here, and how I can fix it?

Thank you for your help.


FIO Results & Running IO500

Peter
 

Hello all!

I have a cluster of 4 DAOS nodes. The nodes use CentOS 7.9, Optane SCM (no SSD), and are connected over EDR InfiniBand.
These nodes are able to run FIO as shown here: https://daos-stack.github.io/admin/performance_tuning/#fio
The scores I am able to achieve running /examples/dfs.fio are:

Seq Read      12.4 GB/s   283 us / 21408 us (latency min/average)
Seq Write     4.0 GB/s    673 us / 66585 us (latency min/average)
Random Read   187 KIOPS   83 us / 1335 us (latency min/average)
Random Write  180 KIOPS   93 us / 1409 us (latency min/average)


Are these numbers reasonable? The random scores seem low. I'm not 100% sure about my recorded latency numbers, but they also seem slow (for Optane); perhaps this is due to various DFUSE or other overheads.


I have since attempted to run IO-500, configured according to: https://wiki.hpdd.intel.com/display/DC/IO-500+ISC21
IO500 runs with the following output (I'm not concerned about the stonewall time errors for the moment):

IO500 version io500-isc21 (standard)
ERROR INVALID (src/phase_ior.c:24) Write phase needed 103.465106s instead of stonewall 300s. Stonewall was hit at 103.5s
ERROR INVALID (src/main.c:396) Runtime of phase (104.060211) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime
[RESULT]       ior-easy-write        3.092830 GiB/s : time 104.060 seconds [INVALID]
ERROR INVALID (src/main.c:396) Runtime of phase (2.191084) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime
[RESULT]    mdtest-easy-write      178.031067 kIOPS : time 2.191 seconds [INVALID]
[      ]            timestamp        0.000000 kIOPS : time 0.003 seconds
ERROR INVALID (src/phase_ior.c:24) Write phase needed 6.626582s instead of stonewall 300s. Stonewall was hit at 6.3s
ERROR INVALID (src/main.c:396) Runtime of phase (6.666027) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime
[RESULT]       ior-hard-write        2.114133 GiB/s : time 6.666 seconds [INVALID]
ERROR INVALID (src/main.c:396) Runtime of phase (5.672756) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime
[RESULT]    mdtest-hard-write       59.615140 kIOPS : time 5.673 seconds [INVALID]

[swat7-02:06130] *** An error occurred in MPI_Comm_split_type
[swat7-02:06130] *** reported by process [1960837121,9]
[swat7-02:06130] *** on communicator MPI_COMM_WORLD
[swat7-02:06130] *** MPI_ERR_ARG: invalid argument of some other kind
[swat7-02:06130] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[swat7-02:06130] ***    and potentially your MPI job)
....(repeated)
And IO500 terminates. This is using OpenMPI 4; with OpenMPI 3.1.6, IO500 simply hangs at the same spot.

Would anyone have insight into what is going on here, and how I can fix it?

Thank you for your help.


DAOS on AWS?

bpetterson_08@...
 

Good Day,

I'm new to DAOS and this group, but I didn't see a "rules and regulations" section, so I hope this is okay. I have been reading all the docs, and while everything looks straightforward, I'd like to do a POC to wrap my head around the content.
I'm looking for some feedback on a theoretical test of DAOS on AWS. I am aware that they have dedicated NVMe volumes that can be set up on EC2 instances.
Has anyone done something like this?
If so, a follow-up question would be: were there any gotchas?

I am looking to run a test implementation of DAOS to vet out some possible HPC/BigData implementations but don't really have the resources for a physical hardware test.

Thanks,


Re: Trying to solve "DAOS rank exited unexpectedly"

Lombardi, Johann
 

Hi there,

 

Any errors in the engine logs?

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of Peter <magpiesaresoawesome@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Monday 31 May 2021 at 08:42
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Trying to solve "DAOS rank exited unexpectedly"

 

[Edited Message Follows]

Hello all,

I've been experiencing problems trying to run DAOS 1.2. For the moment, I'm just trying to run a single server, and DAOS runs correctly when using Ethernet. However, when switching to an Infiniband configuration, I am met with the error attached below ("/tmp/daos_engine.0.log" remains empty).

Any help you can provide would be much appreciated, and please let me know if there is any more info I can provide. This is on Ubuntu 20.04, using the latest MLNX drivers (MLNX_OFED_LINUX-5.3-1.0.0.1).

------------------------- OUTPUT -------------------------
copoka@swat7-02:~$ daos_server start -i -o daos_server_ib.yml

DAOS Server config loaded from /home/copoka/daos_server_ib.yml
daos_server logging to file /tmp/daos_server.log
DEBUG 15:25:33.517796 start.go:89: Switching control log level to DEBUG
DEBUG 15:25:33.579954 netdetect.go:279: 2 NUMA nodes detected with 20 cores per node
DEBUG 15:25:33.580176 netdetect.go:284: initDeviceScan completed.  Depth -6, numObj 8, systemDeviceNames [lo eno1 eno2 ib0], hwlocDeviceNames [sda card0 ib0 mlx5_0 eno1 eno2 pmem0 pmem1]
DEBUG 15:25:33.580206 netdetect.go:913: Calling ValidateProviderConfig with ib0, ofi+verbs;ofi_rxm
DEBUG 15:25:33.580223 netdetect.go:964: Input provider string: ofi+verbs;ofi_rxm
DEBUG 15:25:33.580363 netdetect.go:995: There are 0 hfi1 devices in the system
DEBUG 15:25:33.580392 netdetect.go:572: There are 2 NUMA nodes.
DEBUG 15:25:33.580413 netdetect.go:928: Device ib0 supports provider: ofi+verbs;ofi_rxm
DEBUG 15:25:33.580440 netdetect.go:1073: Validate network config for numaNode: 0
DEBUG 15:25:33.580460 netdetect.go:572: There are 2 NUMA nodes.
DEBUG 15:25:33.580479 netdetect.go:1102: The NUMA node for device ib0 matches the provided value 0.  Network configuration is valid.
DEBUG 15:25:33.581049 server.go:401: Active config saved to /home/copoka/.daos_server.active.yml (read-only)
DEBUG 15:25:33.581078 server.go:113: fault domain: /swat7-02
DEBUG 15:25:33.581279 server.go:163: automatic NVMe prepare req: {ForwardableRequest:{Forwarded:false} HugePageCount:128 DisableCleanHugePages:false PCIWhitelist: PCIBlacklist: TargetUser:copoka ResetOnly:false DisableVFIO:false DisableVMD:true}
DEBUG 15:25:40.008657 database.go:277: set db replica addr: 172.23.7.2:10001
DEBUG 15:25:40.115490 netdetect.go:279: 2 NUMA nodes detected with 20 cores per node
DEBUG 15:25:40.115717 netdetect.go:284: initDeviceScan completed.  Depth -6, numObj 8, systemDeviceNames [lo eno1 eno2 ib0], hwlocDeviceNames [sda card0 ib0 mlx5_0 eno1 eno2 pmem0 pmem1]
DEBUG 15:25:40.115786 netdetect.go:669: Searching for a device alias for: ib0
DEBUG 15:25:40.115824 netdetect.go:693: Device alias for ib0 is mlx5_0
DEBUG 15:25:40.115874 class.go:196: spdk nvme: bdev_list empty in config, no nvme.conf generated for server
DEBUG 15:25:40.647130 provider.go:217: bdev scan: update cache (4 devices)
DAOS Control Server v1.2 (pid 186967) listening on 0.0.0.0:10001
DEBUG 15:25:40.647841 instance_exec.go:35: instance 0: checking if storage is formatted
Checking DAOS I/O Engine instance 0 storage ...
DEBUG 15:25:40.647958 instance_storage.go:74: /mnt/daos0: checking formatting
DEBUG 15:25:41.090347 instance_storage.go:90: /mnt/daos0 (dcpm) needs format: false
DEBUG 15:25:41.090374 instance_storage.go:121: instance 0: no SCM format required; checking for superblock
DEBUG 15:25:41.090392 instance_superblock.go:90: /mnt/daos0: checking superblock
DEBUG 15:25:41.090576 instance_storage.go:127: instance 0: superblock not needed
DEBUG 15:25:41.090601 database.go:372: system db start: isReplica: true, isBootstrap: true
DEBUG 15:25:41.090932 api.go:556: initial configuration: index=1 servers=[%+v [{Suffrage:Voter ID:75.12.36.64:10001 Address:75.12.36.64:10001}]]
DEBUG 15:25:41.090961 raft.go:173: isBootstrap: true, newDB: false
DEBUG 15:25:41.090977 instance_exec.go:62: instance start()
DEBUG 15:25:41.090990 class.go:223: skip bdev conf file generation as no path set
SCM @ /mnt/daos0: 265 GB Total/260 GB Avail
DEBUG 15:25:41.091098 instance_exec.go:79: instance 0: awaiting DAOS I/O Engine init
DEBUG 15:25:41.091102 raft.go:152: entering follower state: follower=Node at 172.23.7.2:10001 [Follower] leader=
DEBUG 15:25:41.091142 exec.go:69: daos_engine:0 args: [-t 16 -x 3 -g daos_server -d /var/run/daos_server -s /mnt/daos0 -p 0 -I 0]
DEBUG 15:25:41.091174 exec.go:70: daos_engine:0 env: [OFI_PORT=31416 CRT_CTX_SHARE_ADDR=0 CRT_TIMEOUT=0 OFI_DOMAIN=mlx5_0 D_LOG_FILE=/tmp/daos_engine.0.log CRT_PHY_ADDR_STR=ofi+verbs;ofi_rxm OFI_INTERFACE=ib0]
Starting I/O Engine instance 0: /home/copoka/daos/install/bin/daos_engine
daos_engine:0 Using NUMA core allocation algorithm
DEBUG 15:25:43.855176 instance_drpc.go:49: DAOS I/O Engine instance 0 drpc ready: uri:"ofi+verbs;ofi_rxm://172.23.7.2:31416" nctxs:20 drpcListenerSock:"/var/run/daos_server/daos_engine_188279.sock" ntgts:16
DEBUG 15:25:43.856148 system.go:145: DAOS system join request: sys:"daos_server" uuid:"c30f4940-f257-4a76-b1d0-cb74c3a29077" uri:"ofi+verbs;ofi_rxm://172.23.7.2:31416" nctxs:20 addr:"0.0.0.0:10001" srvFaultDomain:"/swat7-02"
DEBUG 15:25:43.856324 rpc.go:196: request hosts: [172.23.7.2:10001]
DEBUG 15:25:43.859492 rpc.go:392: MS request error: not the DAOS Management Service leader (try  or one of ); retrying after 0s
DEBUG 15:25:43.859708 rpc.go:196: request hosts: [172.23.7.2:10001]
DEBUG 15:25:43.861207 rpc.go:392: MS request error: not the DAOS Management Service leader (try  or one of ); retrying after 250ms
DEBUG 15:25:44.086829 raft.go:214: heartbeat timeout reached, starting election: last-leader=
DEBUG 15:25:44.086907 raft.go:250: entering candidate state: node=Node at 172.23.7.2:10001 [Candidate] term=27
DEBUG 15:25:44.087048 raft.go:268: votes: needed=1
DEBUG 15:25:44.087144 raft.go:220: RequestVote(75.12.36.64:10001, 75.12.36.64:10001) req: &{RPCHeader:{ProtocolVersion:3} Term:27 Candidate:[49 55 50 46 50 51 46 55 46 50 58 49 48 48 48 49] LastLogIndex:20 LastLogTerm:25 LeadershipTransfer:false}
DEBUG 15:25:44.090843 raft.go:287: vote granted: from=75.12.36.64:10001 term=27 tally=1
DEBUG 15:25:44.090915 raft.go:292: election won: tally=1
DEBUG 15:25:44.090975 raft.go:363: entering leader state: leader=Node at 172.23.7.2:10001 [Leader]
DEBUG 15:25:44.091044 raft.go:474: added peer, starting replication: peer=75.12.36.64:10001
DEBUG 15:25:44.091154 database.go:458: node 172.23.7.2:10001 gained MS leader state
MS leader running on swat7-02
DEBUG 15:25:44.091234 mgmt_system.go:160: starting joinLoop
DEBUG 15:25:44.091827 raft.go:152: entering follower state: follower=Node at 172.23.7.2:10001 [Follower] leader=
DEBUG 15:25:44.091899 database.go:449: node 172.23.7.2:10001 lost MS leader state
MS leader no longer running on swat7-02
DEBUG 15:25:44.091986 mgmt_system.go:170: stopped joinLoop
DEBUG 15:25:44.111729 rpc.go:196: request hosts: [172.23.7.2:10001]
DEBUG 15:25:44.113210 rpc.go:392: MS request error: not the DAOS Management Service leader (try  or one of ); retrying after 2.75s
DEBUG 15:25:46.863692 rpc.go:196: request hosts: [172.23.7.2:10001]
DEBUG 15:25:46.865213 rpc.go:392: MS request error: not the DAOS Management Service leader (try  or one of ); retrying after 1s
DEBUG 15:25:47.027134 raft.go:214: heartbeat timeout reached, starting election: last-leader=
DEBUG 15:25:47.027205 raft.go:250: entering candidate state: node=Node at 172.23.7.2:10001 [Candidate] term=28
DEBUG 15:25:47.027329 raft.go:268: votes: needed=1
DEBUG 15:25:47.027411 raft.go:220: RequestVote(75.12.36.64:10001, 75.12.36.64:10001) req: &{RPCHeader:{ProtocolVersion:3} Term:28 Candidate:[49 55 50 46 50 51 46 55 46 50 58 49 48 48 48 49] LastLogIndex:21 LastLogTerm:27 LeadershipTransfer:false}
DEBUG 15:25:47.028143 raft.go:287: vote granted: from=75.12.36.64:10001 term=28 tally=1
DEBUG 15:25:47.028209 raft.go:292: election won: tally=1
DEBUG 15:25:47.028254 raft.go:363: entering leader state: leader=Node at 172.23.7.2:10001 [Leader]
DEBUG 15:25:47.865660 rpc.go:196: request hosts: [172.23.7.2:10001]
DEBUG 15:25:47.866773 mgmt_system.go:353: MgmtSvc.Join dispatch, req:&mgmt.JoinReq{Sys:"daos_server", Uuid:"c30f4940-f257-4a76-b1d0-cb74c3a29077", Rank:0x0, Uri:"ofi+verbs;ofi_rxm://172.23.7.2:31416", Nctxs:0x14, Addr:"0.0.0.0:10001", SrvFaultDomain:"/swat7-02", Idx:0x0, XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
instance 0 exited: context deadline exceeded
DEBUG 15:25:57.865829 rpc.go:223: parent context canceled -- tearing down client invoker
&&& RAS EVENT id: [engine_status_down] ts: [2021-05-31T15:25:57.865841+0900] host: [swat7-02] type: [STATE_CHANGE] sev: [ERROR] msg: [DAOS rank exited unexpectedly] pid: [186967] rank: [0]
DEBUG 15:25:57.865912 system.go:246: forwarding engine_status_down event to MS access points [172.23.7.2:10001] (seq: 1)
DEBUG 15:25:57.866901 system.go:208: DAOS cluster event request: sequence:1 event:<id:2 msg:"DAOS rank exited unexpectedly" timestamp:"2021-05-31T15:25:57.865841+0900" type:1 severity:3 hostname:"swat7-02" proc_id:186967 rank_state_info:<errored:true error:"context deadline exceeded" > >
DEBUG 15:25:57.867166 rpc.go:196: request hosts: [172.23.7.2:10001]
DEBUG 15:25:57.868219 system.go:237: forwarding disabled for engine_status_down event



New object class selection API

Chaarawi, Mohamad
 

Hi All,

 

I would like to highlight a new API we added in 1.2 regarding object class selection when generating OIDs for objects:

https://github.com/daos-stack/daos/blob/master/src/include/daos_obj.h#L404

 

We understand that our object class list is extremely long and not very intuitive for users, so we made an effort to try to select the best object class ourselves based on a few things:

  • Container properties – for now, the redundancy factor is taken into consideration here. So if you choose RF=1 when creating the container, the auto class generation will make sure to select an oclass that supports an RF of 1 (2-way replication, for example).
  • The object type – based on the feature flag (flat KV, array, or generic). For example, for arrays we choose an EC-based class if redundancy is required, but for KV objects we choose replication.
  • Hints from the user – redundancy level (auto, i.e. from the container property; no redundancy; replication; EC) and sharding level (default, tiny, regular, …, max).

 

Of course, the option for users to select the object class themselves is still supported with the API, and the auto oclass selection is triggered only when the provided oclass is unknown/0.
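
As a rough illustration of the new call (a sketch based on my reading of daos_obj.h from that timeframe; the argument names and the hint/feature constants below are assumptions and may differ between releases, so please check the header), the container handle is passed in so the container properties can be consulted:

#include <daos.h>
#include <daos_obj.h>

/* Sketch only: generate an OID for a byte-array object, letting DAOS pick
 * the object class (OC_UNKNOWN / 0) based on the container's redundancy
 * factor and the supplied hints. Constant names are assumptions. */
static int
make_array_oid(daos_handle_t coh, daos_obj_id_t *oid)
{
        /* The caller sets oid->lo (and the user bits of oid->hi);
         * daos_obj_generate_oid() fills in the class/feature bits. */
        return daos_obj_generate_oid(coh, oid,
                                     DAOS_OF_ARRAY_BYTE,              /* object type */
                                     OC_UNKNOWN,                      /* 0: auto-select */
                                     DAOS_OCH_RDD_DEF | DAOS_OCH_SHD_DEF,
                                     0 /* reserved args */);
}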

 

Please note that the old API is deprecated:

https://github.com/daos-stack/daos/blob/master/src/include/daos_obj.h#L328

and I would like to invite everyone working on middleware on top of DAOS to switch to the new API.

DFS (POSIX) has already been migrated; that change did not make it into the 1.2 series, but it is already in master.

 

Please let us know if you have questions or need clarifications on this new API.

 

Thanks,

Mohamad
