Re: FIO Results & Running IO500


Lombardi, Johann
 

Hi there,

 

The fio numbers indeed look pretty low. Could you please tell us more about the configuration? It sounds like you have Optane pmem on all the nodes, right? How many DIMMs per node? How many engines do you run in total? Are you running fio from a node that is also a DAOS server? Could you please also share your yaml config file?
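For reference, the SCM-related part of one engine section in daos_server.yml usually looks something like the sketch below. This is only an illustration: key names differ between DAOS releases, and the interface name, target count and pmem path are placeholders rather than anything from your setup:

    engines:                      # called "servers" in older releases
      - targets: 16               # I/O targets (service threads) for this engine
        fabric_iface: ib0         # fabric interface used by this engine
        scm_mount: /mnt/daos0     # mount point for the pmem namespace
        scm_class: dcpm           # App Direct pmem ("ram" would mean tmpfs/DRAM)
        scm_list: [/dev/pmem0]    # pmem device backing this engine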

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "Harms, Kevin via groups.io" <harms@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday 23 June 2021 at 17:04
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] FIO Results & Running IO500

 

 

  I'm not sure what to expect from your nodes, but for IO500:

 

  The first set of complaints is about the runtime being too short. You need to adjust the parameters so each write phase runs longer; see the sketch below.
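As an illustration, those knobs live in the io500 ini file you pass to the run; something along these lines (the sizes are placeholders and have to be grown until each write phase really runs for the full 300 seconds on your hardware):

    [global]
    datadir = ./datafiles
    stonewall-time = 300      # each write phase must run at least this long to be valid

    [ior-easy]
    transferSize = 1m
    blockSize = 200g          # data written per rank; increase until ior-easy-write passes 300 s

    [mdtest-easy]
    n = 1000000               # files created per process; increase until the phase passes 300 s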

  The second part, the MPI_Comm_split_type failure, is complaining about the arguments. Can you try an MPICH derivative? Maybe there is some difference between Open MPI and MPICH with regard to valid split_type values.
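Purely as a sketch (module names, host list and rank count are placeholders for whatever exists on your system), switching stacks would look roughly like:

    # point the build at MPICH (or Intel MPI), then rebuild io500 and the bundled ior/mdtest/pfind
    module swap openmpi mpich
    ./prepare.sh && make
    # launch with the MPICH launcher instead of orterun
    mpiexec -np 64 -hosts client01,client02 ./io500 config-custom.ini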

 

kevin

 

________________________________________

Sent: Tuesday, June 22, 2021 9:40 PM

Subject: [daos] FIO Results & Running IO500

 

Hello all!

 

I have a cluster of 4 DAOS nodes. The nodes run CentOS 7.9, use Optane SCM (no SSDs), and are connected over EB InfiniBand.

These nodes are able to run fio as shown here: https://daos-stack.github.io/admin/performance_tuning/#fio
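For context, the workload is fio's DFS ioengine pointed at an existing pool and container; the sequential-read case is equivalent in spirit to something like the line below (pool/container UUIDs, block size and job count are placeholders, not my exact settings):

    fio --name=seq-read --ioengine=dfs --pool=<pool uuid> --cont=<cont uuid> \
        --rw=read --bs=1M --size=4G --numjobs=16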

The scores I am able to achieve running /examples/dfs.fio are:

 

Seq Read       12.4 GB/s    283 us / 21408 us  (latency min/average)

Seq Write       4.0 GB/s    673 us / 66585 us  (latency min/average)

Random Read    187 KIOPS     83 us /  1335 us  (latency min/average)

Random Write   180 KIOPS     93 us /  1409 us  (latency min/average)

 

Are these numbers reasonable? The random scores seem low. I'm not 100% sure about my recorded latency numbers, but they also seem slow for Optane; perhaps this is due to DFUSE or other overheads.

 

 

IO500 runs and produces the following output (I'm not concerned about the stonewall time errors for the moment):

 

IO500 version io500-isc21 (standard)

ERROR INVALID (src/phase_ior.c:24) Write phase needed 103.465106s instead of stonewall 300s. Stonewall was hit at 103.5s

ERROR INVALID (src/main.c:396) Runtime of phase (104.060211) is below stonewall time. This shouldn't happen!

ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime

[RESULT]       ior-easy-write        3.092830 GiB/s : time 104.060 seconds [INVALID]

ERROR INVALID (src/main.c:396) Runtime of phase (2.191084) is below stonewall time. This shouldn't happen!

ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime

[RESULT]    mdtest-easy-write      178.031067 kIOPS : time 2.191 seconds [INVALID]

[      ]            timestamp        0.000000 kIOPS : time 0.003 seconds

ERROR INVALID (src/phase_ior.c:24) Write phase needed 6.626582s instead of stonewall 300s. Stonewall was hit at 6.3s

ERROR INVALID (src/main.c:396) Runtime of phase (6.666027) is below stonewall time. This shouldn't happen!

ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime

[RESULT]       ior-hard-write        2.114133 GiB/s : time 6.666 seconds [INVALID]

ERROR INVALID (src/main.c:396) Runtime of phase (5.672756) is below stonewall time. This shouldn't happen!

ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime

[RESULT]    mdtest-hard-write       59.615140 kIOPS : time 5.673 seconds [INVALID]

 

[swat7-02:06130] *** An error occurred in MPI_Comm_split_type

[swat7-02:06130] *** reported by process [1960837121,9]

[swat7-02:06130] *** on communicator MPI_COMM_WORLD

[swat7-02:06130] *** MPI_ERR_ARG: invalid argument of some other kind

[swat7-02:06130] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,

[swat7-02:06130] ***    and potentially your MPI job)

....(repeated)

And IO500 terminates. This is with Open MPI 4; with Open MPI 3.1.6, IO500 simply hangs at the same spot.

 

Would anyone have insight into what is going on here, and how I can fix it?

 

Thank you for your help.

