FIO Results & Running IO500


Peter
 

Hello all!

I have a cluster of 4 DAOS nodes. The nodes run CentOS 7.9, use Optane SCM (no SSD), and are connected over EB InfiniBand.
These nodes are able to run FIO as shown here: https://daos-stack.github.io/admin/performance_tuning/#fio
The scores I am able to achieve running /examples/dfs.fio are:

Test           Throughput   Latency (min / average)
Seq Read       12.4 GB/s    283 us / 21408 us
Seq Write      4.0 GB/s     673 us / 66585 us
Random Read    187 KIOPS    83 us / 1335 us
Random Write   180 KIOPS    93 us / 1409 us


Are these numbers reasonable? The random scores seem low. I'm not 100% sure my recorded latency numbers are right, but they also seem high for Optane; perhaps this is due to DFUSE or other overheads.
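
One way to sanity-check the latency figures is Little's law, which relates throughput and average latency to the average number of I/Os in flight. The sketch below is only a rough check using the random-I/O numbers reported above; the helper name `in_flight` is made up for illustration, and the calculation assumes the reported IOPS and average latency come from the same steady-state run.

```python
# Little's law: average I/Os in flight = throughput (IOPS) x average latency (s).
# The figures are the random-I/O results reported above.
results = {
    "random read": (187_000, 1335e-6),   # (IOPS, average latency in seconds)
    "random write": (180_000, 1409e-6),
}


def in_flight(iops, avg_latency_s):
    """Average number of outstanding I/Os implied by Little's law (hypothetical helper)."""
    return iops * avg_latency_s


for name, (iops, lat) in results.items():
    print(f"{name}: ~{in_flight(iops, lat):.0f} I/Os in flight")
```

This works out to roughly 250 outstanding I/Os in both cases. If the total of numjobs x iodepth across all fio clients is of that order (an assumption to verify against the job file), the average latency mostly reflects queueing at that depth, and the minimum latency is the better indicator of per-operation cost.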


I have since attempted to run IO500, configured according to: https://wiki.hpdd.intel.com/display/DC/IO-500+ISC21
IO500 runs, with the following output: (I'm not concerned about the stonewall time errors for the moment)

IO500 version io500-isc21 (standard)
ERROR INVALID (src/phase_ior.c:24) Write phase needed 103.465106s instead of stonewall 300s. Stonewall was hit at 103.5s
ERROR INVALID (src/main.c:396) Runtime of phase (104.060211) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime
[RESULT]       ior-easy-write        3.092830 GiB/s : time 104.060 seconds [INVALID]
ERROR INVALID (src/main.c:396) Runtime of phase (2.191084) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime
[RESULT]    mdtest-easy-write      178.031067 kIOPS : time 2.191 seconds [INVALID]
[      ]            timestamp        0.000000 kIOPS : time 0.003 seconds
ERROR INVALID (src/phase_ior.c:24) Write phase needed 6.626582s instead of stonewall 300s. Stonewall was hit at 6.3s
ERROR INVALID (src/main.c:396) Runtime of phase (6.666027) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime
[RESULT]       ior-hard-write        2.114133 GiB/s : time 6.666 seconds [INVALID]
ERROR INVALID (src/main.c:396) Runtime of phase (5.672756) is below stonewall time. This shouldn't happen!
ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime
[RESULT]    mdtest-hard-write       59.615140 kIOPS : time 5.673 seconds [INVALID]

[swat7-02:06130] *** An error occurred in MPI_Comm_split_type
[swat7-02:06130] *** reported by process [1960837121,9]
[swat7-02:06130] *** on communicator MPI_COMM_WORLD
[swat7-02:06130] *** MPI_ERR_ARG: invalid argument of some other kind
[swat7-02:06130] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[swat7-02:06130] ***    and potentially your MPI job)
....(repeated)
At this point IO500 terminates. This is with Open MPI 4; with Open MPI 3.1.6, IO500 simply hangs at the same spot.
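
To separate the phase scores from the interleaved error messages, the `[RESULT]` lines can be pulled out of the output with a small script. This is a minimal sketch: the regular expression is inferred only from the lines pasted above rather than from any IO500 format specification, and `parse_results` is a hypothetical helper name.

```python
import re

# Matches lines such as:
# [RESULT]    ior-easy-write    3.092830 GiB/s : time 104.060 seconds [INVALID]
RESULT_RE = re.compile(
    r"^\[RESULT\]\s+(?P<phase>\S+)\s+(?P<value>[\d.]+)\s+(?P<unit>\S+)"
    r"\s+:\s+time\s+(?P<seconds>[\d.]+)\s+seconds(?P<invalid>\s+\[INVALID\])?"
)


def parse_results(log_text):
    """Return (phase, value, unit, seconds, invalid) for each [RESULT] line."""
    rows = []
    for line in log_text.splitlines():
        m = RESULT_RE.match(line.strip())
        if m:
            rows.append((m["phase"], float(m["value"]), m["unit"],
                         float(m["seconds"]), m["invalid"] is not None))
    return rows


# Two lines copied from the output above, plus one error line that is skipped.
sample = """\
ERROR INVALID (src/main.c:402) Runtime is smaller than expected minimum runtime
[RESULT]       ior-easy-write        3.092830 GiB/s : time 104.060 seconds [INVALID]
[RESULT]    mdtest-easy-write      178.031067 kIOPS : time 2.191 seconds [INVALID]
"""

for phase, value, unit, seconds, invalid in parse_results(sample):
    flag = "INVALID" if invalid else "ok"
    print(f"{phase:20s} {value:12.3f} {unit:6s} {seconds:8.3f} s  {flag}")
```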

Would anyone have insight into what is going on here, and how I can fix it?

Thank you for your help.
