
Re: Questions about DTX

Yong, Fan
 

If the old DTX leader crashes after all participants have ‘prepared’ but before some of them have ‘committed’, the surviving DTX participants elect a new leader. The new leader queries the status of that DTX from the related participants and then either commits or aborts it. This process is called DTX resync.

 

--

Regards,

Nasf
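
The resync flow described above can be sketched roughly as follows. This is only an illustration, not DAOS code: the state names, the `resync_decision` function, and above all the decision rule (commit if any survivor already committed or if all survivors are prepared, otherwise abort) are assumptions modelled on textbook two-phase-commit recovery. The reply itself only states that the new leader queries the participants and then commits or aborts.

```python
from enum import Enum


class DtxState(Enum):
    """Hypothetical per-participant states; names are illustrative only."""
    MISSING = "missing"      # participant has no record of the transaction
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


def resync_decision(states):
    """Return "commit" or "abort" for one transaction.

    `states` holds the state reported by each surviving participant.
    The rule below is an assumption, not taken from the DAOS sources.
    """
    states = list(states)
    if not states:
        raise ValueError("no surviving participant reported a state")
    # Someone already committed, so the old leader must have decided to commit.
    if DtxState.COMMITTED in states:
        return "commit"
    # Every survivor is prepared: assumed safe to finish the commit.
    if all(s is DtxState.PREPARED for s in states):
        return "commit"
    # At least one survivor aborted or never saw the transaction.
    return "abort"


if __name__ == "__main__":
    print(resync_decision([DtxState.PREPARED, DtxState.COMMITTED]))
    print(resync_decision([DtxState.PREPARED, DtxState.MISSING]))
```

Under these assumptions every survivor ends up with the same outcome, which is the property the question is asking about.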

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of 段世博
Sent: Thursday, April 15, 2021 9:59 PM
To: daos@daos.groups.io
Subject: [daos] Questions about DTX

 



Questions about DTX

段世博
 

hi~
   I have some questions about DTX. In DAOS, DTX is used to implement distributed transactions and ensure replica consistency. As far as I know, for a transaction in a prepared state in a non-leader node, the leader needs to be used to confirm whether it has been committed, which can avoid inconsistencies caused by some nodes have committed successfully while others failed.  My question is if the leader crashes during the 2PC process, will this affect the correctness of the system, or how does DAOS guarantee the correctness in this case?

thanks
~duan


Re: Timeouts/DAOS rendered useless when running IOR with SX/default object class

Rosenzweig, Joel B
 

Sure thing.  Unless you say otherwise, I’m planning to submit it against the 1.2 and 2.0 branches.

 

https://github.com/daos-stack/daos/pull/5246

 

 

From: Lombardi, Johann <johann.lombardi@...>
Sent: Tuesday, March 30, 2021 3:19 PM
To: daos@daos.groups.io; Rosenzweig, Joel B <joel.b.rosenzweig@...>
Subject: Re: [daos] Timeouts/DAOS rendered useless when running IOR with SX/default object class

 



Re: Timeouts/DAOS rendered useless when running IOR with SX/default object class

Lombardi, Johann
 

Hi Steffen,

 

Good catch! It sounds like we need to add a “LimitNOFILE” entry to our daos_server’s systemd unit file.

@Rosenzweig, Joel B could you please take care of this? Thanks in advance.

 

Cheers,

Johann
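
As a rough illustration of the “LimitNOFILE” entry mentioned above, the snippet below renders the text of a systemd drop-in. It is a sketch under assumptions: the helper name `render_nofile_dropin` is made up, the value 50000 is simply the one reported as working in this thread, and the actual change made to the DAOS unit file (see the linked pull request) is not reproduced here.

```python
def render_nofile_dropin(limit):
    """Return the text of a systemd drop-in that sets the open-files limit.

    A single value for LimitNOFILE sets both the soft and the hard limit.
    """
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    return f"[Service]\nLimitNOFILE={limit}\n"


if __name__ == "__main__":
    # 50000 is the value reported in this thread, not a recommendation.
    print(render_nofile_dropin(50000), end="")
```

By systemd convention such text would be saved as a `.conf` file in a `<unit>.service.d/` directory under `/etc/systemd/system/`, followed by `systemctl daemon-reload` and a restart of the service. The unit name to use (assumed here to be `daos_server.service`) depends on the installation.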

 

From: <daos@daos.groups.io> on behalf of Steffen Christgau <christgau@...>
Reply-To: <daos@daos.groups.io>
Date: Tuesday 30 March 2021 at 17:04
To: <daos@daos.groups.io>
Subject: Re: [daos] Timeouts/DAOS rendered useless when running IOR with SX/default object class

 



Re: Timeouts/DAOS rendered useless when running IOR with SX/default object class

Steffen Christgau
 

A final "Hi" on that topic,

we have discovered the reason for the issue: the ulimit on the _server_ side was too low, and it differs between regular users and daemons like the DAOS server. For the latter it was set to soft 1024/hard 4096. We raised both values to 50000 by modifying the service/unit file. With that we did multiple IOR runs with up to 48 processes and the SX object class from a single client node without any errors.

We noted that the coredump and memlock limits are already "increased" in the server's unit file. Maybe it is a good idea to increase the file limit by default as well, although the required limit may depend on the provider in use.

Regards, Steffen
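
Because the limit of a daemon can differ from what `ulimit -n` shows in a user shell, it can help to read the limits of the running process itself. The sketch below parses the "Max open files" row of `/proc/<pid>/limits` on Linux. The function names are illustrative, and the PID to inspect is left to the reader.

```python
import os


def parse_open_files(limits_text):
    """Extract (soft, hard) for 'Max open files' from /proc/<pid>/limits text.

    Values are returned as strings because they may be 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # fields: ['Max', 'open', 'files', soft, hard, 'files']
            return fields[3], fields[4]
    raise ValueError("no 'Max open files' line found")


def open_file_limits(pid):
    """Read and parse the limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_open_files(f.read())


if __name__ == "__main__":
    pid = "self"  # replace with the PID of the daemon to inspect
    if os.path.exists(f"/proc/{pid}/limits"):
        soft, hard = open_file_limits(pid)
        print(f"pid {pid}: soft={soft} hard={hard}")
```

Running it with the PID of the server process shows the limits that actually apply to the daemon rather than those of the login shell.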


Re: Timeouts/DAOS rendered useless when running IOR with SX/default object class

Steffen Christgau
 

Hi again once more,

meanwhile we checked the 'tcp' and the 'verbs' provider.

For 'tcp' we also experience the timeouts and a subsequently unusable DAOS system.

For 'verbs' (on an OmniPath network) we observe Mercury errors about failed memory registrations:

03/29-12:36:21.95 bdaos15 DAOS[308011/308012] pool ERR src/pool/srv_pool.c:1899 transfer_map_buf() 4810a635: remote pool map buffer (4128) < required (5472)
03/29-12:36:50.65 bdaos15 DAOS[308011/308089] external ERR # HG -- error -- /builddir/build/BUILD/mercury-2.0.1rc1/src/mercury_bulk.c:846
# hg_bulk_register(): NA_Mem_register() failed (NA_PROTOCOL_ERROR)
03/29-12:36:50.65 bdaos15 DAOS[308011/308089] external ERR # HG -- error -- /builddir/build/BUILD/mercury-2.0.1rc1/src/mercury_bulk.c:762
# hg_bulk_create_na_mem_descs(): Could not register segment
03/29-12:36:50.65 bdaos15 DAOS[308011/308089] external ERR # HG -- error -- /builddir/build/BUILD/mercury-2.0.1rc1/src/mercury_bulk.c:626
# hg_bulk_create(): Could not create NA mem descriptors
03/29-12:36:50.65 bdaos15 DAOS[308011/308089] external ERR # HG -- error -- /builddir/build/BUILD/mercury-2.0.1rc1/src/mercury_bulk.c:2516
# HG_Bulk_create(): Could not create bulk handle
The version of all the employed providers is '111.10' - both on client and server side.

Maybe this helps a little for further investigation.

Regards, Steffen


Re: Timeouts/DAOS rendered useless when running IOR with SX/default object class

Steffen Christgau
 

Hi again,

On 3/26/21 5:14 PM, Steffen Christgau wrote:
> On 3/26/21 4:49 PM, Oganezov, Alexander A wrote:
>> Could you enable OFI level logs by setting FI_LOG_LEVEL=warn on the client side and provide stdout/stderr output from runs that result in mercury errors/timeouts?
> Thanks for that input, we'll try to reproduce the issue with those settings and provide them ASAP

Here is the output of a failed attempt to run IOR. It now crashed for 48 processes on a single client. For smaller process counts, IOR succeeds with the same messages/warnings from libfabric.

$ export FI_LOG_LEVEL=warn
$ mpiexec -n 48 --map-by socket --bind-to core /home/bemschri/opt/local/ior/github/bin/ior -F -r -w -t 1m -b 1g -i 3 -o /ior_file -a DFS --dfs.pool=... --dfs.cont=... --dfs.destroy --dfs.group=daos_server --dfs.oclass=SX
libfabric:607767:core:core:fi_getinfo_():1019<warn> fi_getinfo: provider usnic returned -61 (No data available)
libfabric:607767:core:core:fi_getinfo_():1019<warn> fi_getinfo: provider ofi_rxm returned -61 (No data available)
libfabric:607767:core:core:fi_getinfo_():1019<warn> fi_getinfo: provider ofi_rxd returned -61 (No data available)
libfabric:607767:ofi_mrail:fabric:mrail_get_core_info():289<warn> OFI_MRAIL_ADDR_STRC env variable not set!
[repeats for each MPI process]

libfabric:607767:core:core:ofi_ns_add_local_name():370<warn> Cannot add local name - name server uninitialized [repeats again]
IOR-3.4.0+dev: MPI Coordinated Test of Parallel I/O
Began : Mon Mar 29 10:47:36 2021
Command line : /home/bemschri/opt/local/ior/github/bin/ior -F -r -w -t 1m -b 1g -i 3 -o /ior_file -a DFS --dfs.pool=... --dfs.cont=... --dfs.destroy --dfs.group=daos_server --dfs.oclass=SX
Machine : Linux bcn1031
TestID : 0
StartTime : Mon Mar 29 10:47:36 2021
Path : /ior_file.00000000
FS : 4607.9 TiB Used FS: 100.0% Inodes: 192512.0 Mi Used Inodes: 38.3%
Options: api : DFS
apiVersion : DAOS
test filename : /ior_file
access : file-per-process
type : independent
segments : 1
ordering in a file : sequential
ordering inter file : no tasks offsets
nodes : 1
tasks : 48
clients per node : 48
repetitions : 3
xfersize : 1 MiB
blocksize : 1 GiB
aggregate filesize : 48 GiB
Results: access bw(MiB/s) IOPS Latency(s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter
------ --------- ---- ---------- ---------- --------- -------- -------- -------- -------- ----
^C
And in the DAOS client log we have the following

03/29-10:47:36.48 bcn1031 DAOS[607790/607790] crt INFO src/cart/crt_init.c:151 data_init() Disabling MR CACHE (FI_MR_CACHE_COUNT=0)
03/29-10:47:36.63 bcn1031 DAOS[607790/607790] mem WARN src/gurt/hash.c:763 d_hash_table_create_inplace() The d_hash_table_ops_t->hop_rec_hash()
callback is not provided!
Therefore the whole hash table locking will be used for backward compatibility.
03/29-10:48:38.41 bcn1031 DAOS[607798/607798] rpc ERR src/cart/crt_context.c:806 crt_context_timeout_check(0x1311b60) [opc=0x4020000 (DAOS) rpcid=0x7f90350400000033 rank:tag=14:7] ctx_id 0, (status: 0x38) timed out (60 seconds), target (14:7)
03/29-10:48:38.41 bcn1031 DAOS[607798/607798] rpc ERR src/cart/crt_context.c:755 crt_req_timeout_hdlr(0x1311b60) [opc=0x4020000 (DAOS) rpcid=0x7f90350400000033 rank:tag=14:7] aborting to group daos_server, rank 14, tgt_uri ofi+sockets://10.246.101.33:20007
03/29-10:48:38.41 bcn1031 DAOS[607798/607798] hg ERR src/cart/crt_hg.c:1050 crt_hg_req_send_cb(0x1311b60) [opc=0x4020000 (DAOS) rpcid=0x7f90350400000033 rank:tag=14:7] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/29-10:48:38.41 bcn1031 DAOS[607798/607798] object ERR src/object/cli_shard.c:631 dc_rw_cb() RPC 0 failed, DER_TIMEDOUT(-1011): 'Time out'
Regards, Steffen


Re: Timeouts/DAOS rendered useless when running IOR with SX/default object class

Steffen Christgau
 

Hi Alex,

On 3/26/21 4:49 PM, Oganezov, Alexander A wrote:
> Could you enable OFI level logs by setting FI_LOG_LEVEL=warn on the client side and provide stdout/stderr output from runs that result in mercury errors/timeouts?

Thanks for that input, we'll try to reproduce the issue with those settings and provide them ASAP.

> Also can you tell us what your ulimit -a reports on client/server nodes?

Sure.

client $ ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 1541126
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) 370688000
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 4096
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

On the server side pending signals is lower: 761096.

Regards, Steffen


Re: Timeouts/DAOS rendered useless when running IOR with SX/default object class

Oganezov, Alexander A
 

Hi Steffen,

Could you enable OFI level logs by setting FI_LOG_LEVEL=warn on the client side and provide stdout/stderr output from runs that result in Mercury errors/timeouts?
Also, can you tell us what ulimit -a reports on your client/server nodes?

We've seen issues before where, if ulimit -n (open files) is set too low, some socket connections fail to be established. Getting the OFI logs from the error would help to narrow this down.

Thanks,
~~Alex.

-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Steffen Christgau
Sent: Friday, March 26, 2021 7:50 AM
To: daos@daos.groups.io
Subject: [daos] Timeouts/DAOS rendered useless when running IOR with SX/default object class



Timeouts/DAOS rendered useless when running IOR with SX/default object class

Steffen Christgau
 

Hi everybody,

during testing and performance assessment with IOR (latest Github version from main branch) we are facing problems with DAOS v1.1.3.

When running IOR from a single client node there is no problem with object class S1 and S2 with up to NP = 48 processes (from the dual socket 96 core client machine). When we use the SX class (which is the default in IOR), the benchmark successfully completes some of its iterations but then hangs. This happens with as "little" as NP = 16 processes on that single client.

mpiexec -n NP --map-by socket --bind-to core ior -F -r -w -t 1m -b 1g -i 3 -o /ior_file -a DFS --dfs.pool=... --dfs.cont=... --dfs.destroy --dfs.group=daos_server --dfs.oclass=OCLASS

In the client log we find the following

03/25-12:17:01.53 bcn1031 DAOS[536878/536878] rpc ERR src/cart/crt_context.c:806 crt_context_timeout_check(0x132e540) [opc=0x4020012 (DAOS) rpcid=0x5d481ae000000909 rank:tag=9:3] ctx_id 0, (status: 0x38) timed out (60 seconds), target (9:3)
03/25-12:17:01.53 bcn1031 DAOS[536875/536875] rpc ERR src/cart/crt_context.c:806 crt_context_timeout_check(0x1333750) [opc=0x4020012 (DAOS) rpcid=0x5edd88cd00000909 rank:tag=3:6] ctx_id 0, (status: 0x38) timed out (60 seconds), target (3:6)
03/25-12:17:01.53 bcn1031 DAOS[536874/536874] rpc ERR src/cart/crt_context.c:806 crt_context_timeout_check(0x13338a0) [opc=0x4020012 (DAOS) rpcid=0x454be3aa00000909 rank:tag=1:4] ctx_id 0, (status: 0x38) timed out (60 seconds), target (1:4)
03/25-12:17:01.53 bcn1031 DAOS[536874/536874] rpc ERR src/cart/crt_context.c:755 crt_req_timeout_hdlr(0x13338a0) [opc=0x4020012 (DAOS) rpcid=0x454be3aa00000909 rank:tag=1:4] aborting to group daos_server, rank 1, tgt_uri (null)
03/25-12:17:01.53 bcn1031 DAOS[536875/536875] rpc ERR src/cart/crt_context.c:755 crt_req_timeout_hdlr(0x1333750) [opc=0x4020012 (DAOS) rpcid=0x5edd88cd00000909 rank:tag=3:6] aborting to group daos_server, rank 3, tgt_uri (null)
03/25-12:17:01.53 bcn1031 DAOS[536878/536878] rpc ERR src/cart/crt_context.c:755 crt_req_timeout_hdlr(0x132e540) [opc=0x4020012 (DAOS) rpcid=0x5d481ae000000909 rank:tag=9:3] aborting to group daos_server, rank 9, tgt_uri (null)
03/25-12:17:01.53 bcn1031 DAOS[536873/536873] rpc ERR src/cart/crt_context.c:806 crt_context_timeout_check(0x13340c0) [opc=0x4020012 (DAOS) rpcid=0xaffa39e00000909 rank:tag=14:2] ctx_id 0, (status: 0x38) timed out (60 seconds), target (14:2)
03/25-12:17:01.53 bcn1031 DAOS[536873/536873] rpc ERR src/cart/crt_context.c:755 crt_req_timeout_hdlr(0x13340c0) [opc=0x4020012 (DAOS) rpcid=0xaffa39e00000909 rank:tag=14:2] aborting to group daos_server, rank 14, tgt_uri (null)
03/25-12:17:01.53 bcn1031 DAOS[536875/536875] hg ERR src/cart/crt_hg.c:1050 crt_hg_req_send_cb(0x1333750) [opc=0x4020012 (DAOS) rpcid=0x5edd88cd00000909 rank:tag=3:6] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/25-12:17:01.53 bcn1031 DAOS[536878/536878] hg ERR src/cart/crt_hg.c:1050 crt_hg_req_send_cb(0x132e540) [opc=0x4020012 (DAOS) rpcid=0x5d481ae000000909 rank:tag=9:3] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/25-12:17:01.53 bcn1031 DAOS[536874/536874] hg ERR src/cart/crt_hg.c:1050 crt_hg_req_send_cb(0x13338a0) [opc=0x4020012 (DAOS) rpcid=0x454be3aa00000909 rank:tag=1:4] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
03/25-12:17:01.53 bcn1031 DAOS[536873/536873] hg ERR src/cart/crt_hg.c:1050 crt_hg_req_send_cb(0x13340c0) [opc=0x4020012 (DAOS) rpcid=0xaffa39e00000909 rank:tag=14:2] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
Sixty seconds before the timestamp at which the timeout error occurs on the client, we find the following on rank 9 (which has hostname bdaos14):

03/25-12:16:01.53 bdaos14 DAOS[28486/28507] external ERR # HG -- error -- /builddir/build/BUILD/mercury-2.0.1rc1/src/mercury_core.c:2751
# hg_core_forward_na(): Could not post send for input buffer (NA_PROTOCOL_ERROR)
03/25-12:16:01.53 bdaos14 DAOS[28486/28507] external ERR # HG -- error -- /builddir/build/BUILD/mercury-2.0.1rc1/src/mercury_core.c:2674
# hg_core_forward(): Could not forward buffer
03/25-12:16:01.53 bdaos14 DAOS[28486/28507] external ERR # HG -- error -- /builddir/build/BUILD/mercury-2.0.1rc1/src/mercury_core.c:5017
# HG_Core_forward(): Could not forward handle
03/25-12:16:01.53 bdaos14 DAOS[28486/28507] external ERR # HG -- error -- /builddir/build/BUILD/mercury-2.0.1rc1/src/mercury.c:1960
# HG_Forward(): Could not forward call (HG_PROTOCOL_ERROR)
03/25-12:16:01.53 bdaos14 DAOS[28486/28507] hg ERR src/cart/crt_hg.c:1090 crt_hg_req_send(0x7f255ad80f40) [opc=0x4020012 (DAOS) rpcid=0x4a70d1bc00001f4f rank:tag=16:5] HG_Forward failed, hg_ret: 12
[...]
03/25-12:16:01.53 bdaos14 DAOS[28486/28507] rpc ERR src/cart/crt_context.c:806 crt_context_timeout_check(0x7f255ad80f40) [opc=0x4020012 (DAOS) rpcid=0x4a70d1bc00001f4f rank:tag=16:5] ctx_id 4, (status: 0x3f) timed out (60 seconds), target (16:5)
03/25-12:16:01.53 bdaos14 DAOS[28486/28507] rpc ERR src/cart/crt_context.c:743 crt_req_timeout_hdlr(0x7f255ad80f40) [opc=0x4020012 (DAOS) rpcid=0x4a70d1bc00001f4f rank:tag=16:5] failed due to group daos_server, rank 16, tgt_uri ofi+sockets://10.246.101.23:20005 can't rea
03/25-12:16:01.53 bdaos14 DAOS[28486/28507] rpc ERR src/cart/crt_context.c:292 crt_rpc_complete(0x7f255ad80f40) [opc=0x4020012 (DAOS) rpcid=0x4a70d1bc00001f4f rank:tag=16:5] failed, DER_UNREACH(-1006): 'Unreachable node'
03/25-12:16:01.57 bdaos14 DAOS[28486/28508] external ERR # HG -- error -- /builddir/build/BUILD/mercury-2.0.1rc1/src/mercury_core.c:2751
This happens for other rank:tag combinations as well.
The log on rank 16 (which is bdaos3) is basically clean at this point in time (12:16:01). At the time the timeout error manifests at the client, we see the following in the log of bdaos3:

03/25-12:17:01.56 bdaos3 DAOS[27816/27835] object ERR src/object/srv_obj.c:3946 ds_obj_dtx_follower() Handled DTX add8eaf5.199f0f144b80000 on non-leader: DER_UNKNOWN(1): 'Unknown error code 1'
03/25-12:17:01.56 bdaos3 DAOS[27816/27836] object ERR src/object/srv_obj.c:3946 ds_obj_dtx_follower() Handled DTX add8eaf5.199f0f144b80000 on non-leader: DER_UNKNOWN(1): 'Unknown error code 1'
There are a lot more similar errors across all server nodes, which I can send in a PM to whoever raises a hand ;-) Basic operations like container creation and destruction are still working, but even 'daos pool autotest' fails, although it worked fine before we started the deadly IOR run.

daos pool autotest --pool=...
Step Operation Status Time(sec) Comment
0 Initializing DAOS OK 0.000
1 Connecting to pool OK 0.070
2 Creating container OK 0.000 uuid =
3 Opening container OK 0.060
10 Generating 1M S1 layouts OK 2.530
11 Generating 10K SX layouts OK 0.630
20 Inserting 1M 128B values rpc ERR src/cart/crt_context.c:806 crt_context_timeout_check(0x12a5540) [opc=0x4020000 (DAOS) rpcid=0x2c178c3e0000001a rank:tag=9:5] ctx_id 1, (status: 0x38) timed out (60 seconds), target (9:5)
rpc ERR src/cart/crt_context.c:755 crt_req_timeout_hdlr(0x12a5540) [opc=0x4020000 (DAOS) rpcid=0x2c178c3e0000001a rank:tag=9:5] aborting to group daos_server, rank 9, tgt_uri ofi+sockets://10.246.101.34:20005
hg ERR src/cart/crt_hg.c:1050 crt_hg_req_send_cb(0x12a5540) [opc=0x4020000 (DAOS) rpcid=0x2c178c3e0000001a rank:tag=9:5] RPC failed; rc: DER_TIMEDOUT(-1011): 'Time out'
object ERR src/object/cli_shard.c:631 dc_rw_cb() RPC 0 failed, DER_TIMEDOUT(-1011): 'Time out'
In the end, the DAOS system is in a state where it is hardly usable. Only stopping the system and restarting the services brings it fully back to life.

Maybe the object class has no impact at all, but with the S1/S2 classes the problem did not manifest. With SX we can provoke the issue quite fast. While I would understand that striping over all nodes (which is my understanding of SX) may decrease performance compared to S1 or S2, I would not expect the system to transition into an unusable state. Could the libfabric provider (sockets) be an issue here?

Does anybody know what might be the reason for this issue and/or what might be changed to solve it?

Regards

Steffen
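
To see which rank:tag combinations are affected, the "timed out" lines of a client log can be tallied with a few lines of Python. This is a triage sketch, not a DAOS tool: the regular expression is derived only from the log lines quoted in this message, and the function name is made up.

```python
import re
from collections import Counter

# Matches e.g. "... rank:tag=9:3] ctx_id 0, (status: 0x38) timed out ..."
TIMEOUT_RE = re.compile(r"rank:tag=(\d+):(\d+)\].*timed out")


def timed_out_targets(lines):
    """Count 'timed out' log lines per (rank, tag) target."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Two lines copied from the client log above; only the first is a timeout.
    sample = [
        "03/25-12:17:01.53 bcn1031 DAOS[536878/536878] rpc ERR src/cart/crt_context.c:806 crt_context_timeout_check(0x132e540) [opc=0x4020012 (DAOS) rpcid=0x5d481ae000000909 rank:tag=9:3] ctx_id 0, (status: 0x38) timed out (60 seconds), target (9:3)",
        "03/25-12:17:01.53 bcn1031 DAOS[536874/536874] rpc ERR src/cart/crt_context.c:755 crt_req_timeout_hdlr(0x13338a0) [opc=0x4020012 (DAOS) rpcid=0x454be3aa00000909 rank:tag=1:4] aborting to group daos_server, rank 1, tgt_uri (null)",
    ]
    for (rank, tag), n in sorted(timed_out_targets(sample).items()):
        print(f"rank {rank} tag {tag}: {n} timeout(s)")
```

Feeding it a whole log file (for example via `open(path)`) gives a quick picture of whether the timeouts cluster on a few servers or are spread across all of them.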


Re: Errors while compiling DAOS on ARM64 platform

Rosenzweig, Joel B
 

Hi Huijun,

 

At one point in time, we added “// +build linux,amd64” to the netdetect.go file to enable it to build under ARM.  Does your version of netdetect.go have the following at the end of the copyright header, before “package netdetect”?  If it does not, go ahead and patch your file accordingly and try again.

 

//
// +build linux,amd64
//

package netdetect

 

Regards,

Joel
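
For readers unfamiliar with Go's old-style build constraints, the sketch below evaluates a single `// +build` line against a set of active tags: space-separated options are OR-ed, comma-separated terms inside an option are AND-ed, and `!` negates a term. It is a simplified model written for illustration. The real Go toolchain also considers further tags (such as cgo, release tags, and custom `-tags`), multiple constraint lines, and the newer `//go:build` syntax, none of which are handled here.

```python
def option_matches(option, tags):
    """An option is a comma-separated list of terms that must all hold."""
    for term in option.split(","):
        negated = term.startswith("!")
        name = term[1:] if negated else term
        # A plain term needs the tag present; a negated term needs it absent.
        if (name in tags) == negated:
            return False
    return True


def build_line_matches(line, tags):
    """Evaluate one '// +build' line; space-separated options are OR-ed."""
    prefix = "// +build"
    if not line.startswith(prefix):
        raise ValueError("not a +build line")
    options = line[len(prefix):].split()
    return any(option_matches(option, tags) for option in options)


if __name__ == "__main__":
    line = "// +build linux,amd64"
    # Tags stand in for GOOS and GOARCH of the build target.
    print("amd64:", build_line_matches(line, {"linux", "amd64"}))
    print("arm64:", build_line_matches(line, {"linux", "arm64"}))
```

In this model a file carrying only `linux,amd64` is skipped when the target is linux/arm64. If every Go file in a package is skipped this way, `go build` reports that build constraints exclude all Go files, which is the message in the build log of this thread. Which other files the netdetect package contains is not stated in the thread.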

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Wu Huijun
Sent: Friday, March 19, 2021 11:11 PM
To: daos@daos.groups.io
Subject: [daos] Errors while compiling DAOS on ARM64 platform

 

Hi all, 

I am trying to compile DAOS on an ARM64 platform (little endian). I am working with the branch 'tanabarr/control-no-ipmctl-May2020' to avoid the ipmctl dependency.

However, I got the errors below from go build github.com/daos-stack/daos/src/control/lib/netdetect.
Any clue about this? I checked the GOPATH, and it seems the Go compiler can indeed find the code but just cannot compile it.

ar rc build/dev/gcc/src/control/lib/spdk/libnvme_control.a build/dev/gcc/src/control/lib/spdk/src/nvme_control.o build/dev/gcc/src/control/lib/spdk/src/nvme_control_common.o

gcc -c -Isrc/cart/src/include -Isrc/cart/src/cart -I/root/daos/install/include -I/root/daos/install/include/na -E -P src/cart/src/cart/crt_swim.c | cat > build/dev/gcc/src/cart/src/cart/crt_swim_pp.c

ranlib build/dev/gcc/src/control/lib/spdk/libnvme_control.a

cd /root/daos/src/control; /usr/lib/go-1.13/bin/go build -mod vendor -v -ldflags "-X github.com/daos-stack/daos/src/control/build.DaosVersion=1.1.0 -X github.com/daos-stack/daos/src/control/build.ConfigDir=/root/daos/install/etc -B 0x91d6cda8b03b8b86157723c893b049e89e83e1d6" -o /root/daos/build/dev/gcc/src/control/bin/daos_admin github.com/daos-stack/daos/src/control/cmd/daos_admin

github.com/daos-stack/daos/src/control/lib/netdetect

go build github.com/daos-stack/daos/src/control/lib/netdetect: build constraints exclude all Go files in /root/daos/src/control/lib/netdetect

gcc -c -Isrc/cart/src/include -Isrc/cart/src/cart -I/root/daos/install/include -I/root/daos/install/include/na -E -P src/cart/src/cart/crt_tree.c | cat > build/dev/gcc/src/cart/src/cart/crt_tree_pp.c

scons: *** [build/dev/gcc/src/control/bin/daos_agent] Error 1

gcc -c -Isrc/cart/src/include -Isrc/cart/src/cart -I/root/daos/install/include -I/root/daos/install/include/na -E -P src/cart/src/cart/crt_tree_flat.c | cat > build/dev/gcc/src/cart/src/cart/crt_tree_flat_pp.c

gcc -c -Isrc/cart/src/include -Isrc/cart/src/cart -I/root/daos/install/include -I/root/daos/install/include/na -E -P src/cart/src/cart/crt_tree_kary.c | cat > build/dev/gcc/src/cart/src/cart/crt_tree_kary_pp.c

gcc -c -Isrc/cart/src/include -Isrc/cart/src/cart -I/root/daos/install/include -I/root/daos/install/include/na -E -P src/cart/src/cart/crt_tree_knomial.c | cat > build/dev/gcc/src/cart/src/cart/crt_tree_knomial_pp.c

gcc -c -Isrc/cart/src/include -Isrc/cart/src/cart -I/root/daos/install/include -I/root/daos/install/include/na -E -P src/cart/src/cart/crt_hlc.c | cat > build/dev/gcc/src/cart/src/cart/crt_hlc_pp.c

scons: building terminated because of errors.

Cheers,
Huijun

 


Errors while compiling DAOS on ARM64 platform

Wu Huijun
 

Hi all, 

I am trying to compile DAOS on an ARM64 platform (little endian). I am working with the branch 'tanabarr/control-no-ipmctl-May2020' to avoid the ipmctl dependency. 

However, I got the errors below from go build github.com/daos-stack/daos/src/control/lib/netdetect.
Any clue about this? I checked the GOPATH, and it seems the Go compiler can indeed find the code but just cannot compile it. 

ar rc build/dev/gcc/src/control/lib/spdk/libnvme_control.a build/dev/gcc/src/control/lib/spdk/src/nvme_control.o build/dev/gcc/src/control/lib/spdk/src/nvme_control_common.o
gcc -c -Isrc/cart/src/include -Isrc/cart/src/cart -I/root/daos/install/include -I/root/daos/install/include/na -E -P src/cart/src/cart/crt_swim.c | cat > build/dev/gcc/src/cart/src/cart/crt_swim_pp.c
ranlib build/dev/gcc/src/control/lib/spdk/libnvme_control.a
cd /root/daos/src/control; /usr/lib/go-1.13/bin/go build -mod vendor -v -ldflags "-X github.com/daos-stack/daos/src/control/build.DaosVersion=1.1.0 -X github.com/daos-stack/daos/src/control/build.ConfigDir=/root/daos/install/etc -B 0x91d6cda8b03b8b86157723c893b049e89e83e1d6" -o /root/daos/build/dev/gcc/src/control/bin/daos_admin github.com/daos-stack/daos/src/control/cmd/daos_admin
github.com/daos-stack/daos/src/control/lib/netdetect
go build github.com/daos-stack/daos/src/control/lib/netdetect: build constraints exclude all Go files in /root/daos/src/control/lib/netdetect
gcc -c -Isrc/cart/src/include -Isrc/cart/src/cart -I/root/daos/install/include -I/root/daos/install/include/na -E -P src/cart/src/cart/crt_tree.c | cat > build/dev/gcc/src/cart/src/cart/crt_tree_pp.c
scons: *** [build/dev/gcc/src/control/bin/daos_agent] Error 1
gcc -c -Isrc/cart/src/include -Isrc/cart/src/cart -I/root/daos/install/include -I/root/daos/install/include/na -E -P src/cart/src/cart/crt_tree_flat.c | cat > build/dev/gcc/src/cart/src/cart/crt_tree_flat_pp.c
gcc -c -Isrc/cart/src/include -Isrc/cart/src/cart -I/root/daos/install/include -I/root/daos/install/include/na -E -P src/cart/src/cart/crt_tree_kary.c | cat > build/dev/gcc/src/cart/src/cart/crt_tree_kary_pp.c
gcc -c -Isrc/cart/src/include -Isrc/cart/src/cart -I/root/daos/install/include -I/root/daos/install/include/na -E -P src/cart/src/cart/crt_tree_knomial.c | cat > build/dev/gcc/src/cart/src/cart/crt_tree_knomial_pp.c
gcc -c -Isrc/cart/src/include -Isrc/cart/src/cart -I/root/daos/install/include -I/root/daos/install/include/na -E -P src/cart/src/cart/crt_hlc.c | cat > build/dev/gcc/src/cart/src/cart/crt_hlc_pp.c
scons: building terminated because of errors.

Cheers,
Huijun
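
Note on the error above: "build constraints exclude all Go files" means the Go toolchain found the package directory but filtered out every .go file for the current target. Files can be excluded by "// +build" lines (the constraint syntax used with Go 1.13), by file-name suffixes such as _amd64.go, or because they import "C" while cgo is disabled. The following is a minimal diagnostic sketch in Python, not part of DAOS and not a reimplementation of Go's constraint evaluation; the function names are hypothetical. It only lists the "+build" lines and cgo imports found in each file so they can be compared against the ARM64 target.

```python
import re
from pathlib import Path

# Matches old-style constraints such as "// +build linux,amd64".
BUILD_TAG = re.compile(r"^//\s*\+build\s+(.*)$")
# Matches `import "C"` on its own line or "C" inside an import group.
CGO_IMPORT = re.compile(r'^\s*(import\s+)?"C"\s*$')


def scan_go_file(text):
    """Return (build_constraints, uses_cgo) for the text of one Go file."""
    constraints = []
    uses_cgo = False
    for line in text.splitlines():
        match = BUILD_TAG.match(line.strip())
        if match:
            constraints.append(match.group(1).strip())
        if CGO_IMPORT.match(line):
            uses_cgo = True
    return constraints, uses_cgo


def scan_package(directory):
    """Map each .go file name in a directory to its scan result."""
    return {
        path.name: scan_go_file(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.go"))
    }
```

For example, scan_package("/root/daos/src/control/lib/netdetect") would show whether every file in that package carries a constraint that rules out arm64, or whether all of them depend on cgo.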
 


Re: Questions about Daos consistency

段世博
 

T3 starts before T1, so T3 can obtain a timestamp smaller than T1's. T1 has not yet started when T3 reads C1, so the read is not uncertain.
When T1 writes C1, a read timestamp smaller than its own does not count as a conflict, so T1 proceeds and T3 cannot see T1's write.


Re: Questions about Daos consistency

Olivier, Jeffrey V
 

I may be missing something here, but assuming T3 is at a later timestamp than T1, the read of C1 would update the read timestamp in the negative entry for C1 (based on a hash of the key). Before T1 creates C1, it would check this timestamp, find a conflict, and be forced to restart at a later timestamp.

 

-Jeff
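
The read-timestamp check described above can be sketched as follows. This is a simplified Python illustration, not DAOS code: the Store class, its fields, and the rule that only a strictly later read conflicts are assumptions made for the example. A read records its timestamp even when the key does not exist (the "negative entry"), and a write at an earlier timestamp is rejected so that it can restart later.

```python
class WriteConflict(Exception):
    """Raised when a write must restart at a later timestamp."""


class Store:
    def __init__(self):
        self.values = {}   # key -> list of (write_ts, value)
        self.read_ts = {}  # key -> highest timestamp that has read the key

    def read(self, key, ts):
        # Record the read even if the key is absent (negative entry).
        self.read_ts[key] = max(self.read_ts.get(key, 0), ts)
        visible = [(w, v) for w, v in self.values.get(key, []) if w <= ts]
        if not visible:
            return None
        return max(visible, key=lambda pair: pair[0])[1]

    def write(self, key, ts, value):
        # A reader at a later timestamp has already observed the old state.
        latest_read = self.read_ts.get(key, 0)
        if latest_read > ts:
            raise WriteConflict(f"{key}: read at {latest_read}, write at {ts}")
        self.values.setdefault(key, []).append((ts, value))
```

With this sketch, a read of C1 at timestamp 20 followed by a write of C1 at timestamp 10 raises WriteConflict, while a write at timestamp 30 succeeds. A read at a timestamp smaller than the write's does not block the write.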

 

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Li, Wei G <wei.g.li@...>
Date: Friday, March 19, 2021 at 3:03 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] Questions about Daos consistency

You are right. This can also happen with DAOS. I will correct that document.

Thanks,
liwei

> On Mar 19, 2021, at 4:06 PM, 段世博 <duanshibo.d@...> wrote:
>
>   I found that the concurrency control of DAOS is similar to CockroachDB's, but according to the Jepsen analysis (https://jepsen.io/analyses/cockroachdb-beta-20160829) the following situation may occur in CockroachDB. C1 and C2 are two unrelated data items. T2 starts after T1 has committed. However, T3 sees only the write of T2 and cannot see the write of T1. Obviously, this violates external consistency.
>
> T3: r(C1) (not found)
> T1: w(C1)
> T1: commit
> T2: w(C2)
> T2: commit
> T3: r(C2) (found)
> T3: commit
>  
>   Can this happen in DAOS? If not, how does DAOS avoid this situation?
>   Thanks.
>






Re: Questions about Daos consistency

Li, Wei G
 

You are right. This can also happen with DAOS. I will correct that document.

Thanks,
liwei

On Mar 19, 2021, at 4:06 PM, 段世博 <duanshibo.d@gmail.com> wrote:

I found that the concurrency control of DAOS is similar to CockroachDB's, but according to the Jepsen analysis (https://jepsen.io/analyses/cockroachdb-beta-20160829) the following situation may occur in CockroachDB. C1 and C2 are two unrelated data items. T2 starts after T1 has committed. However, T3 sees only the write of T2 and cannot see the write of T1. Obviously, this violates external consistency.

T3: r(C1) (not found)
T1: w(C1)
T1: commit
T2: w(C2)
T2: commit
T3: r(C2) (found)
T3: commit

Can this happen in DAOS? If not, how does DAOS avoid this situation?
Thanks.


Re: Questions about Daos consistency

段世博
 

  I found that the concurrency control of DAOS is similar to CockroachDB's, but according to the Jepsen analysis (https://jepsen.io/analyses/cockroachdb-beta-20160829) the following situation may occur in CockroachDB. C1 and C2 are two unrelated data items. T2 starts after T1 has committed. However, T3 sees only the write of T2 and cannot see the write of T1. Obviously, this violates external consistency.

T3: r(C1) (not found) 
T1: w(C1)
T1: commit
T2: w(C2)
T2: commit
T3: r(C2) (found)
T3: commit
 
  Can this happen in DAOS? If not, how does DAOS avoid this situation?
  Thanks.
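
One way the history above can arise in a timestamp-ordered MVCC store is sketched below in Python. The timestamps are hypothetical and chosen only for illustration: T2 is assumed to get a smaller timestamp than T1 (for example because of clock offset between nodes) even though it starts after T1 commits in real time, and T3 reads at a timestamp between the two. The snapshot_read helper is an assumption of this sketch, not a DAOS or CockroachDB API.

```python
def snapshot_read(versions, key, read_ts):
    """Return the newest value of key written at or before read_ts, else None."""
    visible = [(ts, val) for ts, val in versions.get(key, []) if ts <= read_ts]
    if not visible:
        return None
    return max(visible, key=lambda pair: pair[0])[1]


# Hypothetical timestamps: T2 < T3 < T1, although T2 starts after T1 commits.
TS = {"T1": 10, "T2": 5, "T3": 7}

versions = {}   # key -> list of (write_ts, value)
history = []

history.append(("T3 r(C1)", snapshot_read(versions, "C1", TS["T3"])))
versions.setdefault("C1", []).append((TS["T1"], "v1"))  # T1: w(C1), commit
versions.setdefault("C2", []).append((TS["T2"], "v2"))  # T2: w(C2), commit
history.append(("T3 r(C2)", snapshot_read(versions, "C2", TS["T3"])))
```

After these steps, history holds a miss for C1 and a hit for C2: T3 observes the effect of T2 but not of T1, which is the external-consistency violation described in the question.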


DFS fio engine

Lombardi, Johann
 

Hi there,

 

I just would like to share with you that the DAOS File System (DFS) engine has been integrated into the upstream FIO repository (https://github.com/axboe/fio).

 

How to build it on CentOS 7:

 

$ sudo yum install centos-release-scl

$ sudo yum install -y git devtoolset-9-gcc libuuid-devel

$ scl enable devtoolset-9 bash

$ git clone http://git.kernel.dk/fio.git

$ cd fio

 

If DAOS is installed via RPMs:

$ ./configure 

 

Otherwise:

$ CFLAGS="-I/path/to/daos/install/include" LDFLAGS="-L/path/to/daos/install/lib64" ./configure

 

$ make -j install

 

How to use it:

 

$ export POOL= # your pool UUID

$ export CONT= # your container UUID

$ fio ./examples/dfs.fio

 

Those instructions will be integrated soon into our online documentation.

 

Cheers,

Johann
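
A small pre-flight check for the usage steps above can be sketched in Python. It assumes, as the steps suggest, that examples/dfs.fio picks up the pool and container from the POOL and CONT environment variables and that both are UUIDs; the function name build_fio_command is hypothetical, and the sketch only validates the variables and builds the command line without running fio.

```python
import uuid


def build_fio_command(env, job_file="./examples/dfs.fio"):
    """Check that POOL and CONT look like UUIDs, then return the fio argv."""
    for name in ("POOL", "CONT"):
        value = env.get(name, "")
        try:
            uuid.UUID(value)
        except ValueError:
            raise ValueError(f"{name} must be set to a UUID, got {value!r}") from None
    return ["fio", job_file]
```

A caller could pass os.environ as env and hand the returned list to subprocess.run(..., check=True); an unset or malformed variable then fails early with a clear message instead of inside fio.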



Re: Questions about Daos consistency

Li, Wei G
 

Yes. A DAOS client can only "see a state" via unversioned transactions (including I/O operations submitted without an explicit transaction) and explicitly-created snapshots. If an application hacks the snapshot epoch, however, effectively specifying an arbitrary version that may not have been snapshotted, then no transactional consistency is promised.

liwei

On Mar 16, 2021, at 10:01 PM, 段世博 <duanshibo.d@gmail.com> wrote:

In the VOS document, the MVCC section mentions "The MVCC rules ensure that transactions execute as if they are serialized in their epoch order while complying with external consistency, as long as the system clock offsets are always within the expected maximum system clock offset (epsilon)."
I want to know whether external consistency here has the same meaning as Spanner's external consistency, which is: "In addition if one transaction completes before another transaction starts to commit, the system guarantees that clients can never see a state that includes the effect of the second transaction but not the first."


Questions about Daos consistency

段世博
 

    In the VOS document, the MVCC section mentions "The MVCC rules ensure that transactions execute as if they are serialized in their epoch order while complying with external consistency, as long as the system clock offsets are always within the expected maximum system clock offset (epsilon)."
    I want to know whether external consistency here has the same meaning as Google Spanner's external consistency, which is: "In addition if one transaction completes before another transaction starts to commit, the system guarantees that clients can never see a state that includes the effect of the second transaction but not the first."
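
The quoted Spanner definition can be turned into a small check, sketched here in Python. The Txn record, its real-time fields, and the function name are assumptions made for illustration; real systems do not expose commit times in this form. A state violates the definition if it includes the effect of a transaction B but not of a transaction A that completed before B started to commit.

```python
from collections import namedtuple

# commit_start and complete are real-time instants (hypothetical units).
Txn = namedtuple("Txn", "name commit_start complete")


def violates_external_consistency(txns, observed):
    """Return True if observed includes a later txn but omits an earlier one."""
    for first in txns:
        for second in txns:
            if (
                first.complete < second.commit_start
                and second.name in observed
                and first.name not in observed
            ):
                return True
    return False
```

For two transactions T1 and T2 where T1 completes before T2 starts to commit, an observed state of {"T2"} violates the definition, while {"T1"}, {"T1", "T2"}, and the empty state do not.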

