
Re: no dRPC client set on Ubuntu 20.04.1

Yunjae Lee
 

Same problem here. I used the master branch of DAOS on Ubuntu 20.04.1.
I also hit this error when starting daos_server:
ERROR: removing socket file: removing instance 0 socket file: no dRPC client set (data plane not started?)
The same message was printed in /tmp/daos_server.log:
bio ERR src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)
The identify error also happened in my case; however, it is now solved thanks to @Tom.
The results of identify are attached, as well as the server logs and configurations I used.


Re: DAOS Distributed Transaction

Changwoo Min
 

Hi Johann,

Thank you for your reply. It helps a lot. 

Regards,
Changwoo Min

On Tue, Sep 8, 2020 at 1:12 PM Lombardi, Johann <johann.lombardi@...> wrote:

Hi there!

 

First of all, thanks for joining the community. I am *very* excited about the support of distributed transactions in DAOS. We have been working on this for a long time and completely changed the design in 2017 to properly support serializability and external consistency.

 

The first use case is to maintain the internal consistency of our POSIX layer, called DAOS File System (DFS). Integration with distributed transactions improves the POSIX compliance of the DFS library by guaranteeing the atomicity of metadata operations like rename(2). No orphans or dangling entries are left behind if a client crashes in the middle of a rename operation.

 

Since POSIX is just yet another middleware library for us, the same applies to all I/O middleware libraries built natively over DAOS. With the HDF5 DAOS VOL, HDF5 datasets can be updated safely in place without any risk of corrupting the internal HDF5 data structures when the application crashes or quits unexpectedly. One HDF5 operation typically requires multiple KV fetches/updates in the DAOS case. By bundling all those low-level operations under a single DAOS transaction, we can guarantee that the high-level HDF5 operation is atomic. Since the HDF5 DAOS VOL also supports independent operations (e.g. concurrent non-collective HDF5 group creations), the use of DAOS transactions also preserves internal consistency when processing concurrent uncoordinated HDF5 operations. Transactions are used in a similar way in several domain-specific data models that are in the process of being ported to the native DAOS API, bypassing POSIX entirely.

 

Last but not least, distributed transactions make it possible to support database semantics directly over DAOS. We are actually in the process of porting a SQL engine over DAOS as a proof of concept. I do believe that this capability, combined with computational storage (i.e. running simple data-intensive tasks directly on DAOS storage nodes without moving the data over the fabric), will open the door to many interesting opportunities (e.g. query/indexing, …) in areas like data analytics and AI.

 

Cheers,

Johann

 


---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: no dRPC client set on Ubuntu 20.04.1

Nabarro, Tom
 

Hello Gert,

 

The ./identify error can be fixed by prefixing the command with "LD_LIBRARY_PATH=/path/to/spdk/libs", where the path is the directory containing libspdk_sock_posix.so.2.0; in your case that is probably /root/daos/install/prereq/dev/spdk/lib .
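
As a hedged illustration of this fix, the Python sketch below builds an environment with a library directory prepended to LD_LIBRARY_PATH and uses it to launch a binary such as ./identify. The helper names env_with_lib_dir and run_identify are hypothetical, and the directory passed in would be the probable location mentioned in this message, not a verified path. LD_LIBRARY_PATH lists directories, so the sketch expects the directory that holds libspdk_sock_posix.so.2.0 rather than the file itself.

```python
import os
import subprocess


def env_with_lib_dir(lib_dir, base_env=None):
    """Return a copy of base_env with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    # Keep any existing entries after the new directory.
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + current if current else "")
    return env


def run_identify(identify_path, lib_dir):
    """Run the given binary (for example the SPDK identify example)
    with the adjusted library search path and capture its output."""
    return subprocess.run(
        [identify_path],
        env=env_with_lib_dir(lib_dir),
        capture_output=True,
        text=True,
    )
```

Calling run_identify("./identify", "/root/daos/install/prereq/dev/spdk/lib") from the identify example directory would be the equivalent of prefixing the command on the shell, assuming that directory exists on the system.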

 

Please post the results of identify and we can go from there.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Gert Pauwels (intel)
Sent: Thursday, September 10, 2020 7:13 PM
To: daos@daos.groups.io
Subject: [daos] no dRPC client set on Ubuntu 20.04.1

 

Today on my Ubuntu 20.04.1 system I had the following error when running:

$ daos_server start

ERROR: removing socket file: removing instance 0 socket file: no dRPC client set (data plane not started?)

I tried to 'solve' the problem by reinstalling Ubuntu 20.04.1 from scratch and compiling the master branch of DAOS.
The problem was still there.

The first error in /tmp/daos_server.log seems to point to SPDK:
bio  ERR  src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)

I attempted to check whether SPDK runs fine without DAOS by calling the identify application in the SPDK examples directory:

root@intel-S2600WFD:~/daos/build/external/dev/spdk/examples/nvme/identify# ./identify 

./identify: error while loading shared libraries: libspdk_sock_posix.so.2.0: cannot open shared object file: No such file or directory


Any suggestions on how to find out what is happening?

Thanks in advance,

Gert

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


no dRPC client set on Ubuntu 20.04.1

Gert Pauwels
 

Today on my Ubuntu 20.04.1 system I had the following error when running:
$ daos_server start
ERROR: removing socket file: removing instance 0 socket file: no dRPC client set (data plane not started?)

I tried to 'solve' the problem by reinstalling Ubuntu 20.04.1 from scratch and compiling the master branch of DAOS.
The problem was still there.

The first error in /tmp/daos_server.log seems to point to SPDK:
bio  ERR  src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)
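
For anyone repeating this check, a small Python sketch for pulling the first ERR entry out of a log is shown here. It is not a DAOS tool: the regular expression is inferred only from the shortened line quoted in this message, and the name first_error is hypothetical.

```python
import re

# Matches the shortened form quoted in this message:
# "<facility> ERR <file>:<line> <func>() <message>"
ERR_LINE = re.compile(
    r"(?P<facility>\w+)\s+ERR\s+(?P<file>\S+):(?P<line>\d+)\s+"
    r"(?P<func>\w+)\(\)\s+(?P<message>.*)"
)


def first_error(lines):
    """Return the fields of the first ERR entry, or None if there is none."""
    for text in lines:
        match = ERR_LINE.search(text)
        if match:
            return match.groupdict()
    return None


sample = [
    "bio  ERR  src/bio/bio_xstream.c:367 bio_spdk_env_init() "
    "Failed to init SPDK thread lib, DER_INVAL(-1003)",
]
print(first_error(sample))
```

To scan a real file, an open file object can be passed directly, for example first_error(open("/tmp/daos_server.log")); real log lines may carry extra prefixes or a different shape, so the pattern is only a starting point.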

I attempted to check whether SPDK runs fine without DAOS by calling the identify application in the SPDK examples directory:
root@intel-S2600WFD:~/daos/build/external/dev/spdk/examples/nvme/identify# ./identify 
./identify: error while loading shared libraries: libspdk_sock_posix.so.2.0: cannot open shared object file: No such file or directory

Any suggestions on how to find out what is happening?

Thanks in advance,

Gert


Re: DAOS Distributed Transaction

Lombardi, Johann
 

Hi there!

 

First of all, thanks for joining the community. I am *very* excited about the support of distributed transactions in DAOS. We have been working on this for a long time and completely changed the design in 2017 to properly support serializability and external consistency.

 

The first use case is to maintain the internal consistency of our POSIX layer, called DAOS File System (DFS). Integration with distributed transactions improves the POSIX compliance of the DFS library by guaranteeing the atomicity of metadata operations like rename(2). No orphans or dangling entries are left behind if a client crashes in the middle of a rename operation.
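
To make the atomicity argument concrete, here is a minimal, self-contained Python sketch. It is a toy in-memory model and not the DAOS API: the ToyStore and ToyTx classes, their methods, and the rename helper are hypothetical names invented for illustration. A rename is expressed as an insert plus a delete staged in one transaction, so a crash before commit leaves the directory untouched.

```python
class ToyStore:
    """In-memory key-value store standing in for a directory object."""

    def __init__(self):
        self.data = {}

    def begin(self):
        return ToyTx(self)


class ToyTx:
    """Buffers writes and applies them all at commit, or not at all."""

    _DELETED = object()

    def __init__(self, store):
        self.store = store
        self.staged = {}

    def put(self, key, value):
        self.staged[key] = value

    def delete(self, key):
        self.staged[key] = self._DELETED

    def commit(self):
        # Build the new state on a copy, then swap it in as one step.
        new_data = dict(self.store.data)
        for key, value in self.staged.items():
            if value is self._DELETED:
                new_data.pop(key, None)
            else:
                new_data[key] = value
        self.store.data = new_data


def rename(store, old, new, crash_before_commit=False):
    """Move an entry from old to new inside one transaction."""
    tx = store.begin()
    tx.put(new, store.data[old])
    tx.delete(old)
    if crash_before_commit:
        raise RuntimeError("simulated client crash")
    tx.commit()


store = ToyStore()
store.data["a.txt"] = "inode-42"

try:
    rename(store, "a.txt", "b.txt", crash_before_commit=True)
except RuntimeError:
    pass
print(store.data)  # the old entry is still the only one

rename(store, "a.txt", "b.txt")
print(store.data)  # the new entry is now the only one
```

The point of the toy is only that nothing staged becomes visible before commit; how DAOS achieves this across servers is not modeled here.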

 

Since POSIX is just yet another middleware library for us, the same applies to all I/O middleware libraries built natively over DAOS. With the HDF5 DAOS VOL, HDF5 datasets can be updated safely in place without any risk of corrupting the internal HDF5 data structures when the application crashes or quits unexpectedly. One HDF5 operation typically requires multiple KV fetches/updates in the DAOS case. By bundling all those low-level operations under a single DAOS transaction, we can guarantee that the high-level HDF5 operation is atomic. Since the HDF5 DAOS VOL also supports independent operations (e.g. concurrent non-collective HDF5 group creations), the use of DAOS transactions also preserves internal consistency when processing concurrent uncoordinated HDF5 operations. Transactions are used in a similar way in several domain-specific data models that are in the process of being ported to the native DAOS API, bypassing POSIX entirely.

 

Last but not least, distributed transactions make it possible to support database semantics directly over DAOS. We are actually in the process of porting a SQL engine over DAOS as a proof of concept. I do believe that this capability, combined with computational storage (i.e. running simple data-intensive tasks directly on DAOS storage nodes without moving the data over the fabric), will open the door to many interesting opportunities (e.g. query/indexing, …) in areas like data analytics and AI.

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of Changwoo Min <changwoo@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Saturday 5 September 2020 at 06:54
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] DAOS Distributed Transaction

 

Hi DAOS community!

I'm Changwoo Min, a professor at Virginia Tech. My group does research on persistent memory and storage systems. I found that DAOS is an exciting and cool project!

In particular, it is interesting to me that DAOS supports distributed transactions. I am wondering what the typical/intended use cases and applications of distributed transactions are, especially considering that, as far as I know, DAOS will be deployed on HPC systems. I wonder whether transactions can benefit any HPC/AI/ML/analytics applications. Any comments will be helpful.

Regards,
Changwoo Min



Re: DAOS/OFI & MOFED Support

Farrell, Patrick Arthur <patrick.farrell@...>
 

We had no specific need for 5.1, so we rolled back for now, since 5.0.x is still supported by Mellanox.  The little digging I did suggested the issue is in OFA rather than DAOS, so I expect that the Open Fabrics people will fix the incompatibility.

-Patrick

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Lombardi, Johann <johann.lombardi@...>
Sent: Monday, September 7, 2020 12:52 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DAOS/OFI & MOFED Support
 

Hi Patrick,

 

We are using MOFED 5.0.2 on Frontera and I don’t think we have ever tested with 5.1. Were you able to figure it out?

Cheers,

Johann

 




Re: DAOS/OFI & MOFED Support

Lombardi, Johann
 

Hi Patrick,

 

We are using MOFED 5.0.2 on Frontera and I don’t think we have ever tested with 5.1. Were you able to figure it out?

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "Farrell, Patrick Arthur" <patrick.farrell@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday 28 August 2020 at 23:34
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] DAOS/OFI & MOFED Support

 

Good afternoon,

 

I am curious if anyone has tried DAOS with MLNX_OFED_LINUX-5.1-0.6.6.0, currently the latest version of MOFED 5.1.

 

I did, and I'm getting mercury errors related to CQs...

 

So, before dumping the errors:
Should this work?  Is it supported to run DAOS with MOFED 5.1?

 

Thanks - error dump follows:

 

For example, on rank0 when trying to create a pool on ranks 0 and 1:

08/28-16:29:42.684123 delphi-002 DAOS[279252/279298] hg   ERR  # NA -- Error -- /delphi/common/daos/build/external/dev/mercury/src/na/na_ofi.c:2555

 # na_ofi_cq_read(): Operation ID was not canceled

08/28-16:29:42.684134 delphi-002 DAOS[279252/279298] hg   ERR  # NA -- Error -- /delphi/common/daos/build/external/dev/mercury/src/na/na_ofi.c:4585

 # na_ofi_progress(): Could not read events from context CQ

08/28-16:29:42.684154 delphi-002 DAOS[279252/279298] hg   ERR  # HG -- Error -- /delphi/common/daos/build/external/dev/mercury/src/mercury_core.c:2758

 # hg_core_progress_na(): Could not make progress on NA (NA_FAULT)

08/28-16:29:42.684161 delphi-002 DAOS[279252/279298] hg   ERR  # HG -- Error -- /delphi/common/daos/build/external/dev/mercury/src/mercury_core.c:2926

 # hg_core_progress(): hg_core_progress_na() failed

08/28-16:29:42.684168 delphi-002 DAOS[279252/279298] hg   ERR  # HG -- Error -- /delphi/common/daos/build/external/dev/mercury/src/mercury_core.c:4317

 # HG_Core_progress(): Could not make progress

08/28-16:29:42.684178 delphi-002 DAOS[279252/279298] hg   ERR  # HG -- Error -- /delphi/common/daos/build/external/dev/mercury/src/mercury.c:1996

 # HG_Progress(): Could not make progress on context (HG_FAULT)

08/28-16:29:42.684185 delphi-002 DAOS[279252/279298] hg   ERR  src/cart/crt_hg.c:1234 crt_hg_progress() HG_Progress failed, hg_ret: 7.

08/28-16:29:42.684194 delphi-002 DAOS[279252/279298] rpc  ERR  src/cart/crt_context.c:1316 crt_progress() crt_hg_progress failed, rc: -1020.

08/28-16:29:42.684201 delphi-002 DAOS[279252/279298] server ERR  src/iosrv/srv.c:565 dss_srv_handler() failed to progress CART context: -1020

08/28-16:30:42.684033 delphi-002 DAOS[279252/279298] rpc  ERR  src/cart/crt_context.c:790 crt_context_timeout_check(0x7fcd8d7a3870) [opc=0x1010007 rpcid=0x6608781e00000134 rank:tag=1:0] ctx_id 0, (status: 0x38) timed out, tgt rank 1, tag 0


And on rank 1:

08/28-16:07:42.807417 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:790 crt_context_timeout_check(0x7fc6f820aba0) [opc=0xfe000000 rpcid=0x642008fe00000128 rank:tag=0:0] ctx_id 0, (status: 0x38) timed out, tgt rank 0, tag 0

08/28-16:07:42.807443 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:748 crt_req_timeout_hdlr(0x7fc6f820aba0) [opc=0xfe000000 rpcid=0x642008fe00000128 rank:tag=0:0] aborting to group daos_server, rank 0, tgt_uri (null)

08/28-16:07:45.208410 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:790 crt_context_timeout_check(0x7fc6f820b3f0) [opc=0xfe000000 rpcid=0x642008fe00000129 rank:tag=0:0] ctx_id 0, (status: 0x38) timed out, tgt rank 0, tag 0

08/28-16:07:45.208419 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:748 crt_req_timeout_hdlr(0x7fc6f820b3f0) [opc=0xfe000000 rpcid=0x642008fe00000129 rank:tag=0:0] aborting to group daos_server, rank 0, tgt_uri (null)

08/28-16:07:47.609419 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:790 crt_context_timeout_check(0x7fc6f820be90) [opc=0xfe000000 rpcid=0x642008fe0000012a rank:tag=0:0] ctx_id 0, (status: 0x38) timed out, tgt rank 0, tag 0

08/28-16:07:47.609428 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:748 crt_req_timeout_hdlr(0x7fc6f820be90) [opc=0xfe000000 rpcid=0x642008fe0000012a rank:tag=0:0] aborting to group daos_server, rank 0, tgt_uri (null)

08/28-16:07:49.811412 delphi-002 DAOS[279251/279299] swim ERR  src/cart/swim/swim.c:802 swim_progress() SWIM shutdown

08/28-16:07:50.10411 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:790 crt_context_timeout_check(0x7fc6f820c930) [opc=0xfe000000 rpcid=0x642008fe0000012b rank:tag=0:0] ctx_id 0, (status: 0x38) timed out, tgt rank 0, tag 0

08/28-16:07:50.10419 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:748 crt_req_timeout_hdlr(0x7fc6f820c930) [opc=0xfe000000 rpcid=0x642008fe0000012b rank:tag=0:0] aborting to group daos_server, rank 0, tgt_uri (null)

08/28-16:08:14.96837 delphi-002 DAOS[279251/279299] hg   WARN # NA -- Warning -- /delphi/common/daos/build/external/dev/mercury/src/na/na_ofi.c:2575

 # na_ofi_cq_read(): fi_cq_readerr() got err: 5 (Input/output error), prov_errno: 12 (transport retry counter exceeded)

08/28-16:08:14.96853 delphi-002 DAOS[279251/279299] hg   ERR  src/cart/crt_hg.c:1031 crt_hg_req_send_cb(0x7fc6f820b3f0) [opc=0xfe000000 rpcid=0x642008fe00000129 rank:tag=0:0] RPC failed; rc: -1011

08/28-16:08:14.96867 delphi-002 DAOS[279251/279299] hg   ERR  src/cart/crt_hg.c:1031 crt_hg_req_send_cb(0x7fc6f820be90) [opc=0xfe000000 rpcid=0x642008fe0000012a rank:tag=0:0] RPC failed; rc: -1011

08/28-16:08:14.96874 delphi-002 DAOS[279251/279299] hg   ERR  src/cart/crt_hg.c:1031 crt_hg_req_send_cb(0x7fc6f820c930) [opc=0xfe000000 rpcid=0x642008fe0000012b rank:tag=0:0] RPC failed; rc: -1011

 

-Patrick



DAOS Distributed Transaction

Changwoo Min
 

Hi DAOS community!

I'm Changwoo Min, a professor at Virginia Tech. My group does research on persistent memory and storage systems. I found that DAOS is an exciting and cool project!

In particular, it is interesting to me that DAOS supports distributed transactions. I am wondering what the typical/intended use cases and applications of distributed transactions are, especially considering that, as far as I know, DAOS will be deployed on HPC systems. I wonder whether transactions can benefit any HPC/AI/ML/analytics applications. Any comments will be helpful.

Regards,
Changwoo Min


[DUG'20] Save the date & call for presentations!

Lombardi, Johann
 

Hi there,

 

As every year since 2017, we would like to hold the 4th annual DAOS User Group (DUG) around SC’20. Due to the pandemic situation, the DUG will obviously be virtual this year, with live presentations on Nov 19. We purposely picked a date after the SC Tutorials/Workshops/BoFs to minimize conflicts, but please don’t hesitate to let us know (on this mailing list, on the slack channel or privately) if you are aware of any major conflict(s) that day. The time hasn’t been finalized yet, but we are shooting for a 3h-ish slot in the morning for America, late afternoon for EMEA and late evening for APAC (sorry about that) to maximize participation. Details are yet to be finalized and will be shared on the mailing list once ready.

 

As in previous years, we would like to invite community members to submit presentation proposals (i.e. title + short summary) to daos-info@daos.groups.io.

We encourage any feedback and would like to hear from you on your experience with DAOS, future plans, what you have contributed or intend to contribute, what worked … and did not work so well. We are looking forward to your submissions!

 

Take care.

Johann – on behalf of the Intel DAOS Team



DAOS/OFI & MOFED Support

Farrell, Patrick Arthur <patrick.farrell@...>
 

Good afternoon,

I am curious if anyone has tried DAOS with MLNX_OFED_LINUX-5.1-0.6.6.0, currently the latest version of MOFED 5.1.

I did, and I'm getting mercury errors related to CQs...

So, before dumping the errors:
Should this work?  Is it supported to run DAOS with MOFED 5.1?

Thanks - error dump follows:

For example, on rank0 when trying to create a pool on ranks 0 and 1:
08/28-16:29:42.684123 delphi-002 DAOS[279252/279298] hg   ERR  # NA -- Error -- /delphi/common/daos/build/external/dev/mercury/src/na/na_ofi.c:2555
 # na_ofi_cq_read(): Operation ID was not canceled
08/28-16:29:42.684134 delphi-002 DAOS[279252/279298] hg   ERR  # NA -- Error -- /delphi/common/daos/build/external/dev/mercury/src/na/na_ofi.c:4585
 # na_ofi_progress(): Could not read events from context CQ
08/28-16:29:42.684154 delphi-002 DAOS[279252/279298] hg   ERR  # HG -- Error -- /delphi/common/daos/build/external/dev/mercury/src/mercury_core.c:2758
 # hg_core_progress_na(): Could not make progress on NA (NA_FAULT)
08/28-16:29:42.684161 delphi-002 DAOS[279252/279298] hg   ERR  # HG -- Error -- /delphi/common/daos/build/external/dev/mercury/src/mercury_core.c:2926
 # hg_core_progress(): hg_core_progress_na() failed
08/28-16:29:42.684168 delphi-002 DAOS[279252/279298] hg   ERR  # HG -- Error -- /delphi/common/daos/build/external/dev/mercury/src/mercury_core.c:4317
 # HG_Core_progress(): Could not make progress
08/28-16:29:42.684178 delphi-002 DAOS[279252/279298] hg   ERR  # HG -- Error -- /delphi/common/daos/build/external/dev/mercury/src/mercury.c:1996
 # HG_Progress(): Could not make progress on context (HG_FAULT)
08/28-16:29:42.684185 delphi-002 DAOS[279252/279298] hg   ERR  src/cart/crt_hg.c:1234 crt_hg_progress() HG_Progress failed, hg_ret: 7.
08/28-16:29:42.684194 delphi-002 DAOS[279252/279298] rpc  ERR  src/cart/crt_context.c:1316 crt_progress() crt_hg_progress failed, rc: -1020.
08/28-16:29:42.684201 delphi-002 DAOS[279252/279298] server ERR  src/iosrv/srv.c:565 dss_srv_handler() failed to progress CART context: -1020
08/28-16:30:42.684033 delphi-002 DAOS[279252/279298] rpc  ERR  src/cart/crt_context.c:790 crt_context_timeout_check(0x7fcd8d7a3870) [opc=0x1010007 rpcid=0x6608781e00000134 rank:tag=1:0] ctx_id 0, (status: 0x38) timed out, tgt rank 1, tag 0

And on rank 1:
08/28-16:07:42.807417 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:790 crt_context_timeout_check(0x7fc6f820aba0) [opc=0xfe000000 rpcid=0x642008fe00000128 rank:tag=0:0] ctx_id 0, (status: 0x38) timed out, tgt rank 0, tag 0
08/28-16:07:42.807443 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:748 crt_req_timeout_hdlr(0x7fc6f820aba0) [opc=0xfe000000 rpcid=0x642008fe00000128 rank:tag=0:0] aborting to group daos_server, rank 0, tgt_uri (null)
08/28-16:07:45.208410 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:790 crt_context_timeout_check(0x7fc6f820b3f0) [opc=0xfe000000 rpcid=0x642008fe00000129 rank:tag=0:0] ctx_id 0, (status: 0x38) timed out, tgt rank 0, tag 0
08/28-16:07:45.208419 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:748 crt_req_timeout_hdlr(0x7fc6f820b3f0) [opc=0xfe000000 rpcid=0x642008fe00000129 rank:tag=0:0] aborting to group daos_server, rank 0, tgt_uri (null)
08/28-16:07:47.609419 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:790 crt_context_timeout_check(0x7fc6f820be90) [opc=0xfe000000 rpcid=0x642008fe0000012a rank:tag=0:0] ctx_id 0, (status: 0x38) timed out, tgt rank 0, tag 0
08/28-16:07:47.609428 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:748 crt_req_timeout_hdlr(0x7fc6f820be90) [opc=0xfe000000 rpcid=0x642008fe0000012a rank:tag=0:0] aborting to group daos_server, rank 0, tgt_uri (null)
08/28-16:07:49.811412 delphi-002 DAOS[279251/279299] swim ERR  src/cart/swim/swim.c:802 swim_progress() SWIM shutdown
08/28-16:07:50.10411 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:790 crt_context_timeout_check(0x7fc6f820c930) [opc=0xfe000000 rpcid=0x642008fe0000012b rank:tag=0:0] ctx_id 0, (status: 0x38) timed out, tgt rank 0, tag 0
08/28-16:07:50.10419 delphi-002 DAOS[279251/279299] rpc  ERR  src/cart/crt_context.c:748 crt_req_timeout_hdlr(0x7fc6f820c930) [opc=0xfe000000 rpcid=0x642008fe0000012b rank:tag=0:0] aborting to group daos_server, rank 0, tgt_uri (null)
08/28-16:08:14.96837 delphi-002 DAOS[279251/279299] hg   WARN # NA -- Warning -- /delphi/common/daos/build/external/dev/mercury/src/na/na_ofi.c:2575
 # na_ofi_cq_read(): fi_cq_readerr() got err: 5 (Input/output error), prov_errno: 12 (transport retry counter exceeded)
08/28-16:08:14.96853 delphi-002 DAOS[279251/279299] hg   ERR  src/cart/crt_hg.c:1031 crt_hg_req_send_cb(0x7fc6f820b3f0) [opc=0xfe000000 rpcid=0x642008fe00000129 rank:tag=0:0] RPC failed; rc: -1011
08/28-16:08:14.96867 delphi-002 DAOS[279251/279299] hg   ERR  src/cart/crt_hg.c:1031 crt_hg_req_send_cb(0x7fc6f820be90) [opc=0xfe000000 rpcid=0x642008fe0000012a rank:tag=0:0] RPC failed; rc: -1011
08/28-16:08:14.96874 delphi-002 DAOS[279251/279299] hg   ERR  src/cart/crt_hg.c:1031 crt_hg_req_send_cb(0x7fc6f820c930) [opc=0xfe000000 rpcid=0x642008fe0000012b rank:tag=0:0] RPC failed; rc: -1011
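
When comparing dumps like these across ranks, it can help to tally entries by facility and level and to pull out the return codes. The sketch below is a hedged Python helper and not a DAOS tool: the regular expressions are inferred only from the lines shown in this message, and the function name summarize is hypothetical.

```python
import re
from collections import Counter

# Inferred from the lines in this message:
# "<MM/DD-hh:mm:ss.us> <host> DAOS[<pid>/<tid>] <facility> <LEVEL> <rest>"
ENTRY = re.compile(
    r"^\S+\s+\S+\s+DAOS\[\d+/\d+\]\s+(?P<facility>\w+)\s+"
    r"(?P<level>ERR|WARN)\s+(?P<rest>.*)$"
)
RC = re.compile(r"\brc: (-?\d+)")


def summarize(lines):
    """Count entries per (facility, level) and per 'rc: N' return code."""
    by_kind = Counter()
    by_rc = Counter()
    for text in lines:
        match = ENTRY.match(text.strip())
        if not match:
            continue  # continuation lines such as " # na_ofi_cq_read(): ..."
        by_kind[(match["facility"], match["level"])] += 1
        for rc in RC.findall(match["rest"]):
            by_rc[int(rc)] += 1
    return by_kind, by_rc


sample = [
    "08/28-16:29:42.684194 delphi-002 DAOS[279252/279298] rpc  ERR  "
    "src/cart/crt_context.c:1316 crt_progress() crt_hg_progress failed, rc: -1020.",
    "08/28-16:08:14.96853 delphi-002 DAOS[279251/279299] hg   ERR  "
    "src/cart/crt_hg.c:1031 crt_hg_req_send_cb(0x7fc6f820b3f0) "
    "[opc=0xfe000000 rpcid=0x642008fe00000129 rank:tag=0:0] RPC failed; rc: -1011",
]
kinds, codes = summarize(sample)
print(kinds)
print(codes)
```

The helper deliberately does not translate the numeric codes into names; mapping them to DAOS error names would need the DAOS error header for the version in use.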

-Patrick


Re: DAOS in Docker

Lombardi, Johann
 

Hi,

 

Just to confirm, you are running docker on Linux, right?

Could you please try to run the SPDK init script manually and send me the output?

Cheers,

Johann

 

 

From: <daos@daos.groups.io> on behalf of "helloworld@..." <helloworld@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Monday 24 August 2020 at 03:30
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] DAOS in Docker

 

Johann, thank you for replying.

Of course, I already loaded the uio_pci_generic kernel module.

For now I'm using only SCM based on RAM emulation, without NVMe SSD emulation,
and that works.

However, I'd like to use NVMe SSD emulation based on RAM as well...
How can I fix this?

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: Behavior of daos_kv_get for non-existent Keys

Steffen Christgau
 

On 8/25/20 3:19 PM, Chaarawi, Mohamad wrote:
> We have recently added conditional operations to the DAOS object and KV api to allow for such conditional operations:
> DAOS_COND_KEY_INSERT/UPDATE/FETCH/PUNCH (for the daos_kv_* API)
> Which would give you what you need.

Thanks for pointing that out, Mohamad.

> However for the KV API, I actually see an issue where these flags are not properly set.
> I will push a patch to fix this soon and let you know.

Great. Looking forward to the notification.

Just to be sure: assuming the API works correctly, are these conditional operations passed via the flags parameter, which is currently marked as "currently ignored"?

Regards, Steffen


Re: Behavior of daos_kv_get for non-existent Keys

Chaarawi, Mohamad
 

Hi Steffen,

We have recently added flags to the DAOS object and KV API to allow for such conditional operations:
DAOS_COND_KEY_INSERT/UPDATE/FETCH/PUNCH (for the daos_kv_* API)
Which would give you what you need.

However for the KV API, I actually see an issue where these flags are not properly set.
I will push a patch to fix this soon and let you know.
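
To illustrate the ambiguity and how a conditional fetch removes it, here is a minimal Python sketch of a toy in-memory store. It is not the DAOS API: `ToyKV`, `COND_KEY_FETCH`, and `RC_NONEXIST` are hypothetical names chosen for this example, and the unconditional behavior simply mimics what is described in this thread (return code 0 and size 0 for a missing key).

```python
# Toy model only: none of these names or values come from the DAOS headers.
COND_KEY_FETCH = 0x1   # hypothetical flag: fail if the key does not exist
RC_SUCCESS = 0
RC_NONEXIST = -1       # hypothetical error code for "no such key"


class ToyKV:
    """In-memory stand-in for a KV object, used to show the get semantics."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Zero-length values are legal, as noted in the thread.
        self._data[key] = bytes(value)

    def get(self, key, flags=0):
        """Return (rc, value); len(value) plays the role of the reported size."""
        if key not in self._data:
            if flags & COND_KEY_FETCH:
                return RC_NONEXIST, b""
            # Unconditional get: success with size 0, same as an empty value.
            return RC_SUCCESS, b""
        return RC_SUCCESS, self._data[key]


kv = ToyKV()
kv.put("empty", b"")

# Without the flag, a stored empty value and a missing key look identical.
print(kv.get("empty"), kv.get("missing"))
# With the flag, only the missing key reports an error.
print(kv.get("empty", COND_KEY_FETCH), kv.get("missing", COND_KEY_FETCH))
```

The point of the sketch is only that the caller needs an out-of-band signal (here, the return code under the conditional flag) because the size alone cannot separate the two cases.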

Thanks,
Mohamad

On 8/24/20, 7:59 AM, "daos@daos.groups.io on behalf of Steffen Christgau" <daos@daos.groups.io on behalf of christgau@...> wrote:

Hi everybody,

I'm experimenting with the (low level) DAOS Key Value API, i.e.
daos_kv_get and friends. For the get function, I observed that passing
a non-existent key returns 0, indicating success, along with an
"actual size of the value" that is also 0.

However, it is also valid to put a key with a zero length value into the
KV store. That key is subsequently found when enumerating the names
inside the object (daos_kv_list).

Is this behavior of the get operation, i.e. returning success and an
empty (value), intended? If so, how can I check if a queried key really
existed other than by enumerating the (whole) object?

Regards, Steffen


Re: DAOS & HDF5

Steffen Christgau
 

Hi Patrick, hi everybody,

On 8/25/20 2:39 PM, Farrell, Patrick Arthur wrote:
> I'm aware there's an HDF5 plugin for DAOS, but I am not certain about the current status of the plugin,

I'm interested in that information as well. And moreover: What about the support for netCDF?

> or where to find it.

https://bitbucket.hdfgroup.org/projects/HDF5VOL/repos/daos-vol/browse

I'm currently working on this, but I'm struggling to get the tests to compile successfully.

Just as a side note: the Bitbucket HEAD does not work with DAOS 1.0.1 due to some API changes in DAOS, but commit 34f3d46 appears to. At least it compiles (without tests).

Steffen


DAOS & HDF5

Farrell, Patrick Arthur <patrick.farrell@...>
 

Good morning,

I'm aware there's an HDF5 plugin for DAOS, but I am not certain about the current status of the plugin, or where to find it.

Is there current info on this or can someone provide a pointer?

Thanks much.
-Patrick


Behavior of daos_kv_get for non-existent Keys

Steffen Christgau
 

Hi everybody,

I'm experimenting with the (low level) DAOS Key Value API, i.e. daos_kv_get and friends. For the get function, I observed that passing a non-existent key returns 0, indicating success, along with an "actual size of the value" that is also 0.

However, it is also valid to put a key with a zero length value into the KV store. That key is subsequently found when enumerating the names inside the object (daos_kv_list).

Is this behavior of the get operation, i.e. returning success and an empty (value), intended? If so, how can I check if a queried key really existed other than by enumerating the (whole) object?

Regards, Steffen


Re: DAOS in Docker

helloworld@...
 

Johann, thank you for replying.

Of course, I already loaded the uio_pci_generic kernel module.

For now I'm using only SCM based on RAM emulation, without NVMe SSD emulation,
and that works.

However, I'd like to use NVMe SSD emulation based on RAM as well...
How can I fix this?


Slack community channel

Lombardi, Johann
 

Hi there,

 

I recently received several requests to migrate the community chat from Gitter to Slack. I have therefore created a daos-stack workspace on Slack and enabled the integration with groups.io. All subscribers to the DAOS community mailing list should thus automatically receive an invite to join the Slack channel. Let me know if you have any problems or concerns.

 

Cheers,

Johann



Re: DAOS in Docker

Lombardi, Johann
 

Hi there,

 

Did you load the uio_pci_generic module in the kernel as specified in the note?
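
For reference, one way to check this from a script is to look for the module name in /proc/modules, where the first field of each line is the name of a loaded module. Below is a minimal Python sketch; the helper names are hypothetical, and the sample text is made up for illustration. Note that a driver built into the kernel rather than loaded as a module would not appear in this file.

```python
def module_loaded(name, proc_modules_text):
    """Return True if `name` is the first field of any /proc/modules line."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def read_proc_modules(path="/proc/modules"):
    """Read the kernel's list of loaded modules (Linux only)."""
    with open(path) as f:
        return f.read()


# Illustrative content in the /proc/modules layout (not taken from a real host).
sample = (
    "uio_pci_generic 16384 0 - Live 0x0000000000000000\n"
    "uio 20480 1 uio_pci_generic, Live 0x0000000000000000\n"
)

print(module_loaded("uio_pci_generic", sample))
# On a real Linux host one would call:
#   module_loaded("uio_pci_generic", read_proc_modules())
```

Matching on the exact first field avoids false positives from dependency lists, such as the `uio` line above, which mentions `uio_pci_generic` only as a dependent.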

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "helloworld@..." <helloworld@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday 12 August 2020 at 13:51
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] DAOS in Docker

 

I'm configuring DAOS in Docker with RAM-only emulation.
When testing the DAOS server, I get errors from /usr/bin/daos_io_server.
How can I fix this?

In detail: "ERR  src/bio/bio_xstream.c:224 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)"

I use ~/daos/utils/config/examples/daos_server_local.yaml as the configuration file, with the following settings:

scm_mount: /mnt/daos
scm_class: ram
scm_size: 4

bdev_class: file
bdev_size: 16
bdev_list: [/tmp/daos-bdev]



Avocado's upcoming LTS release

Cleber Rosa
 

Hi DAOS community,

Given that some of the DAOS testing[1] uses the Avocado testing
framework, I'd like to bring to your attention that we have an
upcoming 82.0 LTS release scheduled for Sept 7th[2].

For that release, we'd like to keep as much compatibility as possible,
and when not possible, allow for a smoother migration. 69.x LTS will
be maintained for another 6 months after the 82.0 LTS
release, but the sooner any issue is addressed, the better.

For that, we have an epic issue[3] in which we could use your help,
with:
* running the existing tests you have, with the most recent Avocado
version possible
* opening any issues[4] you encounter

This will feed into either bug fixes, or documentation on how to
migrate from 69.x LTS to 82.0 LTS.

In addition to this, feel free to engage with us about how the
new Avocado features (and there's a lot of them) may be beneficial to
the DAOS project.

Thanks!
- Cleber

--

[1] - https://github.com/daos-stack/daos/blob/master/src/tests/ftest/launch.py#L749
[2] - https://github.com/avocado-framework/avocado/milestone/8
[3] - https://github.com/avocado-framework/avocado/issues/4103
[4] - https://github.com/avocado-framework/avocado/issues/new/choose
