no dRPC client set on Ubuntu 20.04.1


Gert Pauwels
 

Today on my Ubuntu 20.04.1 system I had the following error when running:
$ daos_server start
ERROR: removing socket file: removing instance 0 socket file: no dRPC client set (data plane not started?)

I tried to 'solve' the problem by reinstalling Ubuntu 20.04.1 from scratch and compiling the master branch of DAOS.
The problem was still there.

The first error in /tmp/daos_server.log points to SPDK:
bio  ERR  src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)

I tried to check whether SPDK runs fine without DAOS by running the identify application from the SPDK examples directory:
root@intel-S2600WFD:~/daos/build/external/dev/spdk/examples/nvme/identify# ./identify 
./identify: error while loading shared libraries: libspdk_sock_posix.so.2.0: cannot open shared object file: No such file or directory

Any suggestions on how to find out what is happening?

Thanks in advance,

Gert


Nabarro, Tom
 

Hello Gert,

 

The ./identify error can be fixed by prefixing the command with "LD_LIBRARY_PATH=/path/to/spdk/libs", which in your case is probably /root/daos/install/prereq/dev/spdk/lib (the directory that contains libspdk_sock_posix.so.2.0).
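
As a minimal sketch of that suggestion: the helper below (a hypothetical name, not part of DAOS or SPDK) prepends a library directory to LD_LIBRARY_PATH for a single command only, so the rest of the shell session is unaffected. The directory in the usage comment is the one reported later in this thread; adjust it to wherever your build installed SPDK.

```shell
# run_with_libdir is a hypothetical helper, not a DAOS or SPDK tool.
# It runs one command with an extra directory at the front of the
# dynamic loader search path (LD_LIBRARY_PATH takes directories, not files).
run_with_libdir() {
    libdir=$1
    shift
    LD_LIBRARY_PATH="${libdir}${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" "$@"
}

# Usage (path assumed from this thread; adjust to your install prefix):
#   run_with_libdir /root/daos/install/prereq/dev/spdk/lib ./identify
```

Setting the variable per command rather than exporting it avoids accidentally changing which libraries other programs in the same session pick up.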

 

Please post the results of identify and we can go from there.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Gert Pauwels (intel)
Sent: Thursday, September 10, 2020 7:13 PM
To: daos@daos.groups.io
Subject: [daos] no dRPC client set on Ubuntu 20.04.1

 

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Yunjae Lee
 

Same problem here. I used the master branch of DAOS on Ubuntu 20.04.1.
I also hit this error when starting daos_server:
ERROR: removing socket file: removing instance 0 socket file: no dRPC client set (data plane not started?)
The same message was printed in /tmp/daos_server.log:
bio ERR src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)
The identify error also happened in my case; however, it is now solved thanks to @Tom.
The identify output is attached, along with the server logs and the configuration I used.


Gert Pauwels
 

Hi Tom,

I set LD_LIBRARY_PATH to the SPDK library path as you indicated, and that indeed solves the problem for the identify application. The identify application now returns all the data as expected.

It did not make any difference when running daos_server:
$ echo $LD_LIBRARY_PATH
/root/daos/install/prereq/dev/spdk/lib/
$ daos_server start
It is still the same error.

Rgds,

Gert


Nabarro, Tom
 

Hello,

 

You and Gert are experiencing the same issue:

 

09/10-14:23:55.96 intel-S2600WFD DAOS[260088/260088] bio  ERR  src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)

09/10-14:23:55.97 intel-S2600WFD DAOS[260088/260088] server ERR  src/iosrv/init.c:521 server_init() DAOS cannot be initialized using the configured path (/mnt/daos).   Please ensure it is on a PMDK compatible file system and writeable by the current user

 

The issue is that the SPDK thread library cannot be initialised. The identify example does not use the thread library, which would explain why it runs fine.

 

spdk_thread_lib_init() is returning an error but unfortunately we wrap it with DER_INVAL and mask the spdk error, which needs fixing.

 

Can you please apply this patch and rerun to give us a better insight and in the meantime I will find an Ubuntu system to verify:

 

```
diff --git a/src/bio/bio_xstream.c b/src/bio/bio_xstream.c
index 758ba6d..c888047 100644
--- a/src/bio/bio_xstream.c
+++ b/src/bio/bio_xstream.c
@@ -30,6 +30,7 @@
 #include <spdk/env.h>
 #include <spdk/nvme.h>
 #include <spdk/vmd.h>
+#include <spdk/string.h>
 #include <spdk/thread.h>
 #include <spdk/bdev.h>
 #include <spdk/io_channel.h>
@@ -364,7 +365,8 @@ bio_spdk_env_init(void)
        rc = spdk_thread_lib_init(NULL, 0);
        if (rc != 0) {
                rc = -DER_INVAL;
-               D_ERROR("Failed to init SPDK thread lib, "DF_RC"\n", DP_RC(rc));
+               D_ERROR("Failed to init SPDK thread lib, %s (%d)\n",
+                       spdk_strerror(rc), rc);
                spdk_env_fini();
                return rc;
        }
```

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Yunjae Lee
Sent: Friday, September 11, 2020 8:34 AM
To: daos@daos.groups.io
Subject: Re: [daos] no dRPC client set on Ubuntu 20.04.1

 



Nabarro, Tom
 

Thanks Gert,

 

Yes, that was only to get the identify application running. It runs fine, as expected, given that the problem appears to be in initializing the thread library, which identify doesn’t do.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Gert Pauwels
Sent: Friday, September 11, 2020 1:41 PM
To: daos@daos.groups.io
Subject: Re: [daos] no dRPC client set on Ubuntu 20.04.1

 



Gert Pauwels
 

Hi Tom,

After applying the patch you provided the error is now:
    09/11-17:47:17.20 intel-S2600WFD DAOS[11713/11713] bio  ERR  src/bio/bio_xstream.c:369 bio_spdk_env_init() Failed to init SPDK thread lib, Unknown error -1003 (-1003)

The original one was:

   DAOS[7642/7642] bio  ERR  src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)


The whole /tmp/daos_server.log looks like:

09/11-17:39:03.32 intel-S2600WFD DAOS[7642/7642] fi   INFO src/gurt/fault_inject.c:481 d_fault_inject_init() No config file, fault injection is OFF.
09/11-17:39:03.35 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:419 set_abt_max_num_xstreams() Setting ABT_MAX_NUM_XSTREAMS to 4
09/11-17:39:03.35 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:490 server_init() Module interface successfully initialized
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:498 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully loaded
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] crt  INFO src/cart/crt_init.c:269 crt_init_opt() libcart version 4.8.0 initializing
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] crt  WARN src/cart/crt_init.c:161 data_init() FI_UNIVERSE_SIZE was not set; setting to 2048
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:507 server_init() Network successfully initialized
09/11-17:39:03.37 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:516 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully initialized
09/11-17:39:03.44 intel-S2600WFD DAOS[7642/7642] bio  ERR  src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)
09/11-17:39:03.44 intel-S2600WFD DAOS[7642/7642] server ERR  src/iosrv/init.c:521 server_init() DAOS cannot be initialized using the configured path (/mnt/daos).   Please ensure it is on a PMDK compatible file system and writeable by the current user
09/11-17:47:17.09 intel-S2600WFD DAOS[11713/11713] fi   INFO src/gurt/fault_inject.c:481 d_fault_inject_init() No config file, fault injection is OFF.
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:419 set_abt_max_num_xstreams() Setting ABT_MAX_NUM_XSTREAMS to 4
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:490 server_init() Module interface successfully initialized
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:498 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully loaded
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] crt  INFO src/cart/crt_init.c:269 crt_init_opt() libcart version 4.8.0 initializing
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] crt  WARN src/cart/crt_init.c:161 data_init() FI_UNIVERSE_SIZE was not set; setting to 2048
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:507 server_init() Network successfully initialized
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:516 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully initialized
09/11-17:47:17.20 intel-S2600WFD DAOS[11713/11713] bio  ERR  src/bio/bio_xstream.c:369 bio_spdk_env_init() Failed to init SPDK thread lib, Unknown error -1003 (-1003)
09/11-17:47:17.21 intel-S2600WFD DAOS[11713/11713] server ERR  src/iosrv/init.c:521 server_init() DAOS cannot be initialized using the configured path (/mnt/daos).   Please ensure it is on a PMDK compatible file system and writeable by the current user

Rgds,

Gert


Nabarro, Tom
 

Hello Gert,

 

It’s likely the thread lib init failure is due to a problem with hugepages configuration as spdk_thread_lib_init() only calls spdk_mempool_create().

 

Can you please do some debugging as follows:

  • First, some basic verification of hugepage state, e.g.

bash-4.2$ cat /proc/meminfo | grep Huge
AnonHugePages:     73728 kB
HugePages_Total:   16384
HugePages_Free:    16371
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
bash-4.2$ ls -lah /dev/hugepages/spdk_pid217893map_*
-rw------- 1 root daos_admins 2.0M Sep 11 12:33 /dev/hugepages/spdk_pid217893map_0
-rw------- 1 root daos_admins 2.0M Sep 11 12:33 /dev/hugepages/spdk_pid217893map_1
-rw------- 1 root daos_admins 2.0M Sep 11 12:33 /dev/hugepages/spdk_pid217893map_10
-rw------- 1 root daos_admins 2.0M Sep 11 12:33 /dev/hugepages/spdk_pid217893map_11
-rw------- 1 root daos_admins 2.0M Sep 11 12:33 /dev/hugepages/spdk_pid217893map_12
-rw------- 1 root daos_admins 2.0M Sep 11 12:33 /dev/hugepages/spdk_pid217893map_2
...

 

  • Then, enable daos_admin/privileged helper logging by setting "helper_log_file: /tmp/daos_admin.log" with the correct indentation towards the top of the daos_server yaml config file.
  • Stop daos_server instances/services.
  • Wipe pmem with "sudo umount /mnt/daos[01]; sudo wipefs -a /dev/pmem[01]".
  • Bind nvme devices back to the kernel with "sudo daos_server storage prepare -n --reset".
  • Optionally reboot (maybe try without first).
  • Start daos_server instances/services.
  • Wait for the reformat prompt "SCM format required...".
  • Run "dmg storage format" from a different tty.
  • Wait for io instances to start/error, then send logs including daos_admin.log please.
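
The hugepage check in the first step can be wrapped in a small read-only helper. The sketch below assumes the usual /proc/meminfo layout shown above; hugepage_counts is a hypothetical name, and the optional file argument exists only so the function can be tried against saved output.

```shell
# hugepage_counts is a hypothetical helper; it only reads, never changes state.
# It prints "total free size_kB" taken from a file in /proc/meminfo format.
hugepage_counts() {
    meminfo=${1:-/proc/meminfo}
    awk '
        $1 == "HugePages_Total:" { total = $2 }
        $1 == "HugePages_Free:"  { free = $2 }
        $1 == "Hugepagesize:"    { size = $2 }
        END { print total + 0, free + 0, size + 0 }
    ' "$meminfo"
}

# Usage: a total of 0 means no hugepages have been allocated yet.
#   hugepage_counts
```

Running it before and after the storage prepare/format steps makes it easy to see whether hugepages were actually allocated in between.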

 

I realise you don’t need that level of instruction, but I included it for completeness.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Gert Pauwels
Sent: Friday, September 11, 2020 5:42 PM
To: daos@daos.groups.io
Subject: Re: [daos] no dRPC client set on Ubuntu 20.04.1

 



Gert Pauwels
 

Hi Tom,

After booting I ran the following two commands:
root@intel-S2600WFD:~/daos# cat /proc/meminfo | grep Huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
root@intel-S2600WFD:~/daos# ls -lah /dev/hugepages/
total 0
drwxr-xr-x  2 root root    0 Sep 14 16:40 .
drwxr-xr-x 20 root root 4,3K Sep 14 16:41 ..

 
At this point I followed the steps you described to bind the NVMe drives to the kernel and wiped the pmem[01] devices.
After running "dmg storage format" from another tty the error showed and I stopped daos_server.

The daos_server.log, daos_admin.log and daos_control.log are attached.

At this point I ran the two commands again:

root@intel-S2600WFD:~/daos# cat /proc/meminfo | grep Huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:    4096
HugePages_Free:     4095
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         8388608 kB

root@intel-S2600WFD:~/daos# ls -lah /dev/hugepages/spdk_pid4857map_0
-rw------- 1 root root 2,0M Sep 14 16:49 /dev/hugepages/spdk_pid4857map_0
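
As a quick sanity check on these numbers: HugePages_Total multiplied by Hugepagesize should match the Hugetlb line. With 4096 pages of 2048 kB each that is 8388608 kB (8 GiB), which agrees with the output above. A one-line sketch of the arithmetic (the function name is hypothetical):

```shell
# hugetlb_kb PAGES PAGE_SIZE_KB -> total hugepage memory in kB
hugetlb_kb() {
    echo $(( $1 * $2 ))
}

hugetlb_kb 4096 2048   # prints 8388608
```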


Regards,

Gert,


Nabarro, Tom
 

Hello Gert,

 

I can’t see anything obviously wrong with bringup and hugepage allocation. Did you reboot the nodes (just as a sanity check)?

 

My next step is to get in touch with the SPDK folks and reproduce the problem on our side.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Gert Pauwels
Sent: Monday, September 14, 2020 4:30 PM
To: daos@daos.groups.io
Subject: Re: [daos] no dRPC client set on Ubuntu 20.04.1

 



Nabarro, Tom
 

I can now reproduce this locally and am working on a solution.

 



Nabarro, Tom
 

Hello Gert and Lee,

 

https://github.com/daos-stack/daos/pull/3480 has been merged into master to fix this issue.

 

DPDK (used as the default environment for SPDK) uses weak symbols when linking:

"dpdk/lib/librte_eal/common/include/rte_common.h:#define __rte_weak __attribute__((__weak__))"

Ubuntu sets --as-needed, which means some of the constructor functions in the DPDK mempool libs (which handle the access to hugepages) don't get run, and this causes the failure in spdk_thread_lib_init() (called from bio_xstream.c in DAOS).

The fix involves setting --no-as-needed linker flags explicitly during the build of the relevant bio module and control-plane libs to pull in all symbols regardless of whether they are explicitly called or not.
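
One way to see the effect of --as-needed is to list the DT_NEEDED entries recorded in a built binary or shared object: a library dropped by the linker does not appear there, so it is not loaded and its constructors never run. The sketch below filters `readelf -d` output (readelf is part of GNU binutils); needed_libs is a hypothetical helper, and both the file path and the rte_mempool name pattern in the usage comment are assumptions to adapt to your build.

```shell
# needed_libs is a hypothetical helper: it reads `readelf -d` output on stdin
# and prints the library names from the (NEEDED) entries, one per line.
needed_libs() {
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p'
}

# Usage (path and pattern are assumptions, adjust to your build):
#   readelf -d /path/to/daos/lib.so | needed_libs | grep rte_mempool
```

Comparing the list from a build made before and after the fix should show whether the mempool libraries are now recorded as dependencies.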

 

Thanks to Jim Harris from SPDK for help in resolving the issue. Please confirm the fix works if possible.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 



Gert Pauwels
 

Hi Tom,

Thanks for resolving the issue. Thanks to Jim too.

I don't see the errors anymore. 

Rgds,

Gert,