
Re: DAOS_test failed

Lombardi, Johann
 

Hi Anton,

 

Yes, I think it is fine. You probably leaked pools from prior runs; they show up in the pool listing, while the test assumes that no other pools are present. I think you are good to go.

 

Cheers,

Johann
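Johann's explanation can be turned into a quick pre-flight check before rerunning `daos_test -m`. MGMT3's `0x2 != 0` is consistent with two leftover pools being listed when none were expected. The sketch below is only an illustration: `count_pool_uuids` and `warn_if_pools_present` are hypothetical helpers, and it assumes that the pool-listing command of your DAOS release prints one UUID per pool.

```bash
# Sketch (hypothetical helpers, not part of DAOS): count distinct pool UUIDs
# in pool-listing text read from stdin.
# ASSUMPTION: the listing prints one UUID per pool; the command that
# produces it differs between DAOS releases.

uuid_re='[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}'

count_pool_uuids() {
    # -o prints each UUID match on its own line; lowercase them so the
    # same pool printed in different case is only counted once.
    grep -Eo "$uuid_re" | tr 'A-F' 'a-f' | sort -u | wc -l | tr -d ' '
}

# MGMT3 expects an empty pool list, so leftover pools from earlier
# runs make it fail. Returns non-zero if any pool is listed.
warn_if_pools_present() {
    local n
    n=$(count_pool_uuids)
    if [ "$n" -gt 0 ]; then
        echo "warning: $n pool(s) already present; MGMT3/MGMT4 may fail" >&2
        return 1
    fi
    return 0
}
```

On recent DAOS releases this could be fed with something like `dmg pool list | warn_if_pools_present`. Adjust the listing command to the version you are running.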

 

From: <daos@daos.groups.io> on behalf of "anton.brekhov@..." <anton.brekhov@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday 16 September 2020 at 13:24
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] DAOS_test failed

 


---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: no dRPC client set on Ubuntu 20.04.1

Nabarro, Tom
 

I can now reproduce this locally and am working to find a solution.

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Gert Pauwels
Sent: Friday, September 11, 2020 5:42 PM
To: daos@daos.groups.io
Subject: Re: [daos] no dRPC client set on Ubuntu 20.04.1

 

Hi Tom,

After applying the patch you provided, the error is now:
    09/11-17:47:17.20 intel-S2600WFD DAOS[11713/11713] bio  ERR  src/bio/bio_xstream.c:369 bio_spdk_env_init() Failed to init SPDK thread lib, Unknown error -1003 (-1003)

The original one was:

   DAOS[7642/7642] bio  ERR  src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)


The whole /tmp/daos_server looks like:

09/11-17:39:03.32 intel-S2600WFD DAOS[7642/7642] fi   INFO src/gurt/fault_inject.c:481 d_fault_inject_init() No config file, fault injection is OFF.
09/11-17:39:03.35 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:419 set_abt_max_num_xstreams() Setting ABT_MAX_NUM_XSTREAMS to 4
09/11-17:39:03.35 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:490 server_init() Module interface successfully initialized
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:498 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully loaded
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] crt  INFO src/cart/crt_init.c:269 crt_init_opt() libcart version 4.8.0 initializing
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] crt  WARN src/cart/crt_init.c:161 data_init() FI_UNIVERSE_SIZE was not set; setting to 2048
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:507 server_init() Network successfully initialized
09/11-17:39:03.37 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:516 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully initialized
09/11-17:39:03.44 intel-S2600WFD DAOS[7642/7642] bio  ERR  src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)
09/11-17:39:03.44 intel-S2600WFD DAOS[7642/7642] server ERR  src/iosrv/init.c:521 server_init() DAOS cannot be initialized using the configured path (/mnt/daos).   Please ensure it is on a PMDK compatible file system and writeable by the current user
09/11-17:47:17.09 intel-S2600WFD DAOS[11713/11713] fi   INFO src/gurt/fault_inject.c:481 d_fault_inject_init() No config file, fault injection is OFF.
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:419 set_abt_max_num_xstreams() Setting ABT_MAX_NUM_XSTREAMS to 4
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:490 server_init() Module interface successfully initialized
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:498 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully loaded
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] crt  INFO src/cart/crt_init.c:269 crt_init_opt() libcart version 4.8.0 initializing
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] crt  WARN src/cart/crt_init.c:161 data_init() FI_UNIVERSE_SIZE was not set; setting to 2048
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:507 server_init() Network successfully initialized
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:516 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully initialized
09/11-17:47:17.20 intel-S2600WFD DAOS[11713/11713] bio  ERR  src/bio/bio_xstream.c:369 bio_spdk_env_init() Failed to init SPDK thread lib, Unknown error -1003 (-1003)
09/11-17:47:17.21 intel-S2600WFD DAOS[11713/11713] server ERR  src/iosrv/init.c:521 server_init() DAOS cannot be initialized using the configured path (/mnt/daos).   Please ensure it is on a PMDK compatible file system and writeable by the current user

Rgds,

Gert

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: DAOS_test failed

anton.brekhov@...
 

I've set new sizes and launched only the management tests:

[root@sky08 ~]# export POOL_NVME_SIZE=16
[root@sky08 ~]# export POOL_SCM_SIZE=8
[root@sky08 ~]# mpirun --allow-run-as-root -np 1 daos_test -m
 
 
=================
DAOS management tests..
=====================
[==========] Running 5 test(s).
[ RUN      ] MGMT1: create/destroy pool on all tgts
creating pool synchronously ... success uuid = 95650fb9-3eec-4b48-9c5f-ffaff9597df8
destroying pool synchronously ... success
[       OK ] MGMT1: create/destroy pool on all tgts
[ RUN      ] MGMT2: create/destroy pool on all tgts (async)
creating pool asynchronously ... success uuid = 865c9dcc-c28c-4759-88fe-46e009c52362
destroying pool asynchronously ... success
[       OK ] MGMT2: create/destroy pool on all tgts (async)
[ RUN      ] MGMT3: list-pools with no pools in sys
[  ERROR   ] --- 0x2 != 0
[   LINE   ] --- src/tests/suite/daos_mgmt.c:262: error: Failure!
[  FAILED  ] MGMT3: list-pools with no pools in sys
[ RUN      ] MGMT4: list-pools with multiple pools in sys
setup: creating pool, SCM size=8 GB, NVMe size=16 GB
setup: created pool 2817ca85-971a-4a43-b01b-fefcec01ec1d
setup: creating pool, SCM size=8 GB, NVMe size=16 GB
setup: created pool f4aaa2fe-5e00-47d8-9d04-a9559df709d5
setup: creating pool, SCM size=8 GB, NVMe size=16 GB
setup: created pool 2e81be40-1c5a-401f-be5a-c9646603a9f8
setup: creating pool, SCM size=8 GB, NVMe size=16 GB
setup: created pool 929dad5e-f797-4c4e-bb33-44fde4c958cf
teardown: destroyed pool 2817ca85-971a-4a43-b01b-fefcec01ec1d
teardown: destroyed pool f4aaa2fe-5e00-47d8-9d04-a9559df709d5
teardown: destroyed pool 2e81be40-1c5a-401f-be5a-c9646603a9f8
teardown: destroyed pool 929dad5e-f797-4c4e-bb33-44fde4c958cf
[  FAILED  ] MGMT4: list-pools with multiple pools in sys
[ RUN      ] MGMT5: retry MGMT_POOL_{CREATE,DESETROY} upon errors
Fault injection required for test, skipping...
[  ERROR   ] --- 0x6 != 0x4
[   LINE   ] --- src/tests/suite/daos_mgmt.c:262: error: Failure!
[  SKIPPED ] MGMT5: retry MGMT_POOL_{CREATE,DESETROY} upon errors
[==========] 5 test(s) run.
[  PASSED  ] 2 test(s).
[  SKIPPED ] 1 test(s), listed below:
[  SKIPPED ] MGMT5: retry MGMT_POOL_{CREATE,DESETROY} upon errors
 
 1 SKIPPED TEST(S)
[  FAILED  ] 2 test(s), listed below:
[  FAILED  ] MGMT3: list-pools with no pools in sys
[  FAILED  ] MGMT4: list-pools with multiple pools in sys
 
 2 FAILED TEST(S)
 
============ Summary src/tests/suite/daos_test.c
ERROR, 2 TEST(S) FAILED
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:
 
  Process name: [[13612,1],0]
  Exit code:    2
--------------------------------------------------------------------------

Is it OK that a few tests have failed?
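To answer "is it OK?" from the summary alone, it helps to read cmocka's three counters separately, since skipped tests are reported apart from failures. This is a sketch with a hypothetical `cmocka_counts` helper. It assumes the summary lines look exactly like the ones in the run above, for which it reports `passed=2 skipped=1 failed=2`.

```bash
# Sketch (hypothetical helper, not part of DAOS): read a cmocka run summary
# on stdin and report the PASSED / SKIPPED / FAILED counters separately.
# ASSUMPTION: summary lines look like "[  FAILED  ] 2 test(s), listed below:".
cmocka_counts() {
    awk '
        /^\[  PASSED  ] [0-9]+ test/ { passed  = $4 }
        /^\[  SKIPPED ] [0-9]+ test/ { skipped = $4 }
        /^\[  FAILED  ] [0-9]+ test/ { failed  = $4 }
        END { printf "passed=%d skipped=%d failed=%d\n", passed, skipped, failed }
    '
}
```

For example: `mpirun --allow-run-as-root -np 1 daos_test -m 2>&1 | tee daos_test.log | cmocka_counts`.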


Re: DAOS_test failed

Wang, Di
 

Hello,

 

This basically means there are not enough servers to run the rebuild tests, so they were skipped.

 

The failure here is probably due to incorrect usage of cmocka, which is used by some DAOS tests. Anyway, it is not a real “failure”.

 

If you are interested in running the rebuild tests, you need at least 6 DAOS servers.

 

Thanks

WangDi
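You can check from the output that the rebuild tests were skipped rather than failed. This sketch uses a hypothetical `skipped_for_targets` helper. It assumes, as in the logs quoted in this thread, that daos_test prints a `NAME: description` line before each case and a `No enough targets, skipping` line when the case is skipped.

```bash
# Sketch (hypothetical helper, not part of DAOS): list daos_test cases that
# were skipped for lack of targets. It remembers the most recent
# "NAME: description" header and prints NAME whenever a
# "No enough targets, skipping" line follows it.
skipped_for_targets() {
    awk '
        /^[A-Z]+[0-9]+: / { name = $1; sub(/:$/, "", name) }
        /No enough targets, skipping/ && name != "" { print name }
    '
}
```

For example, `skipped_for_targets < daos_test.log` on the output quoted below would list REBUILD12, REBUILD13 and REBUILD14.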

 

 

On 9/15/20, 2:07 PM, "daos@daos.groups.io on behalf of anton.brekhov@..." <daos@daos.groups.io on behalf of anton.brekhov@...> wrote:

 



Re: DAOS_test failed

Farrell, Patrick Arthur
 

You'll want to turn on debug logging (see the troubleshooting section in the user guide) to get more information on why this failed.

Also, the pool sizes (both SCM and NVMe) will not be large enough to complete the tests. I think you need something like at least 16 GB NVMe and 8 GB SCM. I'm not saying that is your issue here (though it might be), but it will stop you later.

-Patrick
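Patrick's size guidance can be checked before launching the run. In this minimal sketch, the 8 GB SCM / 16 GB NVMe thresholds are his rough estimate from this thread, not a documented requirement. `check_pool_sizes` is a hypothetical helper. It treats an unset variable as an error so that daos_test never silently falls back to its built-in defaults.

```bash
# Sketch (hypothetical helper, not part of DAOS): refuse to start daos_test
# if POOL_SCM_SIZE / POOL_NVME_SIZE (in GB) are unset or below the rough
# minimums suggested on the list.
MIN_SCM_GB=8
MIN_NVME_GB=16

check_pool_sizes() {
    local scm=${POOL_SCM_SIZE:-}
    local nvme=${POOL_NVME_SIZE:-}
    local rc=0
    # Reject empty or non-numeric values before comparing.
    case $scm in ''|*[!0-9]*) echo "POOL_SCM_SIZE is unset or not an integer" >&2; return 1 ;; esac
    case $nvme in ''|*[!0-9]*) echo "POOL_NVME_SIZE is unset or not an integer" >&2; return 1 ;; esac
    if [ "$scm" -lt "$MIN_SCM_GB" ]; then
        echo "POOL_SCM_SIZE=$scm GB is below $MIN_SCM_GB GB" >&2
        rc=1
    fi
    if [ "$nvme" -lt "$MIN_NVME_GB" ]; then
        echo "POOL_NVME_SIZE=$nvme GB is below $MIN_NVME_GB GB" >&2
        rc=1
    fi
    return $rc
}
```

Usage: `export POOL_SCM_SIZE=8 POOL_NVME_SIZE=16; check_pool_sizes && mpirun --allow-run-as-root -np 1 daos_test -m`.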


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of anton.brekhov@... <anton.brekhov@...>
Sent: Tuesday, September 15, 2020 4:07 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DAOS_test failed
 



Re: DAOS_test failed

anton.brekhov@...
 

I've set these env vars:
export POOL_SCM_SIZE=2
export POOL_NVME_SIZE=4
And this is my output before the error:

REBUILD12: rebuild send objects failed
setup: creating pool, SCM size=2 GB, NVMe size=4 GB
setup: created pool 8abfd4aa-0fb4-4122-aef7-f58b3fe6d81f
setup: connecting to pool
connected to pool, ntarget=4
setup: creating container ac978bdb-cb4c-4729-a0e4-cf3f3973696d
setup: opening container
No enough targets, skipping (4/0)
teardown: destroyed pool 8abfd4aa-0fb4-4122-aef7-f58b3fe6d81f
REBUILD13: rebuild empty pool offline
setup: creating pool, SCM size=2 GB, NVMe size=4 GB
setup: created pool 803c465f-6547-4c8f-a473-2dea0d457081
setup: connecting to pool
connected to pool, ntarget=4
setup: creating container a7490729-0d9a-4935-900e-ede0f656871d
setup: opening container
No enough targets, skipping (4/0)
teardown: destroyed pool 803c465f-6547-4c8f-a473-2dea0d457081
REBUILD14: rebuild no space failure
setup: creating pool, SCM size=2 GB, NVMe size=4 GB
setup: created pool cf718311-93cb-4b9f-a159-419585af99e8
setup: connecting to pool
connected to pool, ntarget=4
setup: creating container f5a357b4-3af8-4670-9032-f5e9fc2944af
setup: opening container
No enough targets, skipping (4/0)
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[25434,1],0]
  Exit code:    255

--------------------------------------------------------------------------

 
 
 
 
 


Re: DAOS_test failed

anton.brekhov@...
 

I've installed MOFED 5.0.2 and Open MPI on both hosts, and now it gets a little further!
I launched daos_test like this:
mpirun --allow-run-as-root -np 1 daos_test

Some tests passed and some failed, but the run ended with another error:

=================
DAOS rebuild tests..
=================
[  PASSED  ] 3 test(s).
setup: creating pool, SCM size=4 GB, NVMe size=8 GB
setup: created pool 88a9a5f1-8a21-4e45-a32d-ef87791c5f80
setup: connecting to pool
connected to pool, ntarget=4
setup: creating container 28aa2f9d-8559-41f3-907f-ec1b893ca90c
setup: opening container
REBUILD0: drop rebuild scan reply
No enough targets, skipping (4/0)
teardown: destroyed pool 88a9a5f1-8a21-4e45-a32d-ef87791c5f80
REBUILD1: retry rebuild for not ready
setup: creating pool, SCM size=0 GB, NVMe size=0 GB
daos_pool_create failed, rc: -1003
[sky08:21007:0:21007] Caught signal 11 (Segmentation fault: tkill(2) or tgkill(2) at address 0x520f)
==== backtrace (tid:  21007) ====
 0 0x000000000004cb95 ucs_debug_print_backtrace()  ???:0
 1 0x000000000045471e ???()  /usr/bin/daos_test:0
 2 0x000000000044ca21 ???()  /usr/bin/daos_test:0
 3 0x0000000000406db3 ???()  /usr/bin/daos_test:0
 4 0x0000000000022505 __libc_start_main()  ???:0
 5 0x00000000004079e2 ???()  /usr/bin/daos_test:0
=================================
[sky08:21007] *** Process received signal ***
[sky08:21007] Signal: Segmentation fault (11)
[sky08:21007] Signal code:  (-6)
[sky08:21007] Failing at address: 0x520f
[sky08:21007] [ 0] /lib64/libpthread.so.0(+0xf5f0)[0x7fdac52295f0]
[sky08:21007] [ 1] daos_test[0x45471e]

Do I need to set environment variables for the SCM and NVMe sizes for this phase?
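The `rc: -1003` from daos_pool_create is a negative DAOS error number. Gert's server log in the "no dRPC client" thread prints the same value as `DER_INVAL(-1003)`, i.e. an invalid argument. That would plausibly match the test requesting a pool with `SCM size=0 GB, NVMe size=0 GB`. The lookup sketch below uses hypothetical `der_name` and `rc_from_line` helpers. Apart from -1003, which is confirmed in this archive, the codes are listed from memory, so check the GURT errno header in your DAOS tree.

```bash
# Sketch (hypothetical helpers, not part of DAOS): decode negative DAOS
# return codes as printed by daos_test, e.g. "rc: -1003".
# ASSUMPTION: only -1003 is confirmed by the logs in this thread; the other
# entries are recalled from the GURT error table and may need checking.
der_name() {
    case $1 in
        -1003) echo DER_INVAL ;;
        -1004) echo DER_EXIST ;;
        -1005) echo DER_NONEXIST ;;
        -1007) echo DER_NOSPACE ;;
        *)     echo "unknown($1)" ;;
    esac
}

# Extract the rc value from a line such as "daos_pool_create failed, rc: -1003".
rc_from_line() {
    sed -n 's/.*rc: \(-\{0,1\}[0-9][0-9]*\).*/\1/p'
}
```

Usage: `der_name "$(echo 'daos_pool_create failed, rc: -1003' | rc_from_line)"` prints `DER_INVAL`.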

 
 
 
 
 


Re: no dRPC client set on Ubuntu 20.04.1

Nabarro, Tom
 

Hello Gert,

 

I can’t see anything obviously wrong with the bring-up or the hugepage allocation. Did you reboot the nodes (just as a sanity check)?

 

My next step will be to get in touch with the SPDK folks and reproduce this on our side.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Gert Pauwels
Sent: Monday, September 14, 2020 4:30 PM
To: daos@daos.groups.io
Subject: Re: [daos] no dRPC client set on Ubuntu 20.04.1

 



Re: DAOS_test failed

Oganezov, Alexander A
 

Hi Anton,

 

The last version that we’ve tried and that worked for us was MOFED 5.0.2, which is what we currently use on our test clusters that use the ofi+verbs;ofi_rxm provider.

 

Thanks,

~~Alex.

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of anton.brekhov@...
Sent: Monday, September 14, 2020 12:45 AM
To: daos@daos.groups.io
Subject: Re: [daos] DAOS_test failed

 


Re: no dRPC client set on Ubuntu 20.04.1

Gert Pauwels
 

Hi Tom,

After booting, I ran the following two commands:
root@intel-S2600WFD:~/daos# cat /proc/meminfo | grep Huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:               0 kB
root@intel-S2600WFD:~/daos# ls -lah /dev/hugepages/
total 0
drwxr-xr-x  2 root root    0 Sep 14 16:40 .
drwxr-xr-x 20 root root 4,3K Sep 14 16:41 ..

At this point I followed the steps you described to bind the NVMe drives to the kernel and wiped the pmem[01] devices.
After I ran "dmg storage format" from another tty, the error appeared and I stopped daos_server.

The daos_server.log, daos_admin.log and daos_control.log are attached.

Then I ran the two commands again:

root@intel-S2600WFD:~/daos# cat /proc/meminfo | grep Huge
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:    4096
HugePages_Free:     4095
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
Hugetlb:         8388608 kB

root@intel-S2600WFD:~/daos# ls -lah /dev/hugepages/spdk_pid4857map_0
-rw------- 1 root root 2,0M Sep 14 16:49 /dev/hugepages/spdk_pid4857map_0


Regards,

Gert,
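Gert's two meminfo snapshots can be summarised automatically. After the failed start, 4096 pages × 2048 kB = 8388608 kB (8 GiB) are reserved, which matches the `Hugetlb` line. 4095 pages are free, so one 2 MiB page is in use, consistent with the single `spdk_pid4857map_0` file. Tom's example output, by contrast, shows 16384 − 16371 = 13 pages in use. This is a sketch with a hypothetical `hugepage_summary` helper.

```bash
# Sketch (hypothetical helper, not part of DAOS or SPDK): summarise hugepage
# state from /proc/meminfo-formatted text on stdin.
# Prints total and free pages, the page size, and total hugepage memory in MiB.
hugepage_summary() {
    awk '
        $1 == "HugePages_Total:" { total = $2 }
        $1 == "HugePages_Free:"  { free = $2 }
        $1 == "Hugepagesize:"    { size_kb = $2 }
        END {
            printf "total=%d free=%d size_kb=%d total_mib=%d\n",
                   total, free, size_kb, total * size_kb / 1024
        }
    '
}
```

Usage: `hugepage_summary < /proc/meminfo`. For Gert's second reading it prints `total=4096 free=4095 size_kb=2048 total_mib=8192`.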


Re: DAOS_test failed

Farrell, Patrick Arthur
 

MOFED 5.0 works well; 5.1 currently seems to have an incompatibility with libfabric. (Note this is not a supported list or anything; this is just what we've used successfully.)

Regards,
-Patrick
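Patrick's and Alex's MOFED notes can be turned into a quick check on each host. This is a sketch that assumes `ofed_info -s` prints a string of the form `MLNX_OFED_LINUX-<version>-<release>:`. `mofed_note` is a hypothetical helper. As Patrick says, this is not a support matrix, only what was reported on the list.

```bash
# Sketch (hypothetical helper, not part of DAOS): classify the string printed
# by "ofed_info -s" against the MOFED versions mentioned in this thread.
# ASSUMPTION: the version string looks like "MLNX_OFED_LINUX-5.0-2.1.8.0:".
mofed_note() {
    local v=$1
    case $v in
        '')                     echo "no MOFED detected (ofed_info missing?)" ;;
        *MLNX_OFED_LINUX-5.0-*) echo "5.0: reported working with libfabric" ;;
        *MLNX_OFED_LINUX-5.1-*) echo "5.1: reported libfabric incompatibility" ;;
        *)                      echo "untested here: $v" ;;
    esac
}
```

Usage: `mofed_note "$(ofed_info -s 2>/dev/null)"`. If `ofed_info` is not installed, as in Anton's case, this falls into the first branch.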


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of anton.brekhov@... <anton.brekhov@...>
Sent: Monday, September 14, 2020 2:45 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DAOS_test failed
 
I'm using Optane PMEM (there are 4 modules of 512 GB each, two PMEM modules near each socket). I created the /dev/pmem0 and /dev/pmem1 devices using the `ipmctl create -goal PersistentMemoryType=AppDirect` command. On the DAOS server I have two Mellanox IB interfaces, but only one is in use (mlx5_0). Here is the ibstat output:
[root@apache512 tmp]# ibstat
CA 'mlx5_0'
        CA type: MT4123
        Number of ports: 1
        Firmware version: 20.28.1002
        Hardware version: 0
        Node GUID: 0xb8599f0300e4f800
        System image GUID: 0xb8599f0300e4f800
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 4
                LMC: 0
                SM lid: 4
                Capability mask: 0x2659e84a
                Port GUID: 0xb8599f0300e4f800
                Link layer: InfiniBand

It seems I didn't install the OFED drivers, because ofed_info was not found. Which version is best to use?

 
 
 
 
 


Re: no dRPC client set on Ubuntu 20.04.1

Nabarro, Tom
 

Hello Gert,

 

It’s likely that the thread lib init failure is due to a problem with the hugepage configuration, as spdk_thread_lib_init() only calls spdk_mempool_create().

 

Can you please do some debugging as follows:

 

  • First, some basic verification of the hugepage state, e.g.:

bash-4.2$ cat /proc/meminfo| grep Huge
AnonHugePages:     73728 kB
HugePages_Total:   16384
HugePages_Free:    16371
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
bash-4.2$ ls -lah /dev/hugepages/spdk_pid217893map_*
-rw------- 1 root daos_admins 2.0M Sep 11 12:33 /dev/hugepages/spdk_pid217893map_0
-rw------- 1 root daos_admins 2.0M Sep 11 12:33 /dev/hugepages/spdk_pid217893map_1
-rw------- 1 root daos_admins 2.0M Sep 11 12:33 /dev/hugepages/spdk_pid217893map_10
-rw------- 1 root daos_admins 2.0M Sep 11 12:33 /dev/hugepages/spdk_pid217893map_11
-rw------- 1 root daos_admins 2.0M Sep 11 12:33 /dev/hugepages/spdk_pid217893map_12
-rw------- 1 root daos_admins 2.0M Sep 11 12:33 /dev/hugepages/spdk_pid217893map_2
...

 

  • Then, enable daos_admin (privileged helper) logging by setting "helper_log_file: /tmp/daos_admin.log", with the correct indentation, near the top of the daos_server YAML config file.
  • Stop the daos_server instances/services.
  • Wipe pmem with "sudo umount /mnt/daos[01]; sudo wipefs -a /dev/pmem[01]".
  • Bind the NVMe devices back to the kernel with "sudo daos_server storage prepare -n --reset".
  • Optionally reboot (maybe try without rebooting first).
  • Start the daos_server instances/services.
  • Wait for the reformat prompt "SCM format required...".
  • Run "dmg storage format" from a different tty.
  • Wait for the I/O instances to start or fail, then please send the logs, including daos_admin.log.

 

I realise you don’t need that level of instruction, but I've included it for completeness.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro
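The storage-reset part of Tom's steps can be sketched as a dry-run-by-default script. The storage commands are copied from his list, with the `/mnt/daos[01]` and `/dev/pmem[01]` globs spelled out as two paths each. `run`, `has_helper_log` and `reset_storage` are hypothetical helpers. Stopping and starting daos_server and answering the format prompt are site-specific and are left out. `has_helper_log` assumes that "correct indentation" means `helper_log_file` is a top-level (unindented) key.

```bash
# Sketch (hypothetical helpers, not part of DAOS): the wipe/reset steps as a
# dry-run-by-default script. Set DRY_RUN=0 to actually execute the commands.
DRY_RUN=${DRY_RUN:-1}

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Succeeds if the config file sets helper_log_file as a top-level key.
has_helper_log() {
    grep -Eq '^helper_log_file:[[:space:]]*[^[:space:]]' "$1"
}

# Wipe pmem and hand the NVMe devices back to the kernel.
reset_storage() {
    run sudo umount /mnt/daos0 /mnt/daos1
    run sudo wipefs -a /dev/pmem0 /dev/pmem1
    run sudo daos_server storage prepare -n --reset
}
```

With the default `DRY_RUN=1`, calling `reset_storage` only prints the three commands, so you can review them before running with `DRY_RUN=0`.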

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Gert Pauwels
Sent: Friday, September 11, 2020 5:42 PM
To: daos@daos.groups.io
Subject: Re: [daos] no dRPC client set on Ubuntu 20.04.1

 

Hi Tom,

After applying the patch you provided the error is now:
    09/11-17:47:17.20 intel-S2600WFD DAOS[11713/11713] bio  ERR  src/bio/bio_xst   09/11-17 ream.c:369 bio_spdk_env_init() Failed to init SPDK thread lib, Unknown error -1003 (-1003)

the original one was:

   DAOS[7642/7642] bio  ERR  src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)


The whole /tmp/daos_server looks like:

09/11-17:39:03.32 intel-S2600WFD DAOS[7642/7642] fi   INFO src/gurt/fault_inject.c:481 d_fault_inject_init() No config file, fault injection is OFF.
09/11-17:39:03.35 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:419 set_abt_max_num_xstreams() Setting ABT_MAX_NUM_XSTREAMS to 4
09/11-17:39:03.35 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:490 server_init() Module interface successfully initialized
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:498 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully loaded
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] crt  INFO src/cart/crt_init.c:269 crt_init_opt() libcart version 4.8.0 initializing
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] crt  WARN src/cart/crt_init.c:161 data_init() FI_UNIVERSE_SIZE was not set; setting to 2048
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:507 server_init() Network successfully initialized
09/11-17:39:03.37 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:516 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully initialized
09/11-17:39:03.44 intel-S2600WFD DAOS[7642/7642] bio  ERR  src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)
09/11-17:39:03.44 intel-S2600WFD DAOS[7642/7642] server ERR  src/iosrv/init.c:521 server_init() DAOS cannot be initialized using the configured path (/mnt/daos).   Please ensure it is on a PMDK compatible file system and writeable by the current user
09/11-17:47:17.09 intel-S2600WFD DAOS[11713/11713] fi   INFO src/gurt/fault_inject.c:481 d_fault_inject_init() No config file, fault injection is OFF.
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:419 set_abt_max_num_xstreams() Setting ABT_MAX_NUM_XSTREAMS to 4
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:490 server_init() Module interface successfully initialized
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:498 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully loaded
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] crt  INFO src/cart/crt_init.c:269 crt_init_opt() libcart version 4.8.0 initializing
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] crt  WARN src/cart/crt_init.c:161 data_init() FI_UNIVERSE_SIZE was not set; setting to 2048
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:507 server_init() Network successfully initialized
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:516 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully initialized
09/11-17:47:17.20 intel-S2600WFD DAOS[11713/11713] bio  ERR  src/bio/bio_xstream.c:369 bio_spdk_env_init() Failed to init SPDK thread lib, Unknown error -1003 (-1003)
09/11-17:47:17.21 intel-S2600WFD DAOS[11713/11713] server ERR  src/iosrv/init.c:521 server_init() DAOS cannot be initialized using the configured path (/mnt/daos).   Please ensure it is on a PMDK compatible file system and writeable by the current user

Rgds,

Gert


---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: DAOS_test failed

anton.brekhov@...
 

I'm using Optane PMEM (there are 4 modules of 512GB each, two PMEM modules near each socket). I created the /dev/pmem0 and /dev/pmem1 devices using the `ipmctl create -goal PersistentMemoryType=AppDirect` command. On the DAOS server I have two Mellanox IB interfaces, but only one is in use (mlx5_0). Here is the ibstat output:
[root@apache512 tmp]# ibstat
CA 'mlx5_0'
        CA type: MT4123
        Number of ports: 1
        Firmware version: 20.28.1002
        Hardware version: 0
        Node GUID: 0xb8599f0300e4f800
        System image GUID: 0xb8599f0300e4f800
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 4
                LMC: 0
                SM lid: 4
                Capability mask: 0x2659e84a
                Port GUID: 0xb8599f0300e4f800
                Link layer: InfiniBand

It seems I didn't install the OFED drivers, because ofed_info is not found. Which version is better to use?


Re: DAOS_test failed

Lombardi, Johann
 

It sounds like the DAOS server is not able to register memory to initiate an RDMA transfer. Could you please tell me more about the network and storage you use on the server? Optane PMEM or DRAM? Also, which version of OFED do you use?

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "anton.brekhov@..." <anton.brekhov@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday 11 September 2020 at 19:41
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] DAOS_test failed

 

These are the errors from daos_server.log:

09/11-20:41:22.58 apache512 DAOS[10757/10786] hg   ERR  # NA -- Error -- /builddir/build/BUILD/mercury-2.0.0a1/src/na/na_ofi.c:4196
 # na_ofi_mem_register(): fi_mr_reg() failed, rc: -95 (Operation not supported)
09/11-20:41:26.20 apache512 DAOS[10757/10786] hg   ERR  # HG -- Error -- /builddir/build/BUILD/mercury-2.0.0a1/src/mercury_bulk.c:494
 # hg_bulk_create(): NA_Mem_register() failed (NA_PROTOCOL_ERROR)
09/11-20:41:26.20 apache512 DAOS[10757/10786] hg   ERR  # HG -- Error -- /builddir/build/BUILD/mercury-2.0.0a1/src/mercury_bulk.c:1072
 # HG_Bulk_create(): Could not create bulk handle
09/11-20:41:26.20 apache512 DAOS[10757/10786] hg   ERR  src/cart/crt_hg.c:1445 crt_hg_bulk_create() HG_Bulk_create failed, hg_ret: 11.
09/11-20:41:26.20 apache512 DAOS[10757/10786] bulk ERR  src/cart/crt_bulk.c:137 crt_bulk_create() crt_hg_bulk_create failed, rc: -1020.
09/11-20:41:26.20 apache512 DAOS[10757/10786] object ERR  src/object/srv_obj.c:359 obj_bulk_transfer() crt_bulk_create 0 error (-1020).
09/11-20:41:26.20 apache512 DAOS[10757/10786] object ERR  src/object/srv_obj.c:992 obj_local_rw() 1155473290706288642.0.1 data transfer failed, dma 1 rc DER_HG(-1020)
09/11-20:41:26.20 apache512 DAOS[10757/10786] object ERR  src/object/srv_obj.c:96 obj_rw_complete() 1155473290706288642.0.1Fetch end failed: -1020
09/11-20:41:26.20 apache512 DAOS[10757/10777] rpc  DBUG src/cart/crt_register.c:215 crt_opc_lookup() looking up opcode: 0x2010003
09/11-20:41:26.20 apache512 DAOS[10757/10777] bio  DBUG src/bio/bio_buffer.c:824 copy_one() bio copy 0x7f8c98212380 size 24
09/11-20:41:26.20 apache512 DAOS[10757/10777] bio  DBUG src/bio/bio_buffer.c:824 copy_one() bio copy 0x7f8c981e3680 size 320
09/11-20:41:26.20 apache512 DAOS[10757/10777] vos  DBUG src/vos/vos_io.c:607 akey_fetch() akey [16] fetch single epr 9-6359751
09/11-20:41:26.20 apache512 DAOS[10757/10777] bio  DBUG src/bio/bio_buffer.c:824 copy_one() bio copy 0x7f8c98207700 size 8
09/11-20:41:26.20 apache512 DAOS[10757/10777] vos  DBUG src/vos/vos_io.c:607 akey_fetch() akey [12] fetch single epr 5-6359751
09/11-20:41:26.20 apache512 DAOS[10757/10777] bio  DBUG src/bio/bio_buffer.c:824 copy_one() bio copy 0x7f8c98241400 size 4
09/11-20:41:26.20 apache512 DAOS[10757/10777] vos  DBUG src/vos/vos_io.c:607 akey_fetch() akey [11] fetch single epr 5-6359751
09/11-20:41:26.20 apache512 DAOS[10757/10786] rpc  DBUG src/cart/crt_register.c:215 crt_opc_lookup() looking up opcode: 0x4010001
09/11-20:41:26.20 apache512 DAOS[10757/10786] object DBUG src/object/srv_obj.c:1337 ds_obj_rw_handler() overwrite epoch 1599846087457767431
09/11-20:41:26.20 apache512 DAOS[10757/10786] vos  DBUG src/vos/vos_io.c:607 akey_fetch() akey [1] fetch array epr 1599844210152833028-1599846087457767431

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: DAOS_test failed

anton.brekhov@...
 

These are the errors from daos_server.log:
09/11-20:41:22.58 apache512 DAOS[10757/10786] hg   ERR  # NA -- Error -- /builddir/build/BUILD/mercury-2.0.0a1/src/na/na_ofi.c:4196
 # na_ofi_mem_register(): fi_mr_reg() failed, rc: -95 (Operation not supported)
09/11-20:41:26.20 apache512 DAOS[10757/10786] hg   ERR  # HG -- Error -- /builddir/build/BUILD/mercury-2.0.0a1/src/mercury_bulk.c:494
 # hg_bulk_create(): NA_Mem_register() failed (NA_PROTOCOL_ERROR)
09/11-20:41:26.20 apache512 DAOS[10757/10786] hg   ERR  # HG -- Error -- /builddir/build/BUILD/mercury-2.0.0a1/src/mercury_bulk.c:1072
 # HG_Bulk_create(): Could not create bulk handle
09/11-20:41:26.20 apache512 DAOS[10757/10786] hg   ERR  src/cart/crt_hg.c:1445 crt_hg_bulk_create() HG_Bulk_create failed, hg_ret: 11.
09/11-20:41:26.20 apache512 DAOS[10757/10786] bulk ERR  src/cart/crt_bulk.c:137 crt_bulk_create() crt_hg_bulk_create failed, rc: -1020.
09/11-20:41:26.20 apache512 DAOS[10757/10786] object ERR  src/object/srv_obj.c:359 obj_bulk_transfer() crt_bulk_create 0 error (-1020).
09/11-20:41:26.20 apache512 DAOS[10757/10786] object ERR  src/object/srv_obj.c:992 obj_local_rw() 1155473290706288642.0.1 data transfer failed, dma 1 rc DER_HG(-1020)
09/11-20:41:26.20 apache512 DAOS[10757/10786] object ERR  src/object/srv_obj.c:96 obj_rw_complete() 1155473290706288642.0.1Fetch end failed: -1020
09/11-20:41:26.20 apache512 DAOS[10757/10777] rpc  DBUG src/cart/crt_register.c:215 crt_opc_lookup() looking up opcode: 0x2010003
09/11-20:41:26.20 apache512 DAOS[10757/10777] bio  DBUG src/bio/bio_buffer.c:824 copy_one() bio copy 0x7f8c98212380 size 24
09/11-20:41:26.20 apache512 DAOS[10757/10777] bio  DBUG src/bio/bio_buffer.c:824 copy_one() bio copy 0x7f8c981e3680 size 320
09/11-20:41:26.20 apache512 DAOS[10757/10777] vos  DBUG src/vos/vos_io.c:607 akey_fetch() akey [16] fetch single epr 9-6359751
09/11-20:41:26.20 apache512 DAOS[10757/10777] bio  DBUG src/bio/bio_buffer.c:824 copy_one() bio copy 0x7f8c98207700 size 8
09/11-20:41:26.20 apache512 DAOS[10757/10777] vos  DBUG src/vos/vos_io.c:607 akey_fetch() akey [12] fetch single epr 5-6359751
09/11-20:41:26.20 apache512 DAOS[10757/10777] bio  DBUG src/bio/bio_buffer.c:824 copy_one() bio copy 0x7f8c98241400 size 4
09/11-20:41:26.20 apache512 DAOS[10757/10777] vos  DBUG src/vos/vos_io.c:607 akey_fetch() akey [11] fetch single epr 5-6359751
09/11-20:41:26.20 apache512 DAOS[10757/10786] rpc  DBUG src/cart/crt_register.c:215 crt_opc_lookup() looking up opcode: 0x4010001
09/11-20:41:26.20 apache512 DAOS[10757/10786] object DBUG src/object/srv_obj.c:1337 ds_obj_rw_handler() overwrite epoch 1599846087457767431
09/11-20:41:26.20 apache512 DAOS[10757/10786] vos  DBUG src/vos/vos_io.c:607 akey_fetch() akey [1] fetch array epr 1599844210152833028-1599846087457767431
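The return codes in this log follow two conventions. `fi_mr_reg()` reports -95, a negated Linux errno (`EOPNOTSUPP`, printed as "Operation not supported"), while -1020 here and -1003 elsewhere in this thread are DAOS codes printed as `DER_HG` and `DER_INVAL`; the positive `hg_ret: 11` is Mercury's own return code and fits neither. A minimal C sketch for telling them apart when reading such logs, assuming DAOS codes start at 1000 (consistent with the values seen here, but not checked against the DAOS headers). The helper names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* ASSUMPTION: DAOS (DER_*) codes start at 1000. This matches the values in
 * these logs (DER_INVAL = -1003, DER_HG = -1020) but is not taken from the
 * DAOS headers. */
#define ASSUMED_DER_BASE 1000

/* 1 if rc looks like a negated errno (e.g. -95 from fi_mr_reg()), else 0. */
static int rc_is_negated_errno(int rc)
{
	return rc < 0 && rc > -ASSUMED_DER_BASE;
}

/* 1 if rc looks like a DAOS error code (e.g. -1020 for DER_HG), else 0. */
static int rc_is_daos_code(int rc)
{
	return rc <= -ASSUMED_DER_BASE;
}

/* Write a short description of rc into buf (always NUL-terminated). */
static void describe_rc(int rc, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (rc == 0)
		snprintf(buf, len, "success");
	else if (rc_is_negated_errno(rc))
		snprintf(buf, len, "errno %d: %s", -rc, strerror(-rc));
	else if (rc_is_daos_code(rc))
		snprintf(buf, len, "DAOS error %d", rc);
	else
		snprintf(buf, len, "not a negative error code: %d", rc);
}
```

With this split, the first error line (-95) is the root cause reported by the fabric provider, and the -1020 lines after it are DAOS propagating that failure upward.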


Re: no dRPC client set on Ubuntu 20.04.1

Gert Pauwels
 

Hi Tom,

After applying the patch you provided, the error is now:
    09/11-17:47:17.20 intel-S2600WFD DAOS[11713/11713] bio  ERR  src/bio/bio_xstream.c:369 bio_spdk_env_init() Failed to init SPDK thread lib, Unknown error -1003 (-1003)

The original one was:

   DAOS[7642/7642] bio  ERR  src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)


The whole /tmp/daos_server looks like:

09/11-17:39:03.32 intel-S2600WFD DAOS[7642/7642] fi   INFO src/gurt/fault_inject.c:481 d_fault_inject_init() No config file, fault injection is OFF.
09/11-17:39:03.35 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:419 set_abt_max_num_xstreams() Setting ABT_MAX_NUM_XSTREAMS to 4
09/11-17:39:03.35 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:490 server_init() Module interface successfully initialized
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:498 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully loaded
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] crt  INFO src/cart/crt_init.c:269 crt_init_opt() libcart version 4.8.0 initializing
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] crt  WARN src/cart/crt_init.c:161 data_init() FI_UNIVERSE_SIZE was not set; setting to 2048
09/11-17:39:03.36 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:507 server_init() Network successfully initialized
09/11-17:39:03.37 intel-S2600WFD DAOS[7642/7642] server INFO src/iosrv/init.c:516 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully initialized
09/11-17:39:03.44 intel-S2600WFD DAOS[7642/7642] bio  ERR  src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)
09/11-17:39:03.44 intel-S2600WFD DAOS[7642/7642] server ERR  src/iosrv/init.c:521 server_init() DAOS cannot be initialized using the configured path (/mnt/daos).   Please ensure it is on a PMDK compatible file system and writeable by the current user
09/11-17:47:17.09 intel-S2600WFD DAOS[11713/11713] fi   INFO src/gurt/fault_inject.c:481 d_fault_inject_init() No config file, fault injection is OFF.
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:419 set_abt_max_num_xstreams() Setting ABT_MAX_NUM_XSTREAMS to 4
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:490 server_init() Module interface successfully initialized
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:498 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully loaded
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] crt  INFO src/cart/crt_init.c:269 crt_init_opt() libcart version 4.8.0 initializing
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] crt  WARN src/cart/crt_init.c:161 data_init() FI_UNIVERSE_SIZE was not set; setting to 2048
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:507 server_init() Network successfully initialized
09/11-17:47:17.13 intel-S2600WFD DAOS[11713/11713] server INFO src/iosrv/init.c:516 server_init() Module vos,rdb,rsvc,security,mgmt,dtx,pool,cont,obj,rebuild successfully initialized
09/11-17:47:17.20 intel-S2600WFD DAOS[11713/11713] bio  ERR  src/bio/bio_xstream.c:369 bio_spdk_env_init() Failed to init SPDK thread lib, Unknown error -1003 (-1003)
09/11-17:47:17.21 intel-S2600WFD DAOS[11713/11713] server ERR  src/iosrv/init.c:521 server_init() DAOS cannot be initialized using the configured path (/mnt/daos).   Please ensure it is on a PMDK compatible file system and writeable by the current user

Rgds,

Gert
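The last log line asks to ensure that /mnt/daos is on a PMDK compatible file system and writeable by the current user. Tom's reply below points at the SPDK thread library instead, but the "writeable" half of that message is easy to rule out. A minimal sketch, assuming a POSIX system; `scm_dir_is_writable()` is a hypothetical helper, not part of DAOS, and it does not check PMDK compatibility:

```c
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper, not part of DAOS: return 1 if path is an existing
 * directory that the real user ID (as checked by access()) can write to and
 * enter, 0 otherwise. It does NOT check PMDK compatibility (for example a
 * DAX mount). */
static int scm_dir_is_writable(const char *path)
{
	struct stat st;

	assert(path != NULL);
	if (stat(path, &st) != 0)
		return 0; /* missing, or a parent directory is not searchable */
	if (!S_ISDIR(st.st_mode))
		return 0; /* exists but is not a directory */
	return access(path, W_OK | X_OK) == 0;
}
```

Calling `scm_dir_is_writable("/mnt/daos")` as the user that runs the server gives a quick yes/no; a result of 1 means the remaining suspects are the file system type and the SPDK initialisation.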


Re: no dRPC client set on Ubuntu 20.04.1

Nabarro, Tom
 

Thanks Gert,

 

Yes, that was only to get the identify application running. It runs fine, as expected, because the problem appears to be in initializing the thread library, which identify doesn’t do.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Gert Pauwels
Sent: Friday, September 11, 2020 1:41 PM
To: daos@daos.groups.io
Subject: Re: [daos] no dRPC client set on Ubuntu 20.04.1

 

Hi Tom,

As you indicated, I set LD_LIBRARY_PATH to the SPDK library path, and that indeed solves the problem for the identify application. Now the identify application returns all the data as expected.

However, it did not make any difference when running:

$ echo $LD_LIBRARY_PATH
/root/daos/install/prereq/dev/spdk/lib/
$ daos_server start

It still gives the same error.

Rgds,

Gert



Re: no dRPC client set on Ubuntu 20.04.1

Nabarro, Tom
 

Hello,

 

You and Gert are experiencing the same issue:

 

09/10-14:23:55.96 intel-S2600WFD DAOS[260088/260088] bio  ERR  src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)

09/10-14:23:55.97 intel-S2600WFD DAOS[260088/260088] server ERR  src/iosrv/init.c:521 server_init() DAOS cannot be initialized using the configured path (/mnt/daos).   Please ensure it is on a PMDK compatible file system and writeable by the current user

 

That is, the SPDK thread library cannot be initialised. The thread library is not used in the identify example, which would explain why identify runs fine.

 

spdk_thread_lib_init() is returning an error, but unfortunately we replace it with DER_INVAL and so mask the SPDK error, which needs fixing.

 

Can you please apply this patch and rerun to give us better insight? In the meantime I will find an Ubuntu system to verify:

 

```
diff --git a/src/bio/bio_xstream.c b/src/bio/bio_xstream.c
index 758ba6d..c888047 100644
--- a/src/bio/bio_xstream.c
+++ b/src/bio/bio_xstream.c
@@ -30,6 +30,7 @@
 #include <spdk/env.h>
 #include <spdk/nvme.h>
 #include <spdk/vmd.h>
+#include <spdk/string.h>
 #include <spdk/thread.h>
 #include <spdk/bdev.h>
 #include <spdk/io_channel.h>
@@ -364,7 +365,8 @@ bio_spdk_env_init(void)
        rc = spdk_thread_lib_init(NULL, 0);
        if (rc != 0) {
                rc = -DER_INVAL;
-               D_ERROR("Failed to init SPDK thread lib, "DF_RC"\n", DP_RC(rc));
+               D_ERROR("Failed to init SPDK thread lib, %s (%d)\n",
+                       spdk_strerror(rc), rc);
                spdk_env_fini();
                return rc;
        }
```
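In the patch above, the unchanged line `rc = -DER_INVAL;` still runs before the new `D_ERROR()`, so the message is formatted with -1003 rather than the value returned by `spdk_thread_lib_init()`. That is consistent with the "Unknown error -1003 (-1003)" Gert reported after applying it. A minimal C sketch of the intended order (format the library's own code first, then map it to the DAOS code), assuming the init call returns 0 or a negated errno; check that convention for the SPDK version you build against. `fake_thread_lib_init()` and `init_thread_lib()` are hypothetical stand-ins, not DAOS or SPDK functions, and the sketch uses the C library's `strerror()` with the sign flipped instead of `spdk_strerror()`:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Value seen in this thread's logs as DER_INVAL(-1003). */
#define SKETCH_DER_INVAL 1003

/* Hypothetical stand-in for spdk_thread_lib_init(). ASSUMPTION: it returns
 * 0 on success or a negated errno on failure; here it always fails. */
static int fake_thread_lib_init(void)
{
	return -ENOMEM;
}

/* Format the library's own error BEFORE mapping rc to the DAOS code.
 * Assigning -DER_INVAL first (as the patch does) makes the message report
 * -1003 instead of the library's value. */
static int init_thread_lib(int *orig_rc, char *msg, size_t len)
{
	int rc;

	assert(orig_rc != NULL && msg != NULL && len > 0);
	rc = fake_thread_lib_init();
	*orig_rc = rc;
	msg[0] = '\0';
	if (rc != 0) {
		snprintf(msg, len, "Failed to init SPDK thread lib, %s (%d)",
			 strerror(-rc), rc);
		rc = -SKETCH_DER_INVAL; /* map only after the message is built */
	}
	return rc;
}
```

Keeping the original code in a separate variable (here `orig_rc`) also lets the caller report it later, while still returning the DAOS code to its own callers.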

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Yunjae Lee
Sent: Friday, September 11, 2020 8:34 AM
To: daos@daos.groups.io
Subject: Re: [daos] no dRPC client set on Ubuntu 20.04.1

 

Same problem here. I used the master branch of DAOS on Ubuntu 20.04.1, and I also hit this error when starting daos_server:

ERROR: removing socket file: removing instance 0 socket file: no dRPC client set (data plane not started?)

The same message was printed in /tmp/daos_server.log:

bio ERR src/bio/bio_xstream.c:367 bio_spdk_env_init() Failed to init SPDK thread lib, DER_INVAL(-1003)

The identify error also happened in my case; however, it is now solved thanks to @Tom.
The results of identify are attached, as well as the server logs and configurations I used.



Re: no dRPC client set on Ubuntu 20.04.1

Gert Pauwels
 

Hi Tom,

As you indicated, I set LD_LIBRARY_PATH to the SPDK library path, and that indeed solves the problem for the identify application. Now the identify application returns all the data as expected.

However, it did not make any difference when running:
$ echo $LD_LIBRARY_PATH
/root/daos/install/prereq/dev/spdk/lib/
$ daos_server start
It still gives the same error.

Rgds,

Gert
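When an environment variable seems to have no effect, one thing worth ruling out is whether the process that needs the library actually sees the expected value, for example by logging `getenv("LD_LIBRARY_PATH")` from inside it. A small C sketch of the comparison step, which treats a trailing slash (as in the value above) as insignificant. `path_list_contains()` is a hypothetical helper, not part of DAOS or SPDK:

```c
#include <assert.h>
#include <string.h>

/* Length of the first n chars of s without trailing '/' (a lone "/" stays). */
static size_t trimmed_len(const char *s, size_t n)
{
	while (n > 1 && s[n - 1] == '/')
		n--;
	return n;
}

/* Return 1 if dir is an entry of a colon-separated list such as
 * LD_LIBRARY_PATH, ignoring trailing slashes; 0 otherwise. */
static int path_list_contains(const char *list, const char *dir)
{
	size_t dlen;
	const char *p;

	assert(dir != NULL);
	if (list == NULL)
		return 0; /* variable not set at all */
	dlen = trimmed_len(dir, strlen(dir));
	p = list;
	for (;;) {
		const char *end = strchr(p, ':');
		size_t n = end ? (size_t)(end - p) : strlen(p);

		n = trimmed_len(p, n);
		if (n == dlen && strncmp(p, dir, n) == 0)
			return 1;
		if (end == NULL)
			return 0;
		p = end + 1; /* move to the next entry */
	}
}
```

Inside a process, `path_list_contains(getenv("LD_LIBRARY_PATH"), "/root/daos/install/prereq/dev/spdk/lib")` returns 0 both when the variable is unset and when the directory is missing from it.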


Re: DAOS_test failed

anton.brekhov@...
 

Johann, thanks! You're right. Also, I had been running daos_test compiled from source. Now I've run it from the daos-tests package (RPM of v1.0.1) and it works, but one test is stuck:
DAOS_IOD_SINGLE:NVMe
        size: 5120
[       OK ] IO2: simple update/fetch/verify (async)
[ RUN      ] IO3: i/o with variable rec size
Record size: 1 val: 'X' dkey: 707937202

And daos.log is filled with this error:

09/11-14:37:23.48 sky08 DAOS[36236/36236] object ERR  src/object/cli_shard.c:216 dc_rw_cb() rpc 0x404a030 RPC 1 failed: DER_HG(-1020)

Does anyone know why this could happen?