
Re: daos_test failing with Infiniband

Lombardi, Johann
 

Hi Peter,

 

Could you please tell us which provider you have specified in the DAOS YAML file? Libfabric seems to be loading libucs.so, which, AFAIK, is a UCX library that we don’t support.
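One quick way to see which libraries are involved in the crash is to pull the shared-object paths out of the backtrace in the original report. The Python sketch below is a hypothetical helper, not a DAOS tool. The list of "unexpected" prefixes (libucs, libuct, libucp, which are UCX libraries) is an assumption chosen for this particular case.

```python
import re

# Matches a backtrace frame such as:
#     0  /lib64/libucs.so.0(+0x17970) [0x7f1f66279970]
# and captures the shared-object path that precedes the "(".
FRAME_RE = re.compile(r"^\s*\d+\s+(\S+?\.so(?:\.\d+)*)\(")

# Hypothetical list of library-name prefixes treated as unexpected in a
# libfabric verbs/ofi_rxm stack (these belong to UCX).
UNEXPECTED_PREFIXES = ("libucs", "libuct", "libucp")


def libraries_in_backtrace(text):
    """Return shared-object paths in the order they first appear."""
    seen = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m and m.group(1) not in seen:
            seen.append(m.group(1))
    return seen


def unexpected_libraries(text):
    """Return backtrace libraries whose file name starts with an unexpected prefix."""
    return [
        path
        for path in libraries_in_backtrace(text)
        if path.rsplit("/", 1)[-1].startswith(UNEXPECTED_PREFIXES)
    ]
```

Pasting the backtrace from the report into unexpected_libraries() would list /lib64/libucs.so.0, the frames that sit directly above libfabric's fi_log_enabled().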

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of Peter <magpiesaresoawesome@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday 15 December 2020 at 08:10
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] daos_test failing with Infiniband

 

Hello,

I have been having trouble getting DAOS to work over InfiniBand, and I have been unable to diagnose the problem. I am running DAOS v1.1.1 on CentOS 7 and have tried both the RPMs and a build from source.
I have installed the latest Mellanox drivers and successfully run the InfiniBand tests, and I can run ibping between my hosts. As far as I can tell, the DAOS cluster starts without issue:

[daos@swat7-01 ~]$ docker exec dc_ib_auto dmg -i system query --verbose
Rank UUID                                 Control Address State  Reason
---- ----                                 --------------- -----  ------
0    c7adb803-af21-497d-aaba-5da5b8cd121f 10.0.0.63:10001 Joined
1    5333e417-47ef-4747-b4a5-241b88188092 10.0.0.64:10001 Joined
2    768f4769-e21a-44a2-b3a0-647a9a6a5f2f 10.0.0.65:10001 Joined
3    b3bb804b-e453-417b-885d-cf1bae9fa179 10.0.0.61:10001 Joined

However, when I run daos_test, I get the following error (the same test succeeds over Ethernet):

[daos@swat7-01 ~]$ docker exec dc_ib_auto daos_test -i

--------------------------------------------------------------------------
WARNING: No preset parameters were found for the device that Open MPI
detected:

  Local host:            swat7-01
  Device name:           mlx5_0
  Device vendor ID:      0x02c9
  Device vendor part ID: 4123

Default device parameters will be used, which may result in lower
performance.  You can edit any of the files specified by the
btl_openib_device_param_files MCA parameter to set values for your
device.

NOTE: You can turn off this warning by setting the MCA parameter
      btl_openib_warn_no_device_params_found to 0.
--------------------------------------------------------------------------
12/15-06:55:24.37 swat7-01 DAOS[574/574] fi   INFO src/gurt/fault_inject.c:481 d_fault_inject_init() No config file, fault injection is OFF.
12/15-06:55:24.37 swat7-01 DAOS[574/574] daos INFO src/common/drpc.c:717 drpc_close() Closing dRPC socket fd=32
12/15-06:55:24.37 swat7-01 DAOS[574/574] mgmt INFO src/mgmt/cli_mgmt.c:523 dc_mgmt_net_cfg() Using client provided OFI_INTERFACE: ib0
12/15-06:55:24.37 swat7-01 DAOS[574/574] crt  INFO src/cart/crt_init.c:269 crt_init_opt() libcart version 4.8.0 initializing
12/15-06:55:24.37 swat7-01 DAOS[574/574] crt  WARN src/cart/crt_init.c:161 data_init() FI_UNIVERSE_SIZE was not set; setting to 2048
12/15-06:55:24.37 swat7-01 DAOS[574/574] crt  WARN src/cart/crt_init.c:380 crt_init_opt() FI_OFI_RXM_USE_SRX not set, set=1
12/15-06:55:24.40 swat7-01 DAOS[574/574] external ERR  # NA -- Error -- /home/daos/daos/build/external/dev/mercury/src/na/na_ofi.c:2064
 # na_ofi_basic_ep_open(): fi_enable() failed, rc: -12 (Cannot allocate memory)
12/15-06:55:24.40 swat7-01 DAOS[574/574] external ERR  # NA -- Error -- /home/daos/daos/build/external/dev/mercury/src/na/na_ofi.c:1981
 # na_ofi_endpoint_open(): na_ofi_basic_ep_open() failed
[swat7-01:574  :0:574] Caught signal 11 (Segmentation fault: address not mapped to object at address 0xc)
==== backtrace ====
    0  /lib64/libucs.so.0(+0x17970) [0x7f1f66279970]
    1  /lib64/libucs.so.0(+0x17b22) [0x7f1f66279b22]
    2  /home/daos/daos/install/bin/../lib64/../prereq/dev/mercury/lib/../../ofi/lib/libfabric.so.1(fi_log_enabled+0x13) [0x7f1f7a3c49b3]
    3  /home/daos/daos/install/bin/../lib64/../prereq/dev/mercury/lib/../../ofi/lib/libfabric.so.1(+0x7353e) [0x7f1f7a41e53e]
    4  /home/daos/daos/install/bin/../lib64/../prereq/dev/mercury/lib/../../ofi/lib/libfabric.so.1(+0x7459c) [0x7f1f7a41f59c]
    5  /home/daos/daos/install/bin/../lib64/../prereq/dev/mercury/lib/libna.so.2(+0xc3ec) [0x7f1f7bdd63ec]
    6  /home/daos/daos/install/bin/../lib64/../prereq/dev/mercury/lib/libna.so.2(+0xd44d) [0x7f1f7bdd744d]
    7  /home/daos/daos/install/bin/../lib64/../prereq/dev/mercury/lib/libna.so.2(NA_Initialize_opt+0x3bf) [0x7f1f7bdce0cf]
    8  /home/daos/daos/install/bin/../lib64/../prereq/dev/mercury/lib/libmercury.so.2(HG_Core_init_opt+0xef) [0x7f1f7bff862f]
    9  /home/daos/daos/install/bin/../lib64/../prereq/dev/mercury/lib/libmercury.so.2(HG_Init_opt+0x6f) [0x7f1f7bfefdbf]
   10  /home/daos/daos/install/bin/../lib64/libcart.so.4(+0x4b211) [0x7f1f7e239211]
   11  /home/daos/daos/install/bin/../lib64/libcart.so.4(crt_hg_ctx_init+0x388) [0x7f1f7e23a548]
   12  /home/daos/daos/install/bin/../lib64/libcart.so.4(crt_context_create+0x3dd) [0x7f1f7e207d8d]
   13  /home/daos/daos/install/bin/../lib64/libdaos.so.0(daos_eq_lib_init+0x1fc) [0x7f1f7eb4776c]
   14  /home/daos/daos/install/bin/../lib64/libdaos.so.0(daos_init+0x184) [0x7f1f7eb4b3f4]
   15  daos_test() [0x407baf]
   16  /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f1f7d511555]
   17  daos_test() [0x409050]

Would anyone happen to know what is causing this error, and how I could fix it?

Thank you, I appreciate any help.

Best,
Peter

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: Error on simple test on POSIX container

Yunjae Lee
 

Hi Johann,

I have also seen the problem with v1.1.2 on Ubuntu 20.04.
I reinstalled CentOS 7.7 on the server machine and, as your experiment showed, the problem is now gone.
Could there be a compatibility issue with the Ubuntu kernel or FUSE version?

Thanks,
Yunjae


Re: Error on simple test on POSIX container

Lombardi, Johann
 

Hi there,

 

The fact that you can only reproduce this mercury/transport error with dfuse and not DFS is interesting.

I have just tried on CentOS and couldn’t reproduce this on the latest master. I might have to try with Ubuntu …

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of Yunjae Lee <lyj7694@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday 8 December 2020 at 15:24
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Error on simple test on POSIX container

 

Hi Johann,

Yes, I'm using "ofi+verbs;ofi_rxm".

I suspect the problem is independent of DFS, since issuing small DFS I/Os showed no errors.


Thanks,
Yunjae


Auto-generation of server config file

Nabarro, Tom
 

Given a set of hosts with a homogeneous hardware configuration, it should now be possible to generate an optimal DAOS server configuration file using the command ‘dmg config generate’. More details are in the admin guide:

https://daos-stack.github.io/admin/deployment/#auto-generate-configuration-file

 

Please don’t hesitate to give feedback and ideas for improvement, thanks.
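Because the generator expects homogeneous hardware, it may help to confirm that precondition before running the command. The Python sketch below assumes you have already collected a per-host hardware summary in some form; the inventory layout and attribute names are hypothetical and are not what dmg itself reports.

```python
from collections import defaultdict


def group_by_hardware(inventory):
    """Group host names that report identical hardware summaries.

    `inventory` maps host -> dict of hashable hardware attributes
    (hypothetical keys such as "nvme" or "iface"). Each dict is frozen
    into a sorted tuple so that key order does not matter.
    """
    groups = defaultdict(list)
    for host, hw in inventory.items():
        groups[tuple(sorted(hw.items()))].append(host)
    return [sorted(hosts) for hosts in groups.values()]


def is_homogeneous(inventory):
    """True if every host reports the same hardware summary."""
    return len(group_by_hardware(inventory)) <= 1
```

If is_homogeneous() returns False, the groups returned by group_by_hardware() show which hosts differ and would likely need separate configurations.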

 

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: Error on simple test on POSIX container

Lombardi, Johann
 

Hi there,

 

I assume that you are using “ofi+verbs;ofi_rxm” as the provider, right?

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of Yunjae Lee <lyj7694@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday 1 December 2020 at 06:45
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Error on simple test on POSIX container

 

It seems to be related to the size of the file: when I create a file smaller than 4k, reading it back with cat fails.
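To narrow down the size dependence described here, a sweep of file sizes around the 4k boundary can be run against the dfuse mount. This is a hypothetical Python sketch, not part of DAOS. Two caveats apply: depending on caching, a read-back in the same process may be served without reaching the servers, and if the client keeps retrying the RPC, as in the error log in the original report, the read may hang instead of raising, so running it under a timeout is advisable.

```python
import os


def find_failing_sizes(directory, sizes):
    """Write a file of each size under `directory`, read it back, and return
    the sizes whose contents did not round-trip or raised OSError."""
    failing = []
    for size in sizes:
        # Hypothetical file name; choose one that cannot clash with real data.
        path = os.path.join(directory, f"size_test_{size}")
        payload = bytes(i % 251 for i in range(size))
        try:
            with open(path, "wb") as f:
                f.write(payload)
            with open(path, "rb") as f:
                ok = f.read() == payload
        except OSError:
            ok = False
        if not ok:
            failing.append(size)
        try:
            os.remove(path)  # clean up; ignore errors on a misbehaving mount
        except OSError:
            pass
    return failing
```

For example, find_failing_sizes("/mnt/dfuse", [1, 4, 4095, 4096, 4097, 8192]) returns the sizes that failed; on a healthy file system it returns an empty list.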


Build change in latest master

Olivier, Jeffrey V
 

In the latest master, the default BUILD_TYPE has been changed to ‘release’, so our fault injection code is no longer built by default.

 

Tests that rely on this feature will auto skip themselves if DAOS is not compiled with fault injection enabled.

 

If you update to the latest master, you will notice that the build thinks your prerequisites need to be recompiled. This is because the default TARGET_TYPE is ‘default’, which means it follows the BUILD_TYPE setting and will therefore want to build release prerequisites.

 

If you don’t want a release build (or release prerequisites), simply change either or both of these options.

 

Running scons BUILD_TYPE=dev restores the prior behavior. This option is sticky, meaning it is saved to daos.conf in your daos directory, so it only needs to be specified once.
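The interaction between the two options can be modelled in a few lines. This Python sketch only illustrates the rules described in this message; it is not the actual scons code. It treats both options as sticky and does not model the real daos.conf file format, and both of those choices are assumptions.

```python
# Defaults after this change, as described in the message (assumed here).
DEFAULTS = {"BUILD_TYPE": "release", "TARGET_TYPE": "default"}


def resolve_build(saved, cli):
    """Return (build_type, prereq_type, new_saved).

    saved -- options remembered from earlier runs (stands in for daos.conf)
    cli   -- options given on this scons command line
    """
    merged = {**DEFAULTS, **saved, **cli}
    build_type = merged["BUILD_TYPE"]
    # TARGET_TYPE='default' means the prerequisites follow BUILD_TYPE.
    if merged["TARGET_TYPE"] == "default":
        prereq_type = build_type
    else:
        prereq_type = merged["TARGET_TYPE"]
    # Command-line values are sticky: they are remembered for the next run.
    new_saved = {**saved, **cli}
    return build_type, prereq_type, new_saved
```

Under this model, one run with BUILD_TYPE=dev switches both DAOS and its prerequisites back to dev, and later runs without arguments keep that setting.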

 

-Jeff


Error on simple test on POSIX container

Yunjae Lee
 

Hi,

I created a POSIX container, mounted it at /mnt/dfuse on the client node,
and ran the following commands:
```
# echo "foo" > /mnt/dfuse/bar
# cat /mnt/dfuse/bar
```

However, this produces the following error, repeated endlessly:

object ERR src/object/cli_shard.c:631 dc_rw_cb() rpc 0x7ffa3801d6e0 opc 1 to rank 0 tag 7 failed: DER_HG(-1020): 'Transport layer mercury error'

OS: Ubuntu 20.04
Network: InfiniBand with MOFED 5.0-2
DAOS version: c20c47 (commit from 2020-11-28)


DUG'20 slides are available!

Lombardi, Johann
 

Hi there,

 

I have posted all the DUG presentations on the wiki (see https://wiki.hpdd.intel.com/display/DC/DUG20)

We need some more time for the video recordings, which will be published on our YouTube channel (https://www.youtube.com/channel/UCVP4e_UTnSJg15Cm80UtNwg) when ready.

 

Cheers,

Johann


Re: DUG'20 Agenda Online

Carrier, John
 

Note that the DUG time listed in the SC2020 schedule is incorrect. Please use the Webex information below.

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Lombardi, Johann
Sent: Tuesday, November 17, 2020 11:29 PM
To: daos@daos.groups.io
Subject: Re: [daos] DUG'20 Agenda Online

 

Just a reminder that the DUG’20 is tomorrow.

 

Hope to see you there!

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "Lombardi, Johann" <johann.lombardi@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday 15 October 2020 at 09:51
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] DUG'20 Agenda Online

 

Hi there,

 

Please note that the agenda for the 4th annual DAOS User Group meeting is now available online:

https://wiki.hpdd.intel.com/display/DC/DUG20

 

I am very excited by the diversity and number of presentations this year. A big thank you to all the presenters.

 

As a reminder, the DUG is virtual this year:

-          On Nov 19

-          Starts at 7:30am Pacific / 8:30am Mountain / 9:30am Central / 4:30pm CET / 11:30pm China

-          3h30 of live presentations

-          Please see instructions on how to join in the webex invite

 

We also encourage everyone to join the #community slack channel for side discussions between attendees/presenters after the event.

 

Hope to see you there!

 

Best regards,

Johann

 


Re: Install problem

fhoa@...
 

Did you find a workaround for this problem? I am seeing the same problem when trying to set up DAOS on Ubuntu 20.04.1.

Commands I tried to run:

$ git clone https://github.com/daos-stack/daos
$ docker build --no-cache -t daos -f utils/docker/Dockerfile.ubuntu.20.04 --build-arg NOBUILD=1 .
$ docker run -it -d --privileged --name server -v ${daospath}:/home/daos/daos:Z -v /dev/hugepages:/dev/hugepages daos
$ docker exec server scons --build-deps=yes install PREFIX=/usr

The last command fails with an error similar to the one above:

"
gcc -o build/dev/gcc/src/tests/security/acl_dump_test -Wl,-rpath-link=build/dev/gcc/src/gurt -Wl,-rpath-link=build/dev/gcc/src/cart -Wl,--enable-new-dtags -Wl,-rpath-link=/home/daos/daos/build/dev/gcc/src/gurt -Wl,-rpath-link=/usr/prereq/dev/pmdk/lib -Wl,-rpath-link=/usr/prereq/dev/isal/lib -Wl,-rpath-link=/usr/prereq/dev/isal_crypto/lib -Wl,-rpath-link=/usr/prereq/dev/argobots/lib -Wl,-rpath-link=/usr/prereq/dev/protobufc/lib -Wl,-rpath-link=/usr/lib64 -Wl,-rpath=/usr/lib -Wl,-rpath=\$ORIGIN/../../home/daos/daos/build/dev/gcc/src/gurt -Wl,-rpath=\$ORIGIN/../prereq/dev/pmdk/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/isal/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/isal_crypto/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/argobots/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/protobufc/lib -Wl,-rpath=\$ORIGIN/../lib64 build/dev/gcc/src/tests/security/acl_dump_test.o -Lbuild/dev/gcc/src/gurt -Lbuild/dev/gcc/src/cart/swim -Lbuild/dev/gcc/src/cart -Lbuild/dev/gcc/src/common -L/usr/prereq/dev/pmdk/lib -L/usr/prereq/dev/isal/lib -L/usr/prereq/dev/isal_crypto/lib -Lbuild/dev/gcc/src/bio -Lbuild/dev/gcc/src/bio/smd -Lbuild/dev/gcc/src/vea -Lbuild/dev/gcc/src/vos -Lbuild/dev/gcc/src/mgmt -Lbuild/dev/gcc/src/pool -Lbuild/dev/gcc/src/container -Lbuild/dev/gcc/src/placement -Lbuild/dev/gcc/src/dtx -Lbuild/dev/gcc/src/object -Lbuild/dev/gcc/src/rebuild -Lbuild/dev/gcc/src/security -Lbuild/dev/gcc/src/client/api -Lbuild/dev/gcc/src/control -L/usr/prereq/dev/argobots/lib -L/usr/prereq/dev/protobufc/lib -lpmemobj -lisal -lisal_crypto -labt -lprotobuf-c -lhwloc -ldaos -ldaos_common -lgurt

/usr/bin/ld: warning: libna.so.2, needed by /usr/prereq/dev/mercury/lib/libmercury.so.2, not found (try using -rpath or -rpath-link)

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Error_to_string'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Addr_free'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Mem_handle_create_segments'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Op_create'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Mem_handle_free'

[...]

"



Re: Install problem

nicolau.manubens@...
 

Thanks for your help.

I have also tried the Ubuntu and Leap Dockerfiles. Leap worked fine, but the Ubuntu one failed with a similar error when compiling acl_dump_test. A snippet of the error is below.

Although I can continue with the Leap image for now, it would still be good to have the CentOS one working for testing, since our final DAOS system will be deployed on CentOS machines.

Nicolau


/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Error_to_string'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Addr_free'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Mem_handle_create_segments'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Op_create'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Mem_handle_free'

[...]
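To see at a glance which library is missing symbols, the linker output can be grouped by the library that references them. In the snippets in this thread, every unresolved symbol is an NA_* function referenced by libmercury.so.2, which fits the earlier linker warning that libna.so.2 could not be found. The Python helper below is a hypothetical sketch that only parses the GNU ld message format shown here.

```python
import re
from collections import defaultdict

# Matches lines such as:
# /usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Op_create'
# The leading "<path>ld: " is optional because some reports omit it.
UNDEF_RE = re.compile(r"^(?:\S*ld: )?(\S+): undefined reference to `([^']+)'")


def undefined_refs_by_library(log):
    """Map each library that has unresolved symbols to its sorted symbol list."""
    refs = defaultdict(set)
    for line in log.splitlines():
        m = UNDEF_RE.match(line.strip())
        if m:
            refs[m.group(1)].add(m.group(2))
    return {lib: sorted(syms) for lib, syms in refs.items()}
```

The same helper applied to the CentOS output further down the thread would instead point at libna.so.2 and its fi_* references, which is a different failure.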


Re: Install problem

Olivier, Jeffrey V
 

The scons logic in utils/sl should automatically detect that the installed libfabric version is not suitable and build a suitable version. I’m trying it locally to see what is going on.

 

-Jeff

 

From: <daos@daos.groups.io> on behalf of "maureen.jean@..." <maureen.jean@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday, November 11, 2020 at 8:14 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Install problem

 

Yes, you need a later version of libfabric, preferably 1.11. At a minimum, you need a libfabric that supports ABI 1.3 (FABRIC 1.3).
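The version check that the build logic is expected to perform can be sketched as a comparison against the preferred 1.11. This is a hypothetical Python helper, not the code in utils/sl. It only compares release numbers and does not verify that the installed library actually exports the FABRIC 1.3 ABI symbols.

```python
import re

# "Preferably 1.11", per the advice in this thread.
MIN_LIBFABRIC = (1, 11)


def parse_version(text):
    """Turn '1.11.0' (optionally followed by a suffix such as 'rc1') into (1, 11, 0).

    Returns () if the string does not start with a number.
    """
    m = re.match(r"\d+(?:\.\d+)*", text.strip())
    return tuple(int(p) for p in m.group(0).split(".")) if m else ()


def libfabric_is_suitable(version_text, minimum=MIN_LIBFABRIC):
    """True if the version is at least `minimum` (tuple comparison)."""
    return parse_version(version_text) >= minimum
```

With this rule, the libfabric 1.7 installed by the Dockerfile would be rejected, while 1.11.0 and later would pass.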


Re: Install problem

nicolau.manubens@...
 

The Dockerfile I am using from master installs libfabric 1.7 in the image. Should I modify the scons script to change the libfabric version?


Re: Install problem

maureen.jean@...
 

What version of libfabric are you using? Try using libfabric >= 1.11.

/usr/prereq/dev/mercury/lib/libna.so.2: undefined reference to `fi_dupinfo@...'

/usr/prereq/dev/mercury/lib/libna.so.2: undefined reference to `fi_freeinfo@...'

/usr/prereq/dev/mercury/lib/libna.so.2: undefined reference to `fi_getinfo@...'


Re: Install problem

nicolau.manubens@...
 

Hello,

I am also seeing a similar error when trying to build the DAOS Docker image:

wget https://raw.githubusercontent.com/daos-stack/daos/master/utils/docker/Dockerfile.centos.7

docker build --no-cache -t daos -f ./Dockerfile.centos.7 .

The Dockerfile is pulled from the master branch, at commit 4cbb16cf8edc9ddf5c7503b4448bf897c8331ea3.

The output follows:

gcc -o build/dev/gcc/src/tests/security/acl_dump_test -Wl,-rpath-link=build/dev/gcc/src/gurt -Wl,-rpath-link=build/dev/gcc/src/cart -Wl,--enable-new-dtags -Wl,-rpath-link=/home/daos/daos/build/dev/gcc/src/gurt -Wl,-rpath-link=/usr/prereq/dev/pmdk/lib -Wl,-rpath-link=/usr/prereq/dev/isal/lib -Wl,-rpath-link=/usr/prereq/dev/isal_crypto/lib -Wl,-rpath-link=/usr/prereq/dev/argobots/lib -Wl,-rpath-link=/usr/prereq/dev/protobufc/lib -Wl,-rpath-link=/usr/lib64 -Wl,-rpath=/usr/lib -Wl,-rpath=\$ORIGIN/../../home/daos/daos/build/dev/gcc/src/gurt -Wl,-rpath=\$ORIGIN/../prereq/dev/pmdk/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/isal/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/isal_crypto/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/argobots/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/protobufc/lib -Wl,-rpath=\$ORIGIN/../lib64 build/dev/gcc/src/tests/security/acl_dump_test.o -Lbuild/dev/gcc/src/gurt -Lbuild/dev/gcc/src/cart/swim -Lbuild/dev/gcc/src/cart -Lbuild/dev/gcc/src/common -L/usr/prereq/dev/pmdk/lib -L/usr/prereq/dev/isal/lib -L/usr/prereq/dev/isal_crypto/lib -Lbuild/dev/gcc/src/bio -Lbuild/dev/gcc/src/bio/smd -Lbuild/dev/gcc/src/vea -Lbuild/dev/gcc/src/vos -Lbuild/dev/gcc/src/mgmt -Lbuild/dev/gcc/src/pool -Lbuild/dev/gcc/src/container -Lbuild/dev/gcc/src/placement -Lbuild/dev/gcc/src/dtx -Lbuild/dev/gcc/src/object -Lbuild/dev/gcc/src/rebuild -Lbuild/dev/gcc/src/security -Lbuild/dev/gcc/src/client/api -Lbuild/dev/gcc/src/control -L/usr/prereq/dev/argobots/lib -L/usr/prereq/dev/protobufc/lib -lpmemobj -lisal -lisal_crypto -labt -lprotobuf-c -lhwloc -ldaos -ldaos_common -lgurt

/usr/prereq/dev/mercury/lib/libna.so.2: undefined reference to `fi_dupinfo@...'

/usr/prereq/dev/mercury/lib/libna.so.2: undefined reference to `fi_freeinfo@...'

/usr/prereq/dev/mercury/lib/libna.so.2: undefined reference to `fi_getinfo@...'

collect2: error: ld returned 1 exit status

scons: *** [build/dev/gcc/src/tests/security/acl_dump_test] Error 1

scons: building terminated because of errors.

The command '/bin/sh -c if [ "x$NOBUILD" = "x" ] ; then scons --build-deps=yes install PREFIX=/usr; fi' returned a non-zero code: 2

 

I have also tried checking out the commit right after the pull request was merged and building it, without success:


git clone https://github.com/daos-stack/daos/

cd daos

git checkout 5c887623f0013241d27b8daad1813a3444abf718

cd utils/docker

docker build --no-cache -t daos -f ./Dockerfile.centos.7 .

[...]

Step 28/34 : RUN if [ "x$NOBUILD" = "x" ] ; then scons --build-deps=yes install PREFIX=/usr; fi

 ---> Running in 6fa6d1725ecd

scons: Reading SConscript files ...

ImportError: No module named distro:

  File "/home/daos/daos/SConstruct", line 16:

    import daos_build

  File "/home/daos/daos/utils/daos_build.py", line 4:

    from env_modules import load_mpi

  File "/home/daos/daos/site_scons/env_modules.py", line 27:

    import distro
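The traceback shows that site_scons/env_modules.py imports the distro module, which is not available in the image. A minimal Python 3 sketch for checking up front which required modules are missing is below; the module list is whatever you pass in. Note that the "No module named distro" wording suggests scons was running under Python 2 here, where importlib.util.find_spec does not exist, so this check is only meaningful when run under the same interpreter that scons uses.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found.

    Only top-level names are supported: for a dotted name whose parent
    package is missing, find_spec raises instead of returning None.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

For example, missing_modules(["distro"]) returning ["distro"] would confirm that the module still needs to be installed for that interpreter.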



Please let me know if you have any further hints.

 

Regards,

Nicolau
