
Install problem


bkatz@...
 

Hi there. I’m attempting to do an install into a Docker container running on top of a CentOS 7 host. Close to the end of the process, I get the errors below and the install aborts. Can someone shed any light on what the cause might be, and how to get around it?

Jan 30 13:27:28 daos1 journal: //usr/lib/libna.so.2: undefined reference to `fi_dupinfo@...'

Jan 30 13:27:28 daos1 journal: //usr/lib/libna.so.2: undefined reference to `fi_freeinfo@...'

Jan 30 13:27:28 daos1 journal: //usr/lib/libna.so.2: undefined reference to `fi_getinfo@...'

Jan 30 13:27:28 daos1 journal: collect2: error: ld returned 1 exit status

Jan 30 13:27:28 daos1 journal: scons: building terminated because of errors.

Jan 30 13:27:28 daos1 journal: scons: *** [build/src/tests/suite/io_conf/daos_gen_io_conf] Error 1

Jan 30 13:27:28 daos1 systemd: Scope libcontainer-29427-systemd-test-default-dependencies.scope has no PIDs. Refusing.

Jan 30 13:27:28 daos1 systemd: Scope libcontainer-29427-systemd-test-default-dependencies.scope has no PIDs. Refusing.

Jan 30 13:27:28 daos1 systemd: Created slice libcontainer_29427_systemd_test_default.slice.

Jan 30 13:27:28 daos1 systemd: Removed slice libcontainer_29427_systemd_test_default.slice.

Jan 30 13:27:28 daos1 systemd: Scope libcontainer-29443-systemd-test-default-dependencies.scope has no PIDs. Refusing.

Jan 30 13:27:28 daos1 systemd: Scope libcontainer-29443-systemd-test-default-dependencies.scope has no PIDs. Refusing.

Jan 30 13:27:28 daos1 systemd: Created slice libcontainer_29443_systemd_test_default.slice.

Jan 30 13:27:28 daos1 systemd: Removed slice libcontainer_29443_systemd_test_default.slice.

Jan 30 13:27:28 daos1 dockerd-current: time="2020-01-30T13:27:28.521914798-08:00" level=error msg="containerd: deleting container" error="exit status 1: \"container d64664c59abb475e8ee16508740f26487fb9dd2b070ff32779c2273a2ff5ea61 is not exist\\none or more of the container deletions failed\\n\""

Jan 30 13:27:28 daos1 NetworkManager[1929]: <info>  [1580419648.5699] manager: (veth8f6c8cc): new Veth device (/org/freedesktop/NetworkManager/Devices/42)

Jan 30 13:27:28 daos1 kernel: docker0: port 1(veth6dc55af) entered disabled state

Jan 30 13:27:28 daos1 kernel: docker0: port 1(veth6dc55af) entered disabled state

Jan 30 13:27:28 daos1 kernel: device veth6dc55af left promiscuous mode

Jan 30 13:27:28 daos1 kernel: docker0: port 1(veth6dc55af) entered disabled state

Jan 30 13:27:28 daos1 NetworkManager[1929]: <info>  [1580419648.6321] device (veth6dc55af): released from master device docker0

Jan 30 13:27:31 daos1 dockerd-current: time="2020-01-30T13:27:31.102343294-08:00" level=warning msg="d64664c59abb475e8ee16508740f26487fb9dd2b070ff32779c2273a2ff5ea61 cleanup: failed to unmount secrets: invalid argument"

Jan 30 13:28:17 daos1 dbus[1857]: [system] Activating service name='org.freedesktop.problems' (using servicehelper)

Jan 30 13:28:17 daos1 dbus[1857]: [system] Successfully activated service 'org.freedesktop.problems'

 


Thanks,

Bill


 



Olivier, Jeffrey V
 

Hi Bill,

 

Can you tell us which version of DAOS you are using? Is it the latest master? Also, do you have the libfabric-devel package installed? (DAOS doesn’t need this package to be installed.) Also, what build command are you using? If you have a build log, that would also be helpful.
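One way to gather the details asked for above is sketched below. It is a minimal sketch, assuming a CentOS 7 machine with `rpm`, `git` and `docker`, and a local DAOS checkout; the helpers `collect_build_info` and `summarize_link_errors` and the log name `daos-build.log` are hypothetical. Saving the output with `tee` under `pipefail` keeps the build's exit status, and the second helper pulls the distinct `undefined reference` lines out of the surrounding noise.

```shell
#!/usr/bin/env bash
# Minimal sketch; both helpers are hypothetical, not part of DAOS.

# Print each distinct "undefined reference" line found in a build log.
summarize_link_errors() {
    local log="$1"
    # sort -u collapses the repeated lines that ld (and journald) emit
    grep -h 'undefined reference' "$log" | sort -u
}

# Print the DAOS commit and libfabric package state, then build the CentOS 7
# image from a local checkout while saving the full output to a log file.
# $1: path to the DAOS checkout, $2: log file (default daos-build.log).
collect_build_info() {
    local src="$1" log="${2:-daos-build.log}"
    # older git releases (such as CentOS 7's) lack "git -C", so cd in a subshell
    echo "DAOS commit: $(cd "$src" && git rev-parse HEAD)"
    # rpm -q reports "package ... is not installed" for missing packages
    rpm -q libfabric libfabric-devel
    (
        # pipefail: the subshell returns docker's exit status, not tee's
        set -o pipefail
        docker build -t daos -f "$src/utils/docker/Dockerfile.centos.7" \
            "$src/utils/docker" 2>&1 | tee "$log"
    )
}
```

Usage: `collect_build_info ./daos`, then `summarize_link_errors daos-build.log` to see only the unresolved symbols.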

 

-Jeff

 

From: <daos@daos.groups.io> on behalf of Bill Katz <bkatz@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, January 30, 2020 at 2:56 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Install problem

 



Bill Katz <bkatz@...>
 

Thanks for the reply, Jeff. I do not have libfabric-devel installed. I am installing from master. The command I ran is:

docker build -t daos -f Dockerfile.centos.7 github.com/daos-stack/daos#:utils/docker

I freely admit I'm not a Linux guru, so my apologies if I've missed something basic. 

Thanks,
Bill


Olivier, Jeffrey V
 

Hi Bill,

 

I’m able to reproduce the issue. Now it’s just a matter of figuring out why it is happening. I will file a ticket on it.

 

Thanks,

Jeff

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Bill Katz
Sent: Monday, February 3, 2020 12:37 PM
To: daos@daos.groups.io
Subject: Re: [daos] Install problem

 



Olivier, Jeffrey V
 

I’ve landed a patch that should address this issue.

 

From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Olivier, Jeffrey V
Sent: Wednesday, February 5, 2020 12:30 PM
To: daos@daos.groups.io
Cc: Olivier, Jeffrey V <jeffrey.v.olivier@...>
Subject: Re: [daos] Install problem

 



Bill Katz <bkatz@...>
 

Awesome. I'll revive my testbed machine and give it a try.

Thank you.


nicolau.manubens@...
 

Hello,

I am also hitting a similar error when trying to build the DAOS Docker image:

wget https://raw.githubusercontent.com/daos-stack/daos/master/utils/docker/Dockerfile.centos.7

docker build --no-cache -t daos -f ./Dockerfile.centos.7 .

The Dockerfile is being pulled from the master branch, at commit 4cbb16cf8edc9ddf5c7503b4448bf897c8331ea3.

The output follows:

gcc -o build/dev/gcc/src/tests/security/acl_dump_test -Wl,-rpath-link=build/dev/gcc/src/gurt -Wl,-rpath-link=build/dev/gcc/src/cart -Wl,--enable-new-dtags -Wl,-rpath-link=/home/daos/daos/build/dev/gcc/src/gurt -Wl,-rpath-link=/usr/prereq/dev/pmdk/lib -Wl,-rpath-link=/usr/prereq/dev/isal/lib -Wl,-rpath-link=/usr/prereq/dev/isal_crypto/lib -Wl,-rpath-link=/usr/prereq/dev/argobots/lib -Wl,-rpath-link=/usr/prereq/dev/protobufc/lib -Wl,-rpath-link=/usr/lib64 -Wl,-rpath=/usr/lib -Wl,-rpath=\$ORIGIN/../../home/daos/daos/build/dev/gcc/src/gurt -Wl,-rpath=\$ORIGIN/../prereq/dev/pmdk/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/isal/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/isal_crypto/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/argobots/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/protobufc/lib -Wl,-rpath=\$ORIGIN/../lib64 build/dev/gcc/src/tests/security/acl_dump_test.o -Lbuild/dev/gcc/src/gurt -Lbuild/dev/gcc/src/cart/swim -Lbuild/dev/gcc/src/cart -Lbuild/dev/gcc/src/common -L/usr/prereq/dev/pmdk/lib -L/usr/prereq/dev/isal/lib -L/usr/prereq/dev/isal_crypto/lib -Lbuild/dev/gcc/src/bio -Lbuild/dev/gcc/src/bio/smd -Lbuild/dev/gcc/src/vea -Lbuild/dev/gcc/src/vos -Lbuild/dev/gcc/src/mgmt -Lbuild/dev/gcc/src/pool -Lbuild/dev/gcc/src/container -Lbuild/dev/gcc/src/placement -Lbuild/dev/gcc/src/dtx -Lbuild/dev/gcc/src/object -Lbuild/dev/gcc/src/rebuild -Lbuild/dev/gcc/src/security -Lbuild/dev/gcc/src/client/api -Lbuild/dev/gcc/src/control -L/usr/prereq/dev/argobots/lib -L/usr/prereq/dev/protobufc/lib -lpmemobj -lisal -lisal_crypto -labt -lprotobuf-c -lhwloc -ldaos -ldaos_common -lgurt

/usr/prereq/dev/mercury/lib/libna.so.2: undefined reference to `fi_dupinfo@...'

/usr/prereq/dev/mercury/lib/libna.so.2: undefined reference to `fi_freeinfo@...'

/usr/prereq/dev/mercury/lib/libna.so.2: undefined reference to `fi_getinfo@...'

collect2: error: ld returned 1 exit status

scons: *** [build/dev/gcc/src/tests/security/acl_dump_test] Error 1

scons: building terminated because of errors.

The command '/bin/sh -c if [ "x$NOBUILD" = "x" ] ; then scons --build-deps=yes install PREFIX=/usr; fi' returned a non-zero code: 2

 

I have also tried checking out the commit right after the pull request was merged and building it, without success:


git clone https://github.com/daos-stack/daos/

cd daos

git checkout 5c887623f0013241d27b8daad1813a3444abf718

cd utils/docker

docker build --no-cache -t daos -f ./Dockerfile.centos.7 .

[...]

Step 28/34 : RUN if [ "x$NOBUILD" = "x" ] ; then scons --build-deps=yes install PREFIX=/usr; fi

 ---> Running in 6fa6d1725ecd

scons: Reading SConscript files ...

ImportError: No module named distro:

  File "/home/daos/daos/SConstruct", line 16:

    import daos_build

  File "/home/daos/daos/utils/daos_build.py", line 4:

    from env_modules import load_mpi

  File "/home/daos/daos/site_scons/env_modules.py", line 27:

    import distro
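The traceback shows that `site_scons/env_modules.py` imports the Python `distro` module, which is not installed in that image. A minimal workaround sketch, assuming `pip` is available for the interpreter scons runs under (on CentOS 7 that is usually Python 2); the `PYTHON` variable and the `ensure_py_module` helper are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DAOS): make sure a Python module can be
# imported, installing it with pip for the same interpreter if it cannot.
# PYTHON defaults to "python", assumed to be the interpreter scons runs under.
ensure_py_module() {
    local mod="$1" py="${PYTHON:-python}"
    if "$py" -c "import $mod" 2>/dev/null; then
        echo "$mod already importable by $py"
        return 0
    fi
    echo "installing $mod for $py"
    # --user installs into the current user's site-packages, so run this
    # as the same user that will run scons; then re-check the import
    "$py" -m pip install --user "$mod" && "$py" -c "import $mod"
}
```

For example, in a container built with `--build-arg NOBUILD=1`: `ensure_py_module distro && scons --build-deps=yes install PREFIX=/usr`.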



Please let me know if you have any further hints.

 

Regards,

Nicolau


maureen.jean@...
 

What version of libfabric are you using? Try using libfabric >= 1.11.

/usr/prereq/dev/mercury/lib/libna.so.2: undefined reference to `fi_dupinfo@...'

/usr/prereq/dev/mercury/lib/libna.so.2: undefined reference to `fi_freeinfo@...'

/usr/prereq/dev/mercury/lib/libna.so.2: undefined reference to `fi_getinfo@...'
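A minimal sketch for checking the installed libfabric against 1.11, assuming the `fi_info` utility from the libfabric package is on the PATH and that `fi_info --version` prints a line such as `libfabric: 1.11.0`; both helper names are hypothetical. `sort -V` is used because a plain string comparison would rank 1.7 above 1.11.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 >= version $2.
# sort -V orders 1.7.0 before 1.11, unlike a plain string sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: print the libfabric version reported by fi_info.
# Assumes a "libfabric: X.Y.Z" line in the output of fi_info --version.
libfabric_version() {
    fi_info --version 2>/dev/null | awk -F': *' '$1 == "libfabric" { print $2 }'
}
```

Usage: `version_ge "$(libfabric_version)" 1.11 || echo "libfabric too old"`. An empty version (no `fi_info` found) also reports "too old".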


nicolau.manubens@...
 

The Dockerfile I am taking from master installs libfabric 1.7 in the image. Should I modify the scons script to use a different libfabric version?


maureen.jean@...
 

Yes, you need a later version of libfabric, preferably 1.11. In any case, you need a libfabric that supports ABI 1.3 (FABRIC 1.3).
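Whether a given libfabric provides that ABI can be checked from its dynamic symbol table. A minimal sketch, assuming binutils' `objdump` is installed and that libfabric tags its versioned symbols with the name `FABRIC_1.3` (this is how "ABI 1.3 (FABRIC 1.3)" is read here; verify against your build). The default `libna.so.2` path comes from the log above, and both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `objdump -T` output on stdin and succeed if
# fi_getinfo is exported with the FABRIC_1.3 symbol version.
exports_fabric_1_3() {
    grep -E '[[:space:]]fi_getinfo$' | grep -q 'FABRIC_1\.3'
}

# Hypothetical helper: find the libfabric that libna.so.2 resolves to and check it.
# $1: path to libna.so.2 (default taken from the build log above).
check_libna_fabric() {
    local libna="${1:-/usr/prereq/dev/mercury/lib/libna.so.2}" fab
    # ldd lines look like: libfabric.so.1 => /usr/lib64/libfabric.so.1 (0x...)
    # and "libfabric.so.1 => not found" when it cannot be resolved
    fab=$(ldd "$libna" | awk '/libfabric/ && $3 != "not" { print $3 }')
    if [ -z "$fab" ]; then
        echo "no libfabric found for $libna" >&2
        return 1
    fi
    echo "libna.so.2 -> $fab"
    objdump -T "$fab" | exports_fabric_1_3
}
```

If `check_libna_fabric` fails while a newer libfabric is installed elsewhere, the linker is presumably picking up the older copy first.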


Olivier, Jeffrey V
 

The scons logic in utils/sl should automatically detect that the installed libfabric version is not suitable and build a suitable version. I’m trying it locally to see what is going on.

 

-Jeff

 

From: <daos@daos.groups.io> on behalf of "maureen.jean@..." <maureen.jean@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday, November 11, 2020 at 8:14 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Install problem

 



nicolau.manubens@...
 

Thanks for your help.

I have also tried the Ubuntu and Leap Dockerfiles. Leap worked fine. The Ubuntu one failed with a similar error when compiling acl_dump_test; a snippet of the error is below.

Although I can continue with the Leap image for now, it would still be good to have the CentOS one working for tests, as our final DAOS system will be deployed on CentOS machines.

Nicolau


/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Error_to_string'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Addr_free'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Mem_handle_create_segments'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Op_create'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Mem_handle_free'

[...]


fhoa@...
 

Did you find a workaround for this problem? I am experiencing the same problem when trying to set up on Ubuntu 20.04.1.

Commands I tried to run:

$ git clone https://github.com/daos-stack/daos
$ docker build --no-cache -t daos -f utils/docker/Dockerfile.ubuntu.20.04 --build-arg NOBUILD=1 .
$ docker run -it -d --privileged --name server -v ${daospath}:/home/daos/daos:Z -v /dev/hugepages:/dev/hugepages daos
$ docker exec server scons --build-deps=yes install PREFIX=/usr

This last command fails with a similar error to the one above, namely:

"
gcc -o build/dev/gcc/src/tests/security/acl_dump_test -Wl,-rpath-link=build/dev/gcc/src/gurt -Wl,-rpath-link=build/dev/gcc/src/cart -Wl,--enable-new-dtags -Wl,-rpath-link=/home/daos/daos/build/dev/gcc/src/gurt -Wl,-rpath-link=/usr/prereq/dev/pmdk/lib -Wl,-rpath-link=/usr/prereq/dev/isal/lib -Wl,-rpath-link=/usr/prereq/dev/isal_crypto/lib -Wl,-rpath-link=/usr/prereq/dev/argobots/lib -Wl,-rpath-link=/usr/prereq/dev/protobufc/lib -Wl,-rpath-link=/usr/lib64 -Wl,-rpath=/usr/lib -Wl,-rpath=\$ORIGIN/../../home/daos/daos/build/dev/gcc/src/gurt -Wl,-rpath=\$ORIGIN/../prereq/dev/pmdk/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/isal/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/isal_crypto/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/argobots/lib -Wl,-rpath=\$ORIGIN/../prereq/dev/protobufc/lib -Wl,-rpath=\$ORIGIN/../lib64 build/dev/gcc/src/tests/security/acl_dump_test.o -Lbuild/dev/gcc/src/gurt -Lbuild/dev/gcc/src/cart/swim -Lbuild/dev/gcc/src/cart -Lbuild/dev/gcc/src/common -L/usr/prereq/dev/pmdk/lib -L/usr/prereq/dev/isal/lib -L/usr/prereq/dev/isal_crypto/lib -Lbuild/dev/gcc/src/bio -Lbuild/dev/gcc/src/bio/smd -Lbuild/dev/gcc/src/vea -Lbuild/dev/gcc/src/vos -Lbuild/dev/gcc/src/mgmt -Lbuild/dev/gcc/src/pool -Lbuild/dev/gcc/src/container -Lbuild/dev/gcc/src/placement -Lbuild/dev/gcc/src/dtx -Lbuild/dev/gcc/src/object -Lbuild/dev/gcc/src/rebuild -Lbuild/dev/gcc/src/security -Lbuild/dev/gcc/src/client/api -Lbuild/dev/gcc/src/control -L/usr/prereq/dev/argobots/lib -L/usr/prereq/dev/protobufc/lib -lpmemobj -lisal -lisal_crypto -labt -lprotobuf-c -lhwloc -ldaos -ldaos_common -lgurt

/usr/bin/ld: warning: libna.so.2, needed by /usr/prereq/dev/mercury/lib/libmercury.so.2, not found (try using -rpath or -rpath-link)

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Error_to_string'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Addr_free'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Mem_handle_create_segments'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Op_create'

/usr/bin/ld: /usr/prereq/dev/mercury/lib/libmercury.so.2: undefined reference to `NA_Mem_handle_free'

[...]

"