Issues creating a DAOS pool


Rene Salmon
 

Hi DAOS list,

I am trying to bring up DAOS by following the various docs on the GitHub page, but I am running into trouble when trying to create a DAOS pool.

I have three DAOS servers and one client.
daos-1 = client
daos-[2-4] = servers

[user@daos-1 ~]$ orterun -np 1 --ompi-server file:/tmp/urifile.txt dmg create --size=2G
failed to create pool: -1005
-------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
-------------------------------------------------------
--------------------------------------------------------------------------
orterun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[40764,1],0]
  Exit code:    1
--------------------------------------------------------------------------

Any ideas where to look for a hint?
Thanks

Rene


Chaarawi, Mohamad
 

On this issue, it seems that the URI file the server generates is not written to a location that the client can read:

/tmp/urifile.txt

 

I had an offline chat with Rene, who will retry this after writing the URI file to a shared file system, but I wanted to update the mailing list on the issue.
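
For anyone hitting the same error, a quick sanity check on the client before rerunning dmg might look like the sketch below. It only confirms that the URI file exists, is readable, and is non-empty on the node where it runs. The check_uri_file helper and the URI_FILE variable are made up for illustration and are not part of DAOS or Open MPI; the default path is the one from the original command and should be replaced with a path on a file system mounted on both the servers and the client.

```shell
#!/bin/sh
# Hypothetical helper (not part of DAOS): succeed only if the URI file
# is readable and non-empty on the node where this script runs.
check_uri_file() {
    uri_file="$1"
    if [ ! -r "$uri_file" ]; then
        echo "URI file $uri_file is missing or unreadable on this node" >&2
        return 1
    fi
    if [ ! -s "$uri_file" ]; then
        echo "URI file $uri_file exists but is empty" >&2
        return 1
    fi
    return 0
}

# Assumption: URI_FILE should point at a shared file system path that the
# servers write to and the client can read. /tmp is usually node-local.
URI_FILE="${URI_FILE:-/tmp/urifile.txt}"

if check_uri_file "$URI_FILE"; then
    echo "URI file looks usable; retry with --ompi-server file:$URI_FILE"
fi
```

If the check fails on daos-1 but passes on the servers, the file is most likely sitting on a node-local file system.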

 

Thanks,

Mohamad

 

From: <daos@daos.groups.io> on behalf of "Rene Salmon via Groups.Io" <salmonr@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, July 18, 2019 at 5:29 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Issues creating a DAOS pool

 
