Re: PMIx-less bootstrapping


Nabarro, Tom
 

Li Wei is on vacation, so I have corrected the inconsistency: the default port is now 10001, matching the example configuration files.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Patrick Farrell
Sent: Thursday, December 12, 2019 5:44 PM
To: daos-devel@...; daos@daos.groups.io
Subject: Re: [daos] PMIx-less bootstrapping

 

Li Wei,

 

Just a minor note:
The daos_server_local.yml file was updated to include access_points: for localhost:10001, but daos_agent defaults to just "localhost" with no port.

 

It would be convenient if daos_agent defaulted to the same thing as the simple server config.  (It took me a bit of troubleshooting to realize that the missing access_points: in my agent config was the problem, since the error message doesn't indicate what connection was being attempted, just that a connection to localhost failed because it didn't include a port.)
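
For anyone hitting the same thing on a single node, a daos_agent.yml along these lines should line up with the simple server config. This is only a minimal sketch: it assumes the agent accepts the same access_points: syntax shown in Li Wei's note, and that the server's management address is localhost:10001 as in daos_server_local.yml.

```yaml
# daos_agent.yml (single-node sketch; address taken from daos_server_local.yml)
# The port is given explicitly so the agent does not fall back to a bare "localhost".
access_points: ['localhost:10001']
```

Any other agent settings are left out here and would keep whatever values they already have.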

 

Obviously, your note includes enough info - I just hadn't previously needed to use daos_agent.yml for single node configurations, so it took me a bit to realize I needed it now.

 

Thanks,

-Patrick


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Li, Wei G <wei.g.li@...>
Sent: Thursday, December 5, 2019 2:02 AM
To: daos-devel@... <daos-devel@...>; daos@daos.groups.io <daos@daos.groups.io>
Subject: [daos] PMIx-less bootstrapping

 

Hi folks,

With daos-stack/daos #1092, DAOS has switched to the new PMIx-less bootstrapping. A minimal list of noteworthy things:

  1 This change requires formatting a system from scratch.

  2 One needs to pick a server, say, server X, in a system
    to be the system's access point, by setting

      access_points: ['X_host:X_port']

    where 'X_host:X_port' is server X's management address,
    in _every_ daos_server.yml and _every_ daos_agent.yml
    for this system.

  3 orterun is no longer required. If one uses it, the URI
    file is no longer required.

  4 The singleton mode is removed. DAOS_SINGLETON_CLI is
    removed. CRT_ATTACH_INFO_PATH is no longer required by
    DAOS.

  5 Restarting the full system should work. But as a
    temporary limitation, restarting some servers may cause
    existing pools covering those servers to stop working.

daos-stack.github.io has been updated. A couple more notes can be found below.

Cheers,
Li Wei

--

For item 2, an example of 3 servers and 2 clients:

  boro-1
    daos_server.yml
      access_points: ['boro-1:11111']
      port: 11111
  boro-2
    daos_server.yml
      access_points: ['boro-1:11111']
      port: 22222
  boro-3
    daos_server.yml
      access_points: ['boro-1:11111']
      port: 33333

  boro-10
    daos_agent.yml
      access_points: ['boro-1:11111']
  boro-11
    daos_agent.yml
      access_points: ['boro-1:11111']

One can request a specific rank for a server by setting the rank parameter in that server's daos_server.yml. If unspecified, ranks are assigned by the access point (i.e., the Management Service), roughly in the order in which the servers finish formatting.
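
As an illustration of the rank parameter, boro-3 from the example above might look like the sketch below. The rank value 2 is an arbitrary choice for illustration, and placing rank at the same level as access_points and port is an assumption; the exact position may differ between DAOS versions.

```yaml
# daos_server.yml on boro-3 (illustrative sketch, not a complete configuration)
access_points: ['boro-1:11111']   # same access point as every other server and agent
port: 33333                       # boro-3's own management port, from the example above
rank: 2                           # hypothetical value: request rank 2 rather than an assigned one
```

Servers that omit rank would still be assigned one by the access point as described.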

For item 5, the restarted servers may not receive pool map updates from the servers that have not restarted.

