Fault domain configuration


Farrell, Patrick Arthur <patrick.farrell@...>
 

Good afternoon,

Is there any written documentation on how to configure fault domains, etc., in DAOS?  Replication, in particular, requires that targets be in different fault domains, but I can't immediately find anything in the documentation on how to configure the system to provide this info.

I'm intending to work backwards from what's done in the tests, but I'm still curious if there's something written available that I've missed.

Thanks,
paf 


Farrell, Patrick Arthur <patrick.farrell@...>
 

Never mind, I figured this out - partly.

The failure domain input has not been added yet, so for now, each target is its own failure domain.  Well, not quite 'target'.

Looking at the code closely, I'm seeing that while I specify two targets in my startup file, and my server announces it's running with two targets, what is actually meant by "targets" in the pool creation code is *ranks*.  That's what the pool service calls 'targets' when it counts targets to determine failure domains.

This is ... really confusing.  Will this ambiguity be cleaned up when fault domains are actually input, rather than just assigned with one per rank?  It would be really nice if the code could use the term 'target' to refer to one thing, rather than transparently switching from 'targets' being storage targets to 'targets' referring to server ranks.  I'm assuming this is connected to an old assumption of a 1-to-1 association between targets and ranks.

Regardless, I should be able to proceed, just making sure to set up distinct ranks and provide them to the pool service.
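The rule described above (no fault domain input yet, so each rank counts as one domain no matter how many targets it runs) can be modelled in a few lines. This is a toy Python sketch, not DAOS code: `fault_domains_by_rank` and `can_place_replicas` are hypothetical helpers, and targets are represented as assumed `(rank, target_index)` pairs.

```python
from collections import defaultdict

def fault_domains_by_rank(targets):
    """Group (rank, target_index) pairs into fault domains, one per rank.

    Toy model of the current behaviour: with no fault domain input,
    every server rank is its own domain, however many targets it runs.
    """
    domains = defaultdict(list)
    for rank, tgt in targets:
        domains[rank].append(tgt)
    return dict(domains)

def can_place_replicas(targets, nreplicas):
    """True if nreplicas copies can each land in a distinct fault domain."""
    return len(fault_domains_by_rank(targets)) >= nreplicas

# One server (rank 0) running two targets: two targets, one fault domain.
single_server = [(0, 0), (0, 1)]
# Two servers (ranks 0 and 1), one target each: two fault domains.
two_servers = [(0, 0), (1, 0)]

print(can_place_replicas(single_server, 2))  # False
print(can_place_replicas(two_servers, 2))    # True
```

Under this model, a single server started with two targets still offers only one fault domain, so 2-way replication needs at least two distinct ranks.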

-Patrick


Lombardi, Johann
 

Hi Patrick,

Good point on the confusion around “target”. Let me discuss this with the team and clean this up.

As for how to specify the fault domain, it is supposed to be done via the yaml file:

  • fault_path: /vcdu0/rack1/hostname
    where you can specify where this server sits in the fault domain hierarchy
  • fault_cb: ./.daos/fd_callback
    to invoke an external script that will generate the path for this server instance

While fault domains are supported in the pool map and in the placement algorithm, the control plane does not implement fault_path/fault_cb yet. We will complete this for 1.2.
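Since `fault_cb` is meant to invoke an external script that generates the path for a server instance, such a callback might look roughly like the Python sketch below. It assumes (this is not confirmed in the thread) that the script only has to produce a single path in the `/vcdu0/rack1/hostname` form shown above; `RACK_TABLE`, `fault_path` and `current_fault_path` are hypothetical, and the inventory data is made up.

```python
import socket

# Hypothetical site inventory: short hostname -> (vcdu, rack).
RACK_TABLE = {
    "node01": ("vcdu0", "rack1"),
    "node02": ("vcdu0", "rack2"),
}

def fault_path(hostname, table):
    """Build a /<vcdu>/<rack>/<hostname> path for a host listed in `table`."""
    if hostname not in table:
        raise ValueError(f"no fault domain entry for {hostname!r}")
    vcdu, rack = table[hostname]
    return f"/{vcdu}/{rack}/{hostname}"

def current_fault_path(table=RACK_TABLE):
    """Fault path for the machine this runs on (short hostname, no domain)."""
    return fault_path(socket.gethostname().split(".")[0], table)

print(fault_path("node01", RACK_TABLE))  # /vcdu0/rack1/node01
```

A real callback would also need a small entry point that prints `current_fault_path()` and exits non-zero for an unknown host; the exact input/output contract would have to come from the control plane once it implements `fault_cb`.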

 

Cheers,

Johann

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Patrick Farrell <paf@...>
 

Sorry to leave this one for so long.

Does this mean that currently the only way to work with fault domains is to set up distinct ranks, with one domain per rank being created automatically?  (That's fine if so, I am just making sure there is not a trick I'm missing somewhere.)

Thanks,
-Patrick


Lombardi, Johann
 

Correct: each DAOS server is treated as a separate fault domain in 0.9/1.0. 1.2 will support the fault_path/fault_cb parameters.
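To illustrate the difference, the sketch below compares the 0.9/1.0 behaviour (one fault domain per server) with grouping servers by one level of their fault path, for example by rack. How 1.2 will actually derive domains from `fault_path` is an assumption here; `domain_at_level` and the server paths are hypothetical.

```python
from collections import Counter

def domain_at_level(path, level):
    """Return the prefix of a fault path down to `level` components.

    e.g. domain_at_level("/vcdu0/rack1/node01", 2) -> "/vcdu0/rack1"
    """
    parts = [p for p in path.split("/") if p]
    if not 1 <= level <= len(parts):
        raise ValueError(f"level {level} out of range for {path!r}")
    return "/" + "/".join(parts[:level])

# Hypothetical rank -> fault_path assignments.
servers = {
    0: "/vcdu0/rack1/node01",
    1: "/vcdu0/rack1/node02",
    2: "/vcdu0/rack2/node03",
}

# 0.9/1.0 behaviour: one fault domain per server, regardless of fault_path.
per_server = len(servers)
# Hypothetical grouping at rack level (second path component).
per_rack = Counter(domain_at_level(p, 2) for p in servers.values())

print(per_server)      # 3
print(dict(per_rack))  # {'/vcdu0/rack1': 2, '/vcdu0/rack2': 1}
```

In this toy model, three servers in two racks give three per-server domains but only two rack-level domains, so fewer replicas could be spread across distinct racks than across distinct servers.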

 

Cheers,

Johann