Why not update groupmap when daos_server received RASSwimRankDead


尹秋霞
 

Hi, DAOS,
I found that when daos_server receives RASSwimRankDead, it updates the membership immediately, but it passes false to reqGroupUpdate, so the new groupmap is not passed to daos_engine. Could you tell me why?

Regards,
Qiu


Macdonald, Mjmac
 

Hi Qiu.

 

I assume you are referring to this line of code: https://github.com/daos-stack/daos/blob/master/src/control/server/server_utils.go#L518

 

In this case, the false value indicates that the group update request does not need to be synchronous. You can see the request handler here: https://github.com/daos-stack/daos/blob/master/src/control/server/mgmt_system.go#L187

 

The reason for this is to allow for efficient batching of group updates during large-scale membership changes (e.g. system bringup or when many nodes are marked dead by SWIM). In this mode, the group update will happen within 500ms (maybe less, depending on when the ticker last fired).
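To illustrate the "within 500ms (maybe less, depending on when the ticker last fired)" remark, here is a minimal sketch of the arithmetic, assuming a fixed 500ms tick period and a non-negative elapsed time. The names tickInterval and waitForNextTick are hypothetical and are not taken from the DAOS source.

```go
package main

import (
	"fmt"
	"time"
)

// tickInterval mirrors the 500ms figure quoted in the thread. The name is
// hypothetical; the actual DAOS constant is not shown in this discussion.
const tickInterval = 500 * time.Millisecond

// waitForNextTick returns how long an asynchronous group update request waits
// for the next tick, given how long ago the ticker last fired.
// It assumes sinceLastTick is non-negative.
func waitForNextTick(sinceLastTick time.Duration) time.Duration {
	return tickInterval - (sinceLastTick % tickInterval)
}

func main() {
	for _, since := range []time.Duration{
		0,
		100 * time.Millisecond,
		499 * time.Millisecond,
	} {
		fmt.Printf("requested %v after the last tick: waits %v\n",
			since, waitForNextTick(since))
	}
}
```

A request that arrives right after a tick waits close to the full interval, and one that arrives just before the next tick waits almost nothing.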

 

Hope that helps.

mjmac

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of 尹秋霞
Sent: Monday, 11 July, 2022 09:19
To: daos@daos.groups.io
Subject: [daos] Why not update groupmap when daos_server received RASSwimRankDead

 



尹秋霞
 

Thank you, mjmac!
So the reason for passing false is to allow efficient batching of group updates.
But is it OK that the engines don't get the newest groupmap?
Sometimes there may be no engine join message for a long time. In that case, the servers will not pass the newest groupmap to the engines, so the groupmap versions on the servers and the engines differ for a long time.
During this period, some messages will still be sent to the failed engine because the latest groupmap has not been obtained. These messages will fail because of a timeout or for other reasons.
What do you think about this?







At 2022-07-12 02:23:19, "Macdonald, Mjmac" <mjmac.macdonald@...> wrote:



Macdonald, Mjmac
 

To be clear, once a group map update has been requested, it will happen within 500ms of the request. This is triggered by a timer, not by a join request. Every 500ms, the timer fires and a check happens to see if a group update has been requested. Any changes to the system membership that have occurred since the last group update will be included in this new group update. The alternative is that every single rank death/join event would result in its own RPC downcall into the engine, and this would be extremely inefficient at scale.
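The batching described above can be sketched roughly as follows. This is a simplified model and not the DAOS implementation: the groupUpdater type and its request and tick methods are hypothetical, and the timer is replaced by an explicit tick call so the example stays deterministic.

```go
package main

import "fmt"

// groupUpdater is a hypothetical stand-in for the management service state.
type groupUpdater struct {
	pending   []string // membership changes since the last group update
	needed    bool     // set when a group update has been requested
	downcalls int      // number of simulated RPC downcalls into the engine
}

// request records a membership change and asks for an asynchronous update.
func (g *groupUpdater) request(change string) {
	g.pending = append(g.pending, change)
	g.needed = true
}

// tick models the periodic timer firing. If an update was requested, a single
// downcall carries every change accumulated since the previous update.
func (g *groupUpdater) tick() []string {
	if !g.needed {
		return nil
	}
	batch := g.pending
	g.pending = nil
	g.needed = false
	g.downcalls++
	return batch
}

func main() {
	g := &groupUpdater{}
	for rank := 0; rank < 1000; rank++ {
		g.request(fmt.Sprintf("rank %d dead", rank))
	}
	batch := g.tick()
	fmt.Printf("%d changes delivered in %d downcall(s)\n", len(batch), g.downcalls)

	g.tick() // nothing was requested since the last update, so no downcall
	fmt.Printf("downcalls after an idle tick: %d\n", g.downcalls)
}
```

In this model, many rank events between two ticks cost one downcall instead of one per event, which is the efficiency argument made above.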

 

Hope that helps.

mjmac

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of 尹秋霞
Sent: Monday, 11 July, 2022 21:04
To: daos@daos.groups.io
Subject: Re: [daos] Why not update groupmap when daos_server received RASSwimRankDead

 



尹秋霞
 

Thanks, mjmac.
I think this code:

    case sync := <-svc.groupUpdateReqs:
        groupUpdateNeeded = true
        if sync {
            if err := svc.doGroupUpdate(parent, true); err != nil {
                svc.log.Errorf("sync GroupUpdate failed: %s", err)
                continue
            }
        }
        groupUpdateNeeded = false

should be like this, so that groupUpdateNeeded is cleared only after a synchronous update succeeds:

    case sync := <-svc.groupUpdateReqs:
        groupUpdateNeeded = true
        if sync {
            if err := svc.doGroupUpdate(parent, true); err != nil {
                svc.log.Errorf("sync GroupUpdate failed: %s", err)
                continue
            }
            groupUpdateNeeded = false
        }
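The difference between the two versions can be checked with a small stand-alone model. This is only a sketch: afterRequestOriginal and afterRequestProposed are hypothetical helpers that reproduce the flag handling of each case body, and the updateOK parameter stands in for svc.doGroupUpdate succeeding.

```go
package main

import "fmt"

// afterRequestOriginal returns the value of groupUpdateNeeded left behind by
// the original case body. updateOK models svc.doGroupUpdate succeeding.
func afterRequestOriginal(sync, updateOK bool) bool {
	groupUpdateNeeded := true
	if sync {
		if !updateOK {
			return groupUpdateNeeded // models the "continue" on error
		}
	}
	groupUpdateNeeded = false // also runs for asynchronous requests
	return groupUpdateNeeded
}

// afterRequestProposed returns the flag left behind by the proposed case body,
// which clears the flag only after a successful synchronous update.
func afterRequestProposed(sync, updateOK bool) bool {
	groupUpdateNeeded := true
	if sync {
		if !updateOK {
			return groupUpdateNeeded // models the "continue" on error
		}
		groupUpdateNeeded = false
	}
	return groupUpdateNeeded
}

func main() {
	for _, sync := range []bool{false, true} {
		fmt.Printf("sync=%v original needed=%v proposed needed=%v\n",
			sync,
			afterRequestOriginal(sync, true),
			afterRequestProposed(sync, true))
	}
}
```

In this model, the original version leaves the flag false after an asynchronous request, so a later tick would find nothing to do, while the proposed version leaves it true until the update actually runs.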





At 2022-07-13 05:50:44, "Macdonald, Mjmac" <mjmac.macdonald@...> wrote:



Nabarro, Tom
 

After looking at the code I think I agree with this suggestion, otherwise the reqGroupUpdate(sync=false) call on src/control/server/server_utils.go L514 is ineffectual. Good catch!

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of 尹秋霞
Sent: Wednesday, July 13, 2022 2:47 AM
To: daos@daos.groups.io
Subject: Re: [daos] Why not update groupmap when daos_server received RASSwimRankDead

 
