Re: Why not update groupmap when daos_server received RASSwimRankDead


尹秋霞
 

 Thank you, mjmac!
 So the reason for passing false is to allow efficient batching of group updates.
 But is it OK that the engines don't get the newest groupmap?
 Sometimes no engine join message arrives for a long time. In that case, the servers will not pass the newest groupmap to the engines, so the groupmap versions on the servers and the engines stay different for a long time.
 During that period, some messages will still be sent to the failed engine, because the engines have not yet obtained the latest groupmap. Those messages will then fail due to timeouts or other errors.
 What do you think about this?







At 2022-07-12 02:23:19, "Macdonald, Mjmac" <mjmac.macdonald@...> wrote:

Hi Qiu,

 

I assume you are referring to this line of code: https://github.com/daos-stack/daos/blob/master/src/control/server/server_utils.go#L518

 

In this case, the false value indicates that the group update request does not need to be synchronous. You can see the request handler here: https://github.com/daos-stack/daos/blob/master/src/control/server/mgmt_system.go#L187

 

The reason for this is to allow for efficient batching of group updates during large-scale membership changes (e.g. system bringup or when many nodes are marked dead by SWIM). In this mode, the group update will happen within 500ms (maybe less, depending on when the ticker last fired).
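As a rough illustration of the batching behaviour described above, here is a minimal Go sketch. It is not the DAOS code: the `updateBatcher` type and its `Request`, `Tick`, and `push` methods are hypothetical names made up for this example, and only the 500ms interval is taken from the explanation above. The sketch assumes that a synchronous request pushes the group update immediately, while asynchronous requests (the false case) are coalesced into a single push on the next tick.

```go
package main

import (
	"fmt"
	"time"
)

// updateBatcher is a hypothetical stand-in for the server's group update
// logic; it is not the DAOS implementation.
type updateBatcher struct {
	pending bool // an asynchronous request is waiting for the next tick
	pushes  int  // number of group updates actually sent to the engines
}

// Request records a group update request. A synchronous request is pushed
// immediately; an asynchronous one is deferred until the next tick.
func (b *updateBatcher) Request(sync bool) {
	if sync {
		b.push()
		return
	}
	b.pending = true
}

// Tick is called each time the ticker fires. It sends at most one update,
// no matter how many asynchronous requests arrived since the last tick.
func (b *updateBatcher) Tick() {
	if b.pending {
		b.push()
	}
}

// push stands in for sending the new group map to the engines.
func (b *updateBatcher) push() {
	b.pending = false
	b.pushes++
}

func main() {
	b := &updateBatcher{}
	ticker := time.NewTicker(500 * time.Millisecond)
	defer ticker.Stop()

	// Three ranks are marked dead in quick succession; each event asks
	// for an asynchronous group update.
	for rank := 0; rank < 3; rank++ {
		b.Request(false)
	}
	fmt.Println("pushes before tick:", b.pushes) // prints 0

	<-ticker.C // wait for the ticker to fire
	b.Tick()
	fmt.Println("pushes after tick:", b.pushes) // prints 1
}
```

In this sketch the three membership changes result in one group update rather than three, which is the point of batching; the cost is a delay of at most one ticker interval before the update goes out.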

 

Hope that helps.

mjmac

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of ???
Sent: Monday, 11 July, 2022 09:19
To: daos@daos.groups.io
Subject: [daos] Why not update groupmap when daos_server received RASSwimRankDead

 

Hi, DAOS,

I found that when daos_server receives RASSwimRankDead, it updates the membership immediately, but it passes false to reqGroupUpdate, so it does not pass the new groupmap to daos_engine. Could you tell me why?

 

Regards,

Qiu
