Re: Why not update groupmap when daos_server received RASSwimRankDead
Macdonald, Mjmac
To be clear, once a group map update has been requested, it will happen within 500ms of the request. This is triggered by a timer, not by a join request. Every 500ms, the timer fires and a check happens to see if a group update has been requested. Any changes to the system membership that have occurred since the last group update will be included in this new group update. The alternative is that every single rank death/join event would result in its own RPC downcall into the engine, and this would be extremely inefficient at scale.
Hope that helps. mjmac
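The batching behaviour described above can be modelled roughly as follows. This is only an illustrative sketch, not DAOS code: the class `GroupUpdateBatcher`, its methods, and the `send_update` callback are hypothetical names, and the real control plane may structure this differently. The one detail taken from the message is the 500 ms timer that checks whether an update has been requested and, if so, sends a single update covering every change since the last one.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

UPDATE_INTERVAL_S = 0.5  # the 500 ms timer period mentioned in the message

Change = Tuple[int, str]  # (rank, event), e.g. (3, "dead"); hypothetical shape


@dataclass
class GroupUpdateBatcher:
    """Hypothetical model of timer-driven group map batching (not DAOS code)."""

    # Stand-in for the RPC downcall into the engine (assumed signature).
    send_update: Callable[[int, List[Change]], None]
    version: int = 0
    pending: List[Change] = field(default_factory=list)
    requested: bool = False

    def record_change(self, rank: int, event: str) -> None:
        # Remember a membership change; it rides along with the next update.
        self.pending.append((rank, event))

    def request_update(self) -> None:
        # Only sets a flag; the update itself happens on the next timer tick.
        self.requested = True

    def on_timer(self) -> bool:
        # Intended to be called every UPDATE_INTERVAL_S seconds.
        if not self.requested:
            return False
        self.version += 1
        # One downcall carries every change since the previous update.
        self.send_update(self.version, list(self.pending))
        self.pending.clear()
        self.requested = False
        return True
```

In this model, three membership changes followed by one `request_update()` call produce a single `send_update` call on the next `on_timer()`, and later ticks do nothing until another update is requested. A real server would drive `on_timer()` from a periodic timer rather than calling it by hand.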
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of ???
Sent: Monday, 11 July, 2022 21:04
To: daos@daos.groups.io
Subject: Re: [daos] Why not update groupmap when daos_server received RASSwimRankDead
Thank you, mjmac! So the reason for passing false is to allow efficient batching of group updates. But is it OK that the engines don't get the newest groupmap? Sometimes there may be no engine join message for a long time. In that case, the servers will not pass the newest groupmap to the engines, so the groupmap versions on the servers and the engines will differ for a long time. During that window, some messages will still be sent to the failed engine because the latest groupmap has not been obtained, and those messages will fail due to timeouts or other errors. What do you think about this?
At 2022-07-12 02:23:19, "Macdonald, Mjmac" <mjmac.macdonald@...> wrote: