Re: When the network port recovers from a fault, the corresponding rank cannot receive the groupupdate message and fails to rejoin the group normally (RPC msg OOG)


Thank you for your reply; your understanding is correct.
At the time, our sequence of operations was:
1) Unplug the network cable.
2) Kill the process that uses that network cable.
3) Reinsert the network cable.
4) Restore the rank with "dmg system start --ranks".
5) DER_OOG appears in the log of the recovered rank.

We have found that this problem reproduces every time.

