|
Re: When the network port recovers from a fault, the corresponding rank cannot receive the groupupdate message, resulting in the failure to join the group normally,rpc msg OOG.
The cart context of the normal process needs to be initialized so that it can communicate normally with the failed recovery process.
Thank you for your reply!
|
By
dagouxiong2015@...
·
#1642
·
|
|
Re: When the network port recovers from a fault, the corresponding rank cannot receive the groupupdate message, resulting in the failure to join the group normally,rpc msg OOG.
Thanks for the clarification. Since the engine is killed and restarted in step 4, I am not sure I understand why network contexts would need to be reinitialized. Could you please create a Jira ticket
|
By
Lombardi, Johann
·
#1641
·
|
|
Re: When the network port recovers from a fault, the corresponding rank cannot receive the groupupdate message, resulting in the failure to join the group normally,rpc msg OOG.
Thank you for your reply; your understanding is correct. The steps we performed at the time were: 1) unplug the network cable 2) kill the process that uses that network cable 3) re-insert
|
By
dagouxiong2015@...
·
#1639
·
|
|
Re: When the network port recovers from a fault, the corresponding rank cannot receive the groupupdate message, resulting in the failure to join the group normally,rpc msg OOG.
Hi there,
IIUC, you are running into a bug where a DAOS engine is not able to rejoin the system / CART group if you “just” unplug & replug the network cable. You then noticed that you could
|
By
Lombardi, Johann
·
#1638
·
|
|
When the network port recovers from a fault, the corresponding rank cannot receive the groupupdate message, resulting in the failure to join the group normally,rpc msg OOG.
We tried the mercury demo and found that initializing hg can solve this OOG problem,
and then analyzed the code of the cart,
and found that the cart context applied a global context, and it is used
|
By
dagouxiong2015@...
·
#1637
·
|
|
Re: Question about 3D Xpoint DIMM
Hi there,
The DAOS architecture won’t fundamentally change and the plan is to become more flexible in the configurations we support. We will continue to store metadata and data on different
|
By
Lombardi, Johann
·
#1636
·
|
|
Re: Question about 3D Xpoint DIMM
DAOS continues to be a strategic part of the Intel software portfolio and we remain committed to supporting our customers and the DAOS community. In parallel, we are accelerating efforts that have
|
By
Nabarro, Tom
·
#1635
·
|
|
Question about 3D Xpoint DIMM
Intel recently announced that it will no longer provide 3D Xpoint DIMMs, so how will this affect DAOS?
|
By
段世博
·
#1634
·
|
|
Announcement: DAOS 2.0.3 is generally available
The DAOS team would like to announce the release of DAOS Version 2.0.3.
This is a maintenance release containing bug fixes and improvements.
The DAOS 2.0.3 release contains the following updates
|
By
Poddubnyy, Ivan
·
#1633
·
|
|
Would you tell me how the stale leader starts an election after becoming follower
Hi, DAOS:
I'm looking at the raft code.
func (r *Raft) run() {
	for {
		// Check if we are doing a shutdown
		select {
		case <-r.shutdownCh:
			// Clear the leader to prevent
|
By
尹秋霞
·
#1632
·
|
|
Re: Why not update groupmap when daos_server received RASSwimRankDead
After looking at the code I think I agree with this suggestion, otherwise the reqGroupUpdate(sync=false) call on src/control/server/server_utils.go L514 is ineffectual. Good catch!
|
By
Nabarro, Tom
·
#1631
·
|
|
Re: Why not update groupmap when daos_server received RASSwimRankDead
Thanks, mjmac.
I think the code
	case sync := <-svc.groupUpdateReqs:
		groupUpdateNeeded = true
		if sync {
			if err := svc.doGroupUpdate(parent, true); err != nil {
				svc.log.Errorf("sync
|
By
尹秋霞
·
#1630
·
|
|
Re: Why not update groupmap when daos_server received RASSwimRankDead
To be clear, once a group map update has been requested, it will happen within 500ms of the request. This is triggered by a timer, not by a join request. Every 500ms, the timer fires and a check
|
By
Macdonald, Mjmac
·
#1629
·
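The batching behaviour described in #1629 above can be sketched as follows: asynchronous requests only mark the group map as dirty, a periodic timer pushes the map if anything is pending, and a synchronous request pushes immediately. This is a minimal sketch under stated assumptions, not the DAOS implementation. The `batcher` type and its `request`, `tick`, `push` and `run` methods are hypothetical names; only the 500ms interval and the sync/async distinction come from the thread.

```go
package main

import (
	"fmt"
	"time"
)

// batcher is a hypothetical stand-in for the service that owns the group map.
type batcher struct {
	needed bool // an update was requested and has not been pushed yet
	pushed int  // how many times the group map was pushed to the engines
}

// push stands in for distributing the new group map to the engines.
func (b *batcher) push() {
	b.pushed++
	b.needed = false
}

// request records an update request. A sync request is pushed immediately;
// an async one only marks the map dirty and waits for the next tick.
func (b *batcher) request(sync bool) {
	b.needed = true
	if sync {
		b.push()
	}
}

// tick is called when the timer fires; it pushes only if something is pending.
func (b *batcher) tick() {
	if b.needed {
		b.push()
	}
}

// run wires request and tick to a channel and a ticker until stop is closed.
func (b *batcher) run(reqs <-chan bool, interval time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-stop:
			return
		case sync := <-reqs:
			b.request(sync)
		case <-t.C:
			b.tick()
		}
	}
}

func main() {
	b := &batcher{}
	// Three async requests arrive close together, e.g. several rank-dead events.
	b.request(false)
	b.request(false)
	b.request(false)
	fmt.Println("pushed before tick:", b.pushed) // prints "pushed before tick: 0"
	b.tick()
	fmt.Println("pushed after tick:", b.pushed) // prints "pushed after tick: 1"
}
```

In a long-running service the loop would be started as `go b.run(reqs, 500*time.Millisecond, stop)`; the example in `main` calls the methods directly so that the result does not depend on timing.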
|
|
Re: Why not update groupmap when daos_server received RASSwimRankDead
Thank you, mjmac!
So the reason for passing false is to allow for efficient batching of group updates.
But is it OK that engines don't get the newest groupmap?
Sometimes, there may be no engine
|
By
尹秋霞
·
#1628
·
|
|
Re: Why not update groupmap when daos_server received RASSwimRankDead
Hi Qui.
I assume you are referring to this line of code: https://github.com/daos-stack/daos/blob/master/src/control/server/server_utils.go#L518
In this case, the false value indicates that the
|
By
Macdonald, Mjmac
·
#1627
·
|
|
Why not update groupmap when daos_server received RASSwimRankDead
Hi, DAOS,
I found that when daos_server received RASSwimRankDead, it updated the membership immediately but passed false to reqGroupUpdate, so it would not pass the new groupmap to daos_engine.
|
By
尹秋霞
·
#1626
·
|
|
DAOS Community Update / July'22
Hi there,
Please find below the DAOS community newsletter for July 2022.
A copy of this newsletter is available on the wiki.
Past Events
ISC’22 IXPUG (June 2nd at 9am CEST)
DAOS Features
|
By
Lombardi, Johann
·
#1625
·
|
|
Re: DPI_SPACE query after extending pool
Yes, that’s the reservation I mentioned, and the NVMe reservation has been removed in master and 2.2.
Thanks
-Niu
From: <daos@daos.groups.io> on behalf of "Tuffli, Chuck"
|
By
Niu, Yawei
·
#1624
·
|
|
Re: DPI_SPACE query after extending pool
Apologies. I was wrong. The space drops with pool creation, not with container creation.
Creating a pool
# dmg pool create --ranks=0,3 --size=500G kiddie
Creating DAOS pool with automatic storage
|
By
Tuffli, Chuck
·
#1623
·
|
|
Re: DPI_SPACE query after extending pool
Could you double check whether creating a container causes the NVMe free space to drop? If it does, please open a ticket for further investigation. I can’t think of why container creation could
|
By
Niu, Yawei
·
#1622
·
|