Re: When the network port recovers from a fault, the corresponding rank cannot receive the group update message, resulting in failure to join the group normally (rpc msg OOG).
dagouxiong2015@...
The cart context of the healthy process needs to be reinitialized so that it can communicate normally with the process that has recovered from the failure.
Thank you for your reply!
|
|
Re: When the network port recovers from a fault, the corresponding rank cannot receive the group update message, resulting in failure to join the group normally (rpc msg OOG).
Lombardi, Johann
Thanks for the clarification. Since the engine is killed and restarted in step 4, I am not sure I understand why network contexts would need to be reinitialized. Could you please create a JIRA ticket with the logs? In the meantime, you should be able to iterate over all the network contexts in cart (we keep an array with all the contexts there IIRC).
Cheers, Johann
From:
<daos@daos.groups.io> on behalf of "dagouxiong2015@..." <dagouxiong2015@...>
Thank you for your reply, your understanding is correct.
|
|
Re: When the network port recovers from a fault, the corresponding rank cannot receive the group update message, resulting in failure to join the group normally (rpc msg OOG).
dagouxiong2015@...
Thank you for your reply, your understanding is correct.
At the time, the sequence of operations we performed was:
1) Unplug the network cable
2) Kill the engine process corresponding to that network cable
3) Plug the network cable back in
4) Restore the corresponding rank through "dmg system start --ranks"
5) DER_OOG appears in the recovered rank's log
We noticed that this problem reproduces every time …
|
|
Re: When the network port recovers from a fault, the corresponding rank cannot receive the group update message, resulting in failure to join the group normally (rpc msg OOG).
Lombardi, Johann
Hi there,
IIUC, you are running into a bug where a DAOS engine is not able to rejoin the system / CART group if you “just” unplug & replug the network cable. You then noticed that you could work around this issue by reinitializing the cart contexts, but don’t know how to do that across the board for all network contexts used by the engine. Is that correct?
Cheers, Johann
From:
<daos@daos.groups.io> on behalf of "dagouxiong2015@..." <dagouxiong2015@...>
We tried the mercury demo and found that initializing hg can solve this OOG problem,
struct dss_module_info { crt_context_t dmi_ctx;
|
|
When the network port recovers from a fault, the corresponding rank cannot receive the group update message, resulting in failure to join the group normally (rpc msg OOG).
dagouxiong2015@...
We tried the Mercury demo and found that re-initializing HG can solve this OOG problem.
We then analyzed the CaRT code and found that the cart context is a global context, used for the RPC messages of all other ranks. When we want to reinitialize the cart context for one rank, rather than for all ranks, what can we do?
struct dss_module_info {
	crt_context_t dmi_ctx;
|
|
Re: Question about 3D Xpoint DIMM
Lombardi, Johann
Hi there,
The DAOS architecture won’t fundamentally change and the plan is to become more flexible in the configurations we support. We will continue to store metadata and data on different devices and use direct load/store for the metadata. The DAOS metadata will be stored on either persistent (e.g. Apache/Barlow/Crow Pass, battery-backed DRAM or future SSD products supporting CXL.mem) or volatile (e.g. DRAM or CXL.mem) devices. The persistent option is what DAOS supports today. As for the volatile one, there will be an extra step on write operations to keep a copy of the metadata in sync on CXL.io/NVMe SSDs. This work was already underway with community partners and is going to be accelerated. We will share more on this soon.
Once done, this change should allow DAOS to run on a wider range of hardware while maintaining our performance leadership.
Cheers, Johann
From:
<daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
DAOS continues to be a strategic part of the Intel software portfolio and we remain committed to supporting our customers and the DAOS community. In parallel, we are accelerating efforts that have already been under way for DAOS to utilize other storage technologies to store metadata on SSDs through NVMe and CXL interfaces.
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of 段世博
Intel recently announced that it will no longer provide 3D Xpoint DIMMs, so how will this affect DAOS?
|
|
Re: Question about 3D Xpoint DIMM
Nabarro, Tom
DAOS continues to be a strategic part of the Intel software portfolio and we remain committed to supporting our customers and the DAOS community. In parallel, we are accelerating efforts that have already been under way for DAOS to utilize other storage technologies to store metadata on SSDs through NVMe and CXL interfaces.
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of 段世博
Intel recently announced that it will no longer provide 3D Xpoint DIMMs, so how will this affect DAOS?
|
|
Question about 3D Xpoint DIMM
段世博
Intel recently announced that it will no longer provide 3D Xpoint DIMMs, so how will this affect DAOS?
|
|
Announcement: DAOS 2.0.3 is generally available
Poddubnyy, Ivan
The DAOS team would like to announce the release of DAOS Version 2.0.3.
This is a maintenance release containing bug fixes and improvements.
The DAOS 2.0.3 release contains a number of updates on top of DAOS 2.0.2; the complete list of changes can be found here: https://docs.daos.io/v2.0/release/release_notes/.
There are several resources available for the release:
RPM Repositories: https://packages.daos.io/v2.0/
Admin Guide: https://docs.daos.io/v2.0/admin/hardware/
User Guide: https://docs.daos.io/v2.0/user/workflow/
Architecture Overview: https://docs.daos.io/v2.0/overview/architecture/
Source Code: https://github.com/daos-stack/daos/releases/
As always, feel free to report any issues you find with the release on this mailing list, in our JIRA bug tracking system at https://daosio.atlassian.net/jira, or on our Slack channel at https://daos-stack.slack.com.
Thank you,
Ivan Poddubnyy
DAOS Customer Enablement and Support Manager
Super Compute Storage Architecture and Development Division
Intel
|
|
Would you tell me how the stale leader starts an election after becoming a follower
尹秋霞
Hi, DAOS:
I'm looking at the raft code.

func (r *Raft) run() {
	for {
		// Check if we are doing a shutdown
		select {
		case <-r.shutdownCh:
			// Clear the leader to prevent forwarding
			r.setLeader("")
			return
		default:
		}

		// Enter into a sub-FSM
		switch r.getState() {
		case Follower:
			r.runFollower()
		case Candidate:
			r.runCandidate()
		case Leader:
			r.runLeader()
		}
	}
}

When the leader becomes a follower, runLeader() must return so that the follower state machine can run. For runLeader() to return, leaderLoop() must return. But it looks to me as if leaderLoop returns only when r.shutdownCh has a value. So how does the stale leader start or join an election after becoming a follower?
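For what it's worth, in hashicorp/raft the leader loop is, as far as I can tell, also guarded by the current state, so any code path that demotes the node to Follower ends leaderLoop without touching shutdownCh. Below is a self-contained toy sketch of that pattern only; the type, field and channel names (node, stepDown, ...) are made up and this is not the actual raft library code:

package main

import (
	"fmt"
	"sync/atomic"
)

type state int32

const (
	follower state = iota
	leader
)

type node struct {
	st       atomic.Int32
	stepDown chan struct{} // signalled when e.g. a higher term is seen or quorum is lost
	shutdown chan struct{}
}

func (n *node) getState() state  { return state(n.st.Load()) }
func (n *node) setState(s state) { n.st.Store(int32(s)) }

// leaderLoop exits either on shutdown or, crucially, when the state is no
// longer leader: any case that demotes the node makes the loop guard false,
// so the outer run() loop can drop into the follower state machine.
func (n *node) leaderLoop() {
	for n.getState() == leader {
		select {
		case <-n.stepDown:
			n.setState(follower) // loop guard becomes false -> leaderLoop returns
		case <-n.shutdown:
			return
		}
	}
}

func main() {
	n := &node{stepDown: make(chan struct{}, 1), shutdown: make(chan struct{})}
	n.setState(leader)
	n.stepDown <- struct{}{}
	n.leaderLoop()
	fmt.Println("follower again:", n.getState() == follower) // true
}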
|
|
Re: Why not update groupmap when daos_server received RASSwimRankDead
After looking at the code I think I agree with this suggestion, otherwise the reqGroupUpdate(sync=false) call on src/control/server/server_utils.go L514 is ineffectual. Good catch!
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of 尹秋霞
Thanks, mjmac. I think the code

case sync := <-svc.groupUpdateReqs:
	groupUpdateNeeded = true
	if sync {
		if err := svc.doGroupUpdate(parent, true); err != nil {
			svc.log.Errorf("sync GroupUpdate failed: %s", err)
			continue
		}
	}
	groupUpdateNeeded = false

should be like this

case sync := <-svc.groupUpdateReqs:
	groupUpdateNeeded = true
	if sync {
		if err := svc.doGroupUpdate(parent, true); err != nil {
			svc.log.Errorf("sync GroupUpdate failed: %s", err)
			continue
		}
		groupUpdateNeeded = false
	}
At 2022-07-13 05:50:44, "Macdonald, Mjmac" <mjmac.macdonald@...> wrote:
|
|
Re: Why not update groupmap when daos_server received RASSwimRankDead
尹秋霞
Thanks, mjmac. I think the code

case sync := <-svc.groupUpdateReqs:
	groupUpdateNeeded = true
	if sync {
		if err := svc.doGroupUpdate(parent, true); err != nil {
			svc.log.Errorf("sync GroupUpdate failed: %s", err)
			continue
		}
	}
	groupUpdateNeeded = false

should be like this

case sync := <-svc.groupUpdateReqs:
	groupUpdateNeeded = true
	if sync {
		if err := svc.doGroupUpdate(parent, true); err != nil {
			svc.log.Errorf("sync GroupUpdate failed: %s", err)
			continue
		}
		groupUpdateNeeded = false
	}
At 2022-07-13 05:50:44, "Macdonald, Mjmac" <mjmac.macdonald@...> wrote:
|
|
Re: Why not update groupmap when daos_server received RASSwimRankDead
Macdonald, Mjmac
To be clear, once a group map update has been requested, it will happen within 500ms of the request. This is triggered by a timer, not by a join request. Every 500ms, the timer fires and a check happens to see if a group update has been requested. Any changes to the system membership that have occurred since the last group update will be included in this new group update. The alternative is that every single rank death/join event would result in its own RPC downcall into the engine, and this would be extremely inefficient at scale.
Hope that helps. mjmac
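To make the batching concrete, here is a generic, self-contained sketch of the pattern mjmac describes; it is not the actual DAOS handler (that lives in src/control/server/mgmt_system.go and differs in detail), and the names runGroupUpdateLoop, reqs and doUpdate are illustrative only:

package main

import (
	"fmt"
	"time"
)

// runGroupUpdateLoop batches membership-change notifications: a request only
// sets a flag (or, when sync is true, forces an immediate update); a ticker
// flushes any pending request at most once per interval.
func runGroupUpdateLoop(reqs <-chan bool, interval time.Duration, doUpdate func() error) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	pending := false
	for {
		select {
		case sync := <-reqs:
			pending = true
			if sync {
				if err := doUpdate(); err != nil {
					fmt.Println("sync update failed:", err)
					continue // leave pending set so the next tick retries
				}
				pending = false
			}
		case <-ticker.C:
			if !pending {
				continue
			}
			if err := doUpdate(); err != nil {
				fmt.Println("batched update failed:", err)
				continue
			}
			pending = false
		}
	}
}

func main() {
	reqs := make(chan bool, 8)
	go runGroupUpdateLoop(reqs, 500*time.Millisecond, func() error {
		fmt.Println("group map pushed to engines")
		return nil
	})
	reqs <- false // async request: handled by the next tick, i.e. within ~500ms
	time.Sleep(time.Second)
}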
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of 尹秋霞
Thank you, mjmac! So the reason for passing false is to allow for efficient batching of group updates. But is it OK that engines don't get the newest groupmap? Sometimes, there may be no engine join message for a long time. In this case, the servers will not pass the newest groupmap to the engines, so the groupmap versions differ between servers and engines for a long time. During this time, some messages will still be sent to the failed engine, because the latest groupmap has not been obtained. This will cause messages to fail due to timeouts or other reasons. What do you think about this?
At 2022-07-12 02:23:19, "Macdonald, Mjmac" <mjmac.macdonald@...> wrote:
|
|
Re: Why not update groupmap when daos_server received RASSwimRankDead
尹秋霞
Thank you, mjmac! So the reason for passing false is to allow for efficient batching of group updates. But is it OK that engines don't get the newest groupmap? Sometimes, there may be no engine join message for a long time. In this case, the servers will not pass the newest groupmap to the engines, so the groupmap versions differ between servers and engines for a long time. During this time, some messages will still be sent to the failed engine, because the latest groupmap has not been obtained. This will cause messages to fail due to timeouts or other reasons. What do you think about this?
At 2022-07-12 02:23:19, "Macdonald, Mjmac" <mjmac.macdonald@...> wrote:
|
|
Re: Why not update groupmap when daos_server received RASSwimRankDead
Macdonald, Mjmac
Hi Qiu.
I assume you are referring to this line of code: https://github.com/daos-stack/daos/blob/master/src/control/server/server_utils.go#L518
In this case, the false value indicates that the group update request does not need to be synchronous. You can see the request handler here: https://github.com/daos-stack/daos/blob/master/src/control/server/mgmt_system.go#L187
The reason for this is to allow for efficient batching of group updates during large-scale membership changes (e.g. system bringup or when many nodes are marked dead by SWIM). In this mode, the group update will happen within 500ms (maybe less, depending on when the ticker last fired).
Hope that helps. mjmac
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of 尹秋霞
Hi, DAOS, I found that when daos_server received RASSwimRankDead, it updated the membership immediately, but it passed false to reqGroupUpdate, so it did not push the new groupmap to daos_engine. Could you tell me why?
Regards, Qiu
|
|
Why not update groupmap when daos_server received RASSwimRankDead
尹秋霞
Hi, DAOS, I found that when daos_server received RASSwimRankDead, it updated the membership immediately, but it passed false to reqGroupUpdate, so it did not push the new groupmap to daos_engine. Could you tell me why?
Regards, Qiu
|
|
DAOS Community Update / July'22
Lombardi, Johann
Hi there,
Please find below the DAOS community newsletter for July 2022. A copy of this newsletter is available on the wiki.
Past Events
DAOS Features for Next Generation Platforms https://www.ixpug.org/events/isc22-ixpug-workshop Mohamad Chaarawi (Intel)
DAOS: Nextgen Storage Stack for HPC and AI https://sites.google.com/view/essa-2022/ Johann Lombardi (Intel)
One big happy family: sharing the S3 layer between Ceph, CORTX, and DAOS
https://iosea-project.eu/event/emoss-22-workshop/
Upcoming Events
Requirements and Challenges Associated with the World's Fastest Storage Platform https://sites.google.com/view/essa-2022/ Jeff Olivier (Intel)
Release
R&D
News
|
|
Re: DPI_SPACE query after extending pool
Niu, Yawei
Yes, that’s the reservation I mentioned, and the NVMe reservation has been removed in master and 2.2.
Thanks -Niu
From: <daos@daos.groups.io> on behalf of "Tuffli, Chuck" <chuck.tuffli@...>
Apologies. I was wrong. The space drops with pool creation, not with container creation.
Creating a pool # dmg pool create --ranks=0,3 --size=500G kiddie Creating DAOS pool with automatic storage allocation: 500 GB total, 6,94 tier ratio Pool created with 6.00%,94.00% storage tier ratio ------------------------------------------------- UUID : 6fb2c3ce-5406-4a6b-aa48-a5e73d66ef14 Service Ranks : 0 Storage Ranks : [0,3] Total Size : 500 GB Storage tier 0 (SCM) : 30 GB (15 GB / rank) Storage tier 1 (NVMe): 470 GB (235 GB / rank)
Query fresh pool, no containers
# daos pool query kiddie Pool 6fb2c3ce-5406-4a6b-aa48-a5e73d66ef14, ntarget=16, disabled=0, leader=0, version=1 Pool space info: - Target(VOS) count:16 - Storage tier 0 (SCM): Total size: 30 GB Free: 30 GB, min:1.9 GB, max:1.9 GB, mean:1.9 GB - Storage tier 1 (NVMe): Total size: 470 GB Free: 435 GB, min:27 GB, max:27 GB, mean:27 GB Rebuild idle, 0 objs, 0 recs
35 GB from NVMe and zero(-ish) from Optane. Create a container and re-query
# daos container create --pool=kiddie --type=posix --label=whiz Container UUID : 532aec33-9387-4aea-965d-91d6c039c3b9 Container Label: whiz Container Type : POSIX
Successfully created container 532aec33-9387-4aea-965d-91d6c039c3b9 # sleep 180 # daos pool query kiddie Pool 6fb2c3ce-5406-4a6b-aa48-a5e73d66ef14, ntarget=16, disabled=0, leader=0, version=1 Pool space info: - Target(VOS) count:16 - Storage tier 0 (SCM): Total size: 30 GB Free: 30 GB, min:1.9 GB, max:1.9 GB, mean:1.9 GB - Storage tier 1 (NVMe): Total size: 470 GB Free: 435 GB, min:27 GB, max:27 GB, mean:27 GB Rebuild idle, 0 objs, 0 recs
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Niu, Yawei <yawei.niu@...>
Could you double-check whether creating a container causes the NVMe free space to drop? If so, please open a ticket for further investigation. I can't think of a reason why container creation would consume NVMe space.
Thanks -Niu
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Tuffli, Chuck <chuck.tuffli@...> If the space change isn't caused by reservation, what else might be causing this? What other things might I check? From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Niu, Yawei <yawei.niu@...>
Hi, Chuck
The reserved space is per pool; it is not related to container creation, so I think the space change you observed after container creation isn't caused by space reservation.
FYI, we've just changed the space reservation a bit: the NVMe reservation has been removed from current master and 2.2, and only the SCM reservation is kept.
As for space query, the current client space query reports only total space and free space, which I think is common practice for most systems. It could be improved in the future to report detailed usage, such as how much space is used for reservation; thanks for the input.
Thanks -Niu
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Tuffli, Chuck <chuck.tuffli@...> Thank you, Niu
After thinking about what you said and running some additional experiments, I believe everything is working as you described.
I created a 500 GB pool and added a POSIX container. After creating the container, the free NVMe space dropped from 470 GB to 435 GB which roughly lines up with the 2 GB reserved per NVMe drive (this pool has 16 drives).
The free space dropped to 406 GB after writing 27 GB of file data to the container. After extending the pool, the free space increased to 815 GB (roughly linear). Following WangDi's suggestion, I waited several minutes and afterwards, observed the free space climb to 841 GB. This last number matches my expectation that free space should more than double after extending the pool with an additional storage node.
Can clients query DAOS to figure out how much storage it is using itself (e.g. reserved space)? As I can see now, reporting used space based on the difference between total and free doesn't convey quite the right message to consumers of this storage. From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Niu, Yawei <yawei.niu@...>
Hi, Chuck
The “used space” (total – free) is essentially OP (over-provisioning): the DAOS server has to reserve some space on both SCM and NVMe to ensure that punch, container/object destroy, GC and aggregation do not fail with ENOSPACE.
The size of this sys reservation is roughly: SCM: 5% of SCM total space; NVMe: 2% of NVMe total space; and the minimum reservation size is 2GB per pool target (for both SCM and NVMe). The reservation will be disabled if the pool size is tiny (when each pool target SCM and NVMe size is less than 5GB, we regard it as tiny pool, which is usually used for testing), so the operations I mentioned above could fail on the tiny pool when it’s running short of space.
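As a rough back-of-the-envelope check of the sizing described above against the pool earlier in this thread (470 GB NVMe tier, 16 VOS targets), assuming the 2% / 2 GB-per-target rule exactly as stated; nvmeReservationGB is an illustrative helper, not an actual DAOS function:

package main

import "fmt"

// Illustrative sizing only, following the description above (2% of NVMe total,
// with a 2 GB-per-target floor); the real DAOS logic may differ and has since
// changed (the NVMe reservation was removed in master/2.2).
func nvmeReservationGB(totalGB float64, targets int) float64 {
	r := totalGB * 0.02
	if floor := 2.0 * float64(targets); floor > r {
		r = floor
	}
	return r
}

func main() {
	// Pool from this thread: 470 GB NVMe tier, 16 VOS targets.
	fmt.Printf("estimated NVMe reservation: ~%.0f GB (observed free-space drop: ~35 GB)\n",
		nvmeReservationGB(470, 16))
}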
There is an open ticket for reducing the OP, but it’s not on our schedule yet.
Thanks -Niu
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Wang, Di <di.wang@...> Hello Chuck
That is strange. According to the output of dmg pool query, 83 objects were deleted after the extend, so some space should be reclaimed.
“dmg pool query kiddie ……… Rebuild done, 83 objs, 0 recs “
before extend: s_total(30000021504,470000000000) s_free(29994849224,434968621056) after extend: s_total(60000043008,940000000000) s_free(59994841512,869937672192)
Hmm, it seems SCM is OK; only the NVMe space doubled after the extend. If you do not see the NVMe free space come back after a few minutes, you probably need to create a ticket.
Thanks WangDi
On 6/1/22, 10:56 AM, "daos@daos.groups.io on behalf of Tuffli, Chuck" <daos@daos.groups.io on behalf of chuck.tuffli@...> wrote:
Wangdi
When I checked this morning, the pool had been idle four days, but the values from daos_pool_query() have not changed.
As for object class, I'm not sure. Pool creation didn't specify a class. Here is the container query output: # daos cont query ERROR: daos: pool and container ID must be specified if --path not used ]# daos cont query kiddie whiz Container UUID : 5c61770a-2b56-4922-b95b-d025fa4d0527 Container Label : whiz Container Type : POSIX Pool UUID : 700bf1b6-38b8-467e-9f91-7131138210ba Number of snapshots : 0 Latest Persistent Snapshot : 0x0 Container redundancy factor: 0 Object Class : UNKNOWN Chunk Size : 1.0 MiB
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Wang, Di <di.wang@...>
Hello Chuck
Pool extend might migrate data to the new pool targets, and the original data will then be deleted asynchronously, so that space might be reclaimed a few minutes later if the system is not busy.
You probably should do your daos_pool_query() a bit later. Btw: what is the object class in your pool?
Thanks Wangdi
On 5/25/22, 2:54 PM, "daos@daos.groups.io on behalf of Tuffli, Chuck" <daos@daos.groups.io on behalf of chuck.tuffli@...> wrote:
# dmg pool query kiddie Pool 700bf1b6-38b8-467e-9f91-7131138210ba, ntarget=32, disabled=0, leader=0, version=18 Pool space info: - Target(VOS) count:32 - Storage tier 0 (SCM): Total size: 60 GB Free: 60 GB, min:1.9 GB, max:1.9 GB, mean:1.9 GB - Storage tier 1 (NVMe): Total size: 940 GB Free: 870 GB, min:27 GB, max:27 GB, mean:27 GB Rebuild done, 83 objs, 0 recs From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Nabarro, Tom <tom.nabarro@...>
Hello Chuck,
Could you please run `dmg pool query` on the pool and show the results, this will give you a bit more info on pool usage.
Regards, Tom
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Tuffli, Chuck
I've been experimenting with extending a pool but don't quite understand the results. Any insights would be most appreciated.
The cluster is running with DAOS v2.0.2 and consists of a client and a pair of servers/storage nodes. To simulate adding a server to the cluster, I created a pool by specifying the ranks associated with one of the servers. I.e.:
# dmg system query --verbose
Rank UUID                                 Control Address Fault Domain State  Reason
---- ----                                 --------------- ------------ -----  ------
0    654345f9-249c-48b1-b6dc-ec08dbf2aded x.150.0.3:10001 /d006        Joined
1    b384771a-ddbc-491a-8807-8d86544d7c2f x.150.0.4:10001 /d010        Joined
2    01c672cf-3365-476f-87ec-41a15a44e946 x.150.0.4:10001 /d010        Joined
3    93a8d382-b970-408a-9c21-e01c35265e77 x.150.0.3:10001 /d006        Joined
# dmg pool create --ranks=0,3 --size=500G kiddie
I used the pool extend command to simulate adding a server: # dmg pool extend --ranks=1,2 kiddie
My application queried the pool size before and after the extension using daos_pool_query( ... DPI_SPACE ...). The numbers below are the info.pi_space.ps_space values for (DAOS_MEDIA_SCM, DAOS_MEDIA_NVME). before extend: s_total(30000021504,470000000000) s_free(29994849224,434968621056) after extend: s_total(60000043008,940000000000) s_free(59994841512,869937672192)
The total pool size doubled (good), but the used space (i.e., s_total - s_free) also doubled. Naively, I expected the used space to remain the same since the pool has a redundancy factor of zero. Doing some arithmetic on the above works out to the used space being 35.036 GB before the expansion and 70.068 GB after. Note that, for the moment, I'm choosing to ignore that the used size is several orders of magnitude bigger than the data written (~600 KB).
Where did I goof in this methodology? TIA.
--chuck
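For reference, the used-space figures quoted above follow directly from the s_total/s_free pairs; a quick arithmetic check (usedGB is just a throwaway helper, not DAOS output):

package main

import "fmt"

// usedGB sums the SCM and NVMe (total - free) deltas and converts to GB.
func usedGB(scmTotal, nvmeTotal, scmFree, nvmeFree float64) float64 {
	return ((scmTotal - scmFree) + (nvmeTotal - nvmeFree)) / 1e9
}

func main() {
	// before extend: ~35.04 GB used
	fmt.Printf("%.3f GB\n", usedGB(30000021504, 470000000000, 29994849224, 434968621056))
	// after extend: ~70.07 GB used, i.e. roughly double
	fmt.Printf("%.3f GB\n", usedGB(60000043008, 940000000000, 59994841512, 869937672192))
}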
|
|
Re: DPI_SPACE query after extending pool
Tuffli, Chuck
Apologies. I was wrong. The space drops with pool creation, not with container creation.
Creating a pool # dmg pool create --ranks=0,3 --size=500G kiddie Creating DAOS pool with automatic storage allocation: 500 GB total, 6,94 tier ratio Pool created with 6.00%,94.00% storage tier ratio ------------------------------------------------- UUID : 6fb2c3ce-5406-4a6b-aa48-a5e73d66ef14 Service Ranks : 0 Storage Ranks : [0,3] Total Size : 500 GB Storage tier 0 (SCM) : 30 GB (15 GB / rank) Storage tier 1 (NVMe): 470 GB (235 GB / rank)
Query fresh pool, no containers
# daos pool query kiddie Pool 6fb2c3ce-5406-4a6b-aa48-a5e73d66ef14, ntarget=16, disabled=0, leader=0, version=1 Pool space info: - Target(VOS) count:16 - Storage tier 0 (SCM): Total size: 30 GB Free: 30 GB, min:1.9 GB, max:1.9 GB, mean:1.9 GB - Storage tier 1 (NVMe): Total size: 470 GB Free: 435 GB, min:27 GB, max:27 GB, mean:27 GB Rebuild idle, 0 objs, 0 recs
35 GB from NVMe and zero(-ish) from Optane. Create a container and re-query
# daos container create --pool=kiddie --type=posix --label=whiz Container UUID : 532aec33-9387-4aea-965d-91d6c039c3b9 Container Label: whiz Container Type : POSIX
Successfully created container 532aec33-9387-4aea-965d-91d6c039c3b9 # sleep 180 # daos pool query kiddie Pool 6fb2c3ce-5406-4a6b-aa48-a5e73d66ef14, ntarget=16, disabled=0, leader=0, version=1 Pool space info: - Target(VOS) count:16 - Storage tier 0 (SCM): Total size: 30 GB Free: 30 GB, min:1.9 GB, max:1.9 GB, mean:1.9 GB - Storage tier 1 (NVMe): Total size: 470 GB Free: 435 GB, min:27 GB, max:27 GB, mean:27 GB Rebuild idle, 0 objs, 0 recs From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Niu, Yawei <yawei.niu@...>
Sent: Monday, June 13, 2022 6:00 PM To: daos@daos.groups.io <daos@daos.groups.io> Subject: Re: [daos] DPI_SPACE query after extending pool
Could you double-check whether creating a container causes the NVMe free space to drop? If so, please open a ticket for further investigation. I can't think of a reason why container creation would consume NVMe space.
Thanks -Niu
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Tuffli, Chuck <chuck.tuffli@...> If the space change isn't caused by reservation, what else might be causing this? What other things might I check? From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Niu, Yawei <yawei.niu@...>
Hi, Chuck
The reserved space is per pool; it is not related to container creation, so I think the space change you observed after container creation isn't caused by space reservation.
FYI, we've just changed the space reservation a bit: the NVMe reservation has been removed from current master and 2.2, and only the SCM reservation is kept.
As for space query, the current client space query reports only total space and free space, which I think is common practice for most systems. It could be improved in the future to report detailed usage, such as how much space is used for reservation; thanks for the input.
Thanks -Niu
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Tuffli, Chuck <chuck.tuffli@...> Thank you, Niu
After thinking about what you said and running some additional experiments, I believe everything is working as you described.
I created a 500 GB pool and added a POSIX container. After creating the container, the free NVMe space dropped from 470 GB to 435 GB which roughly lines up with the 2 GB reserved per NVMe drive (this pool has 16 drives).
The free space dropped to 406 GB after writing 27 GB of file data to the container. After extending the pool, the free space increased to 815 GB (roughly linear). Following WangDi's suggestion, I waited several minutes and afterwards, observed the free space climb to 841 GB. This last number matches my expectation that free space should more than double after extending the pool with an additional storage node.
Can clients query DAOS to figure out how much storage it is using itself (e.g. reserved space)? As I can see now, reporting used space based on the difference between total and free doesn't convey quite the right message to consumers of this storage. From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Niu, Yawei <yawei.niu@...>
Hi, Chuck
The “used space” (total – free) is essentially OP (over-provisioning): the DAOS server has to reserve some space on both SCM and NVMe to ensure that punch, container/object destroy, GC and aggregation do not fail with ENOSPACE.
The size of this sys reservation is roughly: SCM: 5% of SCM total space; NVMe: 2% of NVMe total space; and the minimum reservation size is 2GB per pool target (for both SCM and NVMe). The reservation will be disabled if the pool size is tiny (when each pool target SCM and NVMe size is less than 5GB, we regard it as tiny pool, which is usually used for testing), so the operations I mentioned above could fail on the tiny pool when it’s running short of space.
There is an open ticket for reducing the OP, but it’s not on our schedule yet.
Thanks -Niu
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Wang, Di <di.wang@...> Hello Chuck
That is strange. According to the output of dmg pool query, 83 objects were deleted after the extend, so some space should be reclaimed.
“dmg pool query kiddie ……… Rebuild done, 83 objs, 0 recs “
before extend: s_total(30000021504,470000000000) s_free(29994849224,434968621056) after extend: s_total(60000043008,940000000000) s_free(59994841512,869937672192)
Hmm, it seems SCM is OK; only the NVMe space doubled after the extend. If you do not see the NVMe free space come back after a few minutes, you probably need to create a ticket.
Thanks WangDi
On 6/1/22, 10:56 AM, "daos@daos.groups.io on behalf of Tuffli, Chuck" <daos@daos.groups.io on behalf of chuck.tuffli@...> wrote:
Wangdi
When I checked this morning, the pool had been idle four days, but the values from daos_pool_query() have not changed.
As for object class, I'm not sure. Pool creation didn't specify a class. Here is the container query output: # daos cont query ERROR: daos: pool and container ID must be specified if --path not used ]# daos cont query kiddie whiz Container UUID : 5c61770a-2b56-4922-b95b-d025fa4d0527 Container Label : whiz Container Type : POSIX Pool UUID : 700bf1b6-38b8-467e-9f91-7131138210ba Number of snapshots : 0 Latest Persistent Snapshot : 0x0 Container redundancy factor: 0 Object Class : UNKNOWN Chunk Size : 1.0 MiB
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Wang, Di <di.wang@...>
Hello Chuck
Pool extend might migrate data to the new pool targets, and the original data will then be deleted asynchronously, so that space might be reclaimed a few minutes later if the system is not busy.
You probably should do your daos_pool_query() a bit later. Btw: what is the object class in your pool?
Thanks Wangdi
On 5/25/22, 2:54 PM, "daos@daos.groups.io on behalf of Tuffli, Chuck" <daos@daos.groups.io on behalf of chuck.tuffli@...> wrote:
# dmg pool query kiddie Pool 700bf1b6-38b8-467e-9f91-7131138210ba, ntarget=32, disabled=0, leader=0, version=18 Pool space info: - Target(VOS) count:32 - Storage tier 0 (SCM): Total size: 60 GB Free: 60 GB, min:1.9 GB, max:1.9 GB, mean:1.9 GB - Storage tier 1 (NVMe): Total size: 940 GB Free: 870 GB, min:27 GB, max:27 GB, mean:27 GB Rebuild done, 83 objs, 0 recs From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Nabarro, Tom <tom.nabarro@...>
Hello Chuck,
Could you please run `dmg pool query` on the pool and show the results, this will give you a bit more info on pool usage.
Regards, Tom
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Tuffli, Chuck
I've been experimenting with extending a pool but don't quite understand the results. Any insights would be most appreciated.
The cluster is running with DAOS v2.0.2 and consists of a client and a pair of servers/storage nodes. To simulate adding a server to the cluster, I created a pool by specifying the ranks associated with one of the servers. I.e.:
# dmg system query --verbose
Rank UUID                                 Control Address Fault Domain State  Reason
---- ----                                 --------------- ------------ -----  ------
0    654345f9-249c-48b1-b6dc-ec08dbf2aded x.150.0.3:10001 /d006        Joined
1    b384771a-ddbc-491a-8807-8d86544d7c2f x.150.0.4:10001 /d010        Joined
2    01c672cf-3365-476f-87ec-41a15a44e946 x.150.0.4:10001 /d010        Joined
3    93a8d382-b970-408a-9c21-e01c35265e77 x.150.0.3:10001 /d006        Joined
# dmg pool create --ranks=0,3 --size=500G kiddie
I used the pool extend command to simulate adding a server: # dmg pool extend --ranks=1,2 kiddie
My application queried the pool size before and after the extension using daos_pool_query( ... DPI_SPACE ...). The numbers below are the info.pi_space.ps_space values for (DAOS_MEDIA_SCM, DAOS_MEDIA_NVME). before extend: s_total(30000021504,470000000000) s_free(29994849224,434968621056) after extend: s_total(60000043008,940000000000) s_free(59994841512,869937672192)
The total pool size doubled (good), but the used space (i.e., s_total - s_free) also doubled. Naively, I expected the used space to remain the same since the pool has a redundancy factor of zero. Doing some arithmetic on the above works out to the used space being 35.036 GB before the expansion and 70.068 GB after. Note that, for the moment, I'm choosing to ignore that the used size is several orders of magnitude bigger than the data written (~600 KB).
Where did I goof in this methodology? TIA.
--chuck
|
|
Re: DPI_SPACE query after extending pool
Niu, Yawei
Could you double-check whether creating a container causes the NVMe free space to drop? If so, please open a ticket for further investigation. I can't think of a reason why container creation would consume NVMe space.
Thanks -Niu
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Tuffli, Chuck <chuck.tuffli@...> If the space change isn't caused by reservation, what else might be causing this? What other things might I check? From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Niu, Yawei <yawei.niu@...>
Hi, Chuck
The reserved space is per pool; it is not related to container creation, so I think the space change you observed after container creation isn't caused by space reservation.
FYI, we've just changed the space reservation a bit: the NVMe reservation has been removed from current master and 2.2, and only the SCM reservation is kept.
As for space query, the current client space query reports only total space and free space, which I think is common practice for most systems. It could be improved in the future to report detailed usage, such as how much space is used for reservation; thanks for the input.
Thanks -Niu
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Tuffli, Chuck <chuck.tuffli@...> Thank you, Niu
After thinking about what you said and running some additional experiments, I believe everything is working as you described.
I created a 500 GB pool and added a POSIX container. After creating the container, the free NVMe space dropped from 470 GB to 435 GB which roughly lines up with the 2 GB reserved per NVMe drive (this pool has 16 drives).
The free space dropped to 406 GB after writing 27 GB of file data to the container. After extending the pool, the free space increased to 815 GB (roughly linear). Following WangDi's suggestion, I waited several minutes and afterwards, observed the free space climb to 841 GB. This last number matches my expectation that free space should more than double after extending the pool with an additional storage node.
Can clients query DAOS to figure out how much storage it is using itself (e.g. reserved space)? As I can see now, reporting used space based on the difference between total and free doesn't convey quite the right message to consumers of this storage. From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Niu, Yawei <yawei.niu@...>
Hi, Chuck
The “used space” (total – free) is essentially OP (over-provisioning): the DAOS server has to reserve some space on both SCM and NVMe to ensure that punch, container/object destroy, GC and aggregation do not fail with ENOSPACE.
The size of this sys reservation is roughly: SCM: 5% of SCM total space; NVMe: 2% of NVMe total space; and the minimum reservation size is 2GB per pool target (for both SCM and NVMe). The reservation will be disabled if the pool size is tiny (when each pool target SCM and NVMe size is less than 5GB, we regard it as tiny pool, which is usually used for testing), so the operations I mentioned above could fail on the tiny pool when it’s running short of space.
There is an open ticket for reducing the OP, but it’s not on our schedule yet.
Thanks -Niu
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Wang, Di <di.wang@...> Hello Chuck
That is strange. According to the output of dmg pool query, 83 objects were deleted after the extend, so some space should be reclaimed.
“dmg pool query kiddie ……… Rebuild done, 83 objs, 0 recs “
before extend: s_total(30000021504,470000000000) s_free(29994849224,434968621056) after extend: s_total(60000043008,940000000000) s_free(59994841512,869937672192)
Hmm, it seems SCM is OK; only the NVMe space doubled after the extend. If you do not see the NVMe free space come back after a few minutes, you probably need to create a ticket.
Thanks WangDi
On 6/1/22, 10:56 AM, "daos@daos.groups.io on behalf of Tuffli, Chuck" <daos@daos.groups.io on behalf of chuck.tuffli@...> wrote:
Wangdi
When I checked this morning, the pool had been idle four days, but the values from daos_pool_query() have not changed.
As for object class, I'm not sure. Pool creation didn't specify a class. Here is the container query output: # daos cont query ERROR: daos: pool and container ID must be specified if --path not used ]# daos cont query kiddie whiz Container UUID : 5c61770a-2b56-4922-b95b-d025fa4d0527 Container Label : whiz Container Type : POSIX Pool UUID : 700bf1b6-38b8-467e-9f91-7131138210ba Number of snapshots : 0 Latest Persistent Snapshot : 0x0 Container redundancy factor: 0 Object Class : UNKNOWN Chunk Size : 1.0 MiB
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Wang, Di <di.wang@...>
Hello Chuck
Pool extend might migrate data to the new pool targets, and the original data will then be deleted asynchronously, so that space might be reclaimed a few minutes later if the system is not busy.
You probably should do your daos_pool_query() a bit later. Btw: what is the object class in your pool?
Thanks Wangdi
On 5/25/22, 2:54 PM, "daos@daos.groups.io on behalf of Tuffli, Chuck" <daos@daos.groups.io on behalf of chuck.tuffli@...> wrote:
# dmg pool query kiddie Pool 700bf1b6-38b8-467e-9f91-7131138210ba, ntarget=32, disabled=0, leader=0, version=18 Pool space info: - Target(VOS) count:32 - Storage tier 0 (SCM): Total size: 60 GB Free: 60 GB, min:1.9 GB, max:1.9 GB, mean:1.9 GB - Storage tier 1 (NVMe): Total size: 940 GB Free: 870 GB, min:27 GB, max:27 GB, mean:27 GB Rebuild done, 83 objs, 0 recs From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Nabarro, Tom <tom.nabarro@...>
Hello Chuck,
Could you please run `dmg pool query` on the pool and show the results, this will give you a bit more info on pool usage.
Regards, Tom
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Tuffli, Chuck
I've been experimenting with extending a pool but don't quite understand the results. Any insights would be most appreciated.
The cluster is running with DAOS v2.0.2 and consists of a client and a pair of servers/storage nodes. To simulate adding a server to the cluster, I created a pool by specifying the ranks associated with one of the servers. I.e.:
# dmg system query --verbose
Rank UUID                                 Control Address Fault Domain State  Reason
---- ----                                 --------------- ------------ -----  ------
0    654345f9-249c-48b1-b6dc-ec08dbf2aded x.150.0.3:10001 /d006        Joined
1    b384771a-ddbc-491a-8807-8d86544d7c2f x.150.0.4:10001 /d010        Joined
2    01c672cf-3365-476f-87ec-41a15a44e946 x.150.0.4:10001 /d010        Joined
3    93a8d382-b970-408a-9c21-e01c35265e77 x.150.0.3:10001 /d006        Joined
# dmg pool create --ranks=0,3 --size=500G kiddie
I used the pool extend command to simulate adding a server: # dmg pool extend --ranks=1,2 kiddie
My application queried the pool size before and after the extension using daos_pool_query( ... DPI_SPACE ...). The numbers below are the info.pi_space.ps_space values for (DAOS_MEDIA_SCM, DAOS_MEDIA_NVME). before extend: s_total(30000021504,470000000000) s_free(29994849224,434968621056) after extend: s_total(60000043008,940000000000) s_free(59994841512,869937672192)
The total pool size doubled (good), but the used space (i.e., s_total - s_free) also doubled. Naively, I expected the used space to remain the same since the pool has a redundancy factor of zero. Doing some arithmetic on the above works out to the used space being 35.036 GB before the expansion and 70.068 GB after. Note that, for the moment, I'm choosing to ignore that the used size is several orders of magnitude bigger than the data written (~600 KB).
Where did I goof in this methodology? TIA.
--chuck
|
|