Re: DPI_SPACE query after extending pool


Tuffli, Chuck
 

If the space change isn't caused by reservation, what else might be causing this? What other things might I check?


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Niu, Yawei <yawei.niu@...>
Sent: Wednesday, June 8, 2022 5:55 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DPI_SPACE query after extending pool
 

Hi, Chuck

The reserved space is per pool; it isn't related to container creation, so I think the space change you observed after creating the container isn't caused by space reservation.

FYI, we've just changed the space reservation a bit: NVMe reservation has been removed from current master and 2.2, and only the SCM reservation is kept.

As for the space query, the current client space query reports only total space and free space, which I think is common practice for most systems. In the future it could be improved to report detailed usage, such as how much space is used for reservation. Thanks for the input.

Thanks

-Niu

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Tuffli, Chuck <chuck.tuffli@...>
Date: Thursday, June 9, 2022 at 3:22 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DPI_SPACE query after extending pool

Thank you, Niu

After thinking about what you said and running some additional experiments, I believe everything is working as you described.

I created a 500 GB pool and added a POSIX container. After creating the container, the free NVMe space dropped from 470 GB to 435 GB, which roughly lines up with the 2 GB reserved per NVMe drive (this pool has 16 drives, so 16 x 2 GB = 32 GB).

The free space dropped to 406 GB after writing 27 GB of file data to the container. After extending the pool, the free space increased to 815 GB (roughly linear: about twice the previous value). Following WangDi's suggestion, I waited several minutes and then observed the free space climb to 841 GB. This last number matches my expectation that free space should more than double after extending the pool with an additional storage node.

Can clients query DAOS to find out how much storage it is using itself (e.g., reserved space)? As I now see it, reporting used space as the difference between total and free doesn't quite convey the right message to consumers of this storage.


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Niu, Yawei <yawei.niu@...>
Sent: Wednesday, June 1, 2022 6:16 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DPI_SPACE query after extending pool

 

Hi, Chuck

 

The "used space" (total - free) is a kind of OP (over-provisioning): the DAOS server has to reserve some space on both SCM and NVMe to ensure that punch, container/object destroy, GC, and aggregation don't fail with out-of-space errors.

The size of this system reservation is roughly 5% of the total SCM space and 2% of the total NVMe space, with a minimum of 2 GB per pool target (for both SCM and NVMe). The reservation is disabled if the pool is tiny (when each pool target's SCM and NVMe size is less than 5 GB, we regard it as a tiny pool, which is usually used for testing), so the operations mentioned above could fail on a tiny pool when it's running short of space.
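Niu's rule can be turned into a rough estimator. The sketch below is a hypothetical helper, not DAOS code. It assumes sizes are decimal bytes (as dmg reports them), and that the "tiny pool" check is applied per media tier using that tier's per-target size; that is one reading of the rule, chosen because it is consistent with the near-zero SCM usage in Chuck's numbers. It reflects the v2.0-era behaviour only: per Niu's June 8 reply, master and 2.2 no longer reserve NVMe space.

```python
# Hypothetical estimator for the v2.0-era system space reservation described
# in this thread. It is a sketch of the stated rule, not DAOS source code.

GB = 10**9  # assumption: "GB" means decimal gigabytes, as dmg reports sizes

MIN_RESERVE_PER_TARGET = 2 * GB  # minimum reservation per pool target
TINY_TARGET_SIZE = 5 * GB        # below this per-target size, no reservation
RESERVE_PERCENT = {"scm": 5, "nvme": 2}


def estimate_reserved(media: str, total_bytes: int, n_targets: int) -> int:
    """Return the estimated bytes reserved on one media tier of a pool.

    Assumption: the tiny-pool check is made per media tier.
    """
    if total_bytes // n_targets < TINY_TARGET_SIZE:
        return 0  # tiny targets (usually test pools): reservation disabled
    by_percent = total_bytes * RESERVE_PERCENT[media] // 100
    by_minimum = MIN_RESERVE_PER_TARGET * n_targets
    return max(by_percent, by_minimum)


if __name__ == "__main__":
    # Before the extend: 16 targets (assumed; dmg reports ntarget=32 after).
    nvme_reserved = estimate_reserved("nvme", 470_000_000_000, 16)
    nvme_used = 470_000_000_000 - 434_968_621_056  # s_total - s_free
    print(f"estimated NVMe reservation: {nvme_reserved / GB:.1f} GB")
    print(f"NVMe used beyond that:      {(nvme_used - nvme_reserved) / GB:.1f} GB")
```

Under these assumptions the 2 GB-per-target minimum dominates for this pool, so doubling the target count on extend doubles the estimated reservation, which would be consistent with the used space doubling in Chuck's original report.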

 

There is an open ticket for reducing the OP, but it’s not on our schedule yet.

 

Thanks

-Niu

 

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Wang, Di <di.wang@...>
Date: Thursday, June 2, 2022 at 4:35 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DPI_SPACE query after extending pool

Hello Chuck

That is strange. According to the output of dmg pool query, 83 objects were deleted after the extend, so some space should have been reclaimed.

 

dmg pool query kiddie

………

   Rebuild done, 83 objs, 0 recs

 

before extend:

s_total(30000021504,470000000000) s_free(29994849224,434968621056)

after extend:

s_total(60000043008,940000000000) s_free(59994841512,869937672192)

 

Hmm, it seems SCM is OK; only the NVMe used space doubled after the extend. If you don't see the NVMe free space come back after a few minutes, you probably need to create a ticket.

 

Thanks

WangDi

On 6/1/22, 10:56 AM, "daos@daos.groups.io on behalf of Tuffli, Chuck" <daos@daos.groups.io on behalf of chuck.tuffli@...> wrote:

 

Wangdi

 

When I checked this morning, the pool had been idle for four days, but the values from daos_pool_query() had not changed.

 

As for object class, I'm not sure. Pool creation didn't specify a class. Here is the container query output:

# daos cont query
ERROR: daos: pool and container ID must be specified if --path not used
# daos cont query kiddie whiz
  Container UUID             : 5c61770a-2b56-4922-b95b-d025fa4d0527
  Container Label            : whiz
  Container Type             : POSIX
  Pool UUID                  : 700bf1b6-38b8-467e-9f91-7131138210ba
  Number of snapshots        : 0
  Latest Persistent Snapshot : 0x0
  Container redundancy factor: 0
  Object Class               : UNKNOWN
  Chunk Size                 : 1.0 MiB


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Wang, Di <di.wang@...>
Sent: Wednesday, May 25, 2022 4:33 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DPI_SPACE query after extending pool

 

Hello Chuck

 

Pool extend might migrate data to the new pool targets; the original data is then deleted asynchronously, so that space might be reclaimed a few minutes later if the system is not busy.

You should probably run your daos_pool_query() a bit later. BTW, what object class are the objects in your pool?
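WangDi's advice to re-query later can be automated. A minimal sketch, assuming a hypothetical `query_free` callable that wraps daos_pool_query() with DPI_SPACE and returns the ps_space s_free values as a tuple (the wrapper is not shown and is not a DAOS API): the helper polls until the free-space numbers stop changing for a few consecutive reads, or raises if a timeout expires first.

```python
import time
from typing import Callable, Sequence, Tuple


def wait_for_space_to_settle(
    query_free: Callable[[], Sequence[int]],  # hypothetical daos_pool_query() wrapper
    interval_s: float = 60.0,
    stable_polls: int = 3,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, ...]:
    """Poll until free space is unchanged for `stable_polls` consecutive reads."""
    deadline = time.monotonic() + timeout_s
    last = tuple(query_free())
    unchanged = 0
    while unchanged < stable_polls:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"free space still changing: {last}")
        sleep(interval_s)
        current = tuple(query_free())
        if current == last:
            unchanged += 1  # same reading again: count toward stability
        else:
            unchanged = 0   # space still moving (e.g. asynchronous reclaim)
            last = current
    return last
```

Injecting `sleep` keeps the helper testable. Stable numbers only show that background reclaim has stopped changing the counters, not that the result matches expectations; in this thread the values stayed unchanged for four days.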

 

Thanks

Wangdi

 

On 5/25/22, 2:54 PM, "daos@daos.groups.io on behalf of Tuffli, Chuck" <daos@daos.groups.io on behalf of chuck.tuffli@...> wrote:

 

# dmg pool query kiddie
Pool 700bf1b6-38b8-467e-9f91-7131138210ba, ntarget=32, disabled=0, leader=0, version=18
Pool space info:
- Target(VOS) count:32
- Storage tier 0 (SCM):
  Total size: 60 GB
  Free: 60 GB, min:1.9 GB, max:1.9 GB, mean:1.9 GB
- Storage tier 1 (NVMe):
  Total size: 940 GB
  Free: 870 GB, min:27 GB, max:27 GB, mean:27 GB
Rebuild done, 83 objs, 0 recs


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Nabarro, Tom <tom.nabarro@...>
Sent: Tuesday, May 24, 2022 2:24 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DPI_SPACE query after extending pool

 

Hello Chuck,

 

Could you please run `dmg pool query` on the pool and share the results? This will give a bit more information on pool usage.

 

Regards,

Tom

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Tuffli, Chuck
Sent: Tuesday, May 24, 2022 9:05 PM
To: daos@daos.groups.io
Subject: [daos] DPI_SPACE query after extending pool

 

I've been experimenting with extending a pool but don't quite understand the results. Any insights would be most appreciated.

 

The cluster runs DAOS v2.0.2 and consists of a client and a pair of servers/storage nodes. To simulate adding a server to the cluster, I created a pool using only the ranks associated with one of the servers, i.e.:

# dmg system query --verbose
Rank UUID                                 Control Address  Fault Domain State  Reason
---- ----                                 ---------------  ------------ -----  ------
0    654345f9-249c-48b1-b6dc-ec08dbf2aded x.150.0.3:10001  /d006        Joined
1    b384771a-ddbc-491a-8807-8d86544d7c2f x.150.0.4:10001  /d010        Joined
2    01c672cf-3365-476f-87ec-41a15a44e946 x.150.0.4:10001  /d010        Joined
3    93a8d382-b970-408a-9c21-e01c35265e77 x.150.0.3:10001  /d006        Joined

# dmg pool create --ranks=0,3 --size=500G kiddie

 

I used the pool extend command to simulate adding a server:

# dmg pool extend --ranks=1,2 kiddie
Extend command succeeded

 

My application queried the pool size before and after the extension using daos_pool_query( ... DPI_SPACE ...). The numbers below are the info.pi_space.ps_space values for (DAOS_MEDIA_SCM, DAOS_MEDIA_NVME).

before extend:

s_total(30000021504,470000000000) s_free(29994849224,434968621056)

after extend:

s_total(60000043008,940000000000) s_free(59994841512,869937672192)

 

The total pool size doubled (good), but the used space (i.e., s_total - s_free) also doubled. Naively, I expected the used space to remain the same, as the pool has a redundancy factor of zero. Doing some arithmetic on the above, the combined SCM and NVMe used space works out to 35.036 GB before the expansion and 70.068 GB after. Note that, for the moment, I'm choosing to ignore that the used size is several orders of magnitude bigger than the data written (~600 KB).
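The arithmetic above can be reproduced per media tier. A minimal sketch, assuming each pair is ordered (SCM, NVMe) as stated in the message: splitting by tier shows that the NVMe used space roughly doubled while the SCM used space barely changed, and the combined totals reproduce the roughly 35.04 GB and 70.07 GB figures.

```python
# Used space per media tier, computed as s_total - s_free.
# Tuple order (SCM, NVMe) follows the values quoted in the message.
from typing import Tuple

Pair = Tuple[int, int]


def used_bytes(s_total: Pair, s_free: Pair) -> Pair:
    return (s_total[0] - s_free[0], s_total[1] - s_free[1])


before = used_bytes((30_000_021_504, 470_000_000_000), (29_994_849_224, 434_968_621_056))
after = used_bytes((60_000_043_008, 940_000_000_000), (59_994_841_512, 869_937_672_192))

for tier, b, a in zip(("SCM", "NVMe"), before, after):
    print(f"{tier:5s} used: {b / 1e9:8.3f} GB -> {a / 1e9:8.3f} GB  (x{a / b:.2f})")
print(f"combined: {sum(before) / 1e9:.3f} GB -> {sum(after) / 1e9:.3f} GB")
```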

 

Where did I goof in this methodology? TIA.

 

--chuck

 
