Re: New to DAOS: Storage Pools
Harms, Kevin
In my opinion, I would not say that DAOS has tiers. It does support a configuration in which small writes land in PMEM and an aggregation service later batches them into NVMe. However, there is no general data movement service inside DAOS that moves data between different tiers of storage. There are data mover tools, but those are run manually to move data in and out of DAOS.
kevin
________________________________________
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of TheCTEgroup <david@...>
Sent: Wednesday, May 25, 2022 1:12 PM
To: daos@daos.groups.io
Subject: [daos] New to DAOS: Storage Pools
As I was going through the documentation on storage pools, I saw the diagram showing storage pools 1, 2, and 3, each a combination of SCM (PMEM) or NVMe, but I couldn’t find any detail on whether those pools have a tier priority, which I assume they do, or on whether they all have to be DAOS storage containers or whether one could be an archive tier via S3 (cloud, perhaps?) or some NAS. I know anything is possible; I'm just looking at whether it would be practical, given front-end performance requirements balanced against long-term retention on a potentially lower-cost tier of storage. Thank you.
|
|||
|
|||
Re: DPI_SPACE query after extending pool
Wang, Di
Hello Chuck
Pool extend may migrate data to the new pool targets. The original data is then deleted asynchronously, so that space may only be reclaimed a few minutes later, provided the system is not busy.
You should probably run your daos_pool_query() a bit later. By the way, which object classes are used in your pool?
Thanks Wangdi
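The advice above (query again once the asynchronous deletion has had time to run) can be sketched as a small polling loop. This is only an illustration: `queryFn` and `waitForStableUsage` are hypothetical names, and the fake samples in `main` stand in for a real wrapper around daos_pool_query() with DPI_SPACE. Two identical consecutive readings are used as a rough "settled" signal, which is an assumption and not a guarantee that reclamation has finished.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// queryFn is a hypothetical stand-in for whatever returns the pool's used
// bytes, for example a wrapper around daos_pool_query() with DPI_SPACE.
type queryFn func() (uint64, error)

// waitForStableUsage polls query until two consecutive samples match,
// or gives up after maxPolls samples.
func waitForStableUsage(query queryFn, interval time.Duration, maxPolls int) (uint64, error) {
	var prev uint64
	for i := 0; i < maxPolls; i++ {
		cur, err := query()
		if err != nil {
			return 0, err
		}
		if i > 0 && cur == prev {
			return cur, nil
		}
		prev = cur
		if i < maxPolls-1 {
			time.Sleep(interval)
		}
	}
	return prev, errors.New("used space did not settle within the polling budget")
}

func main() {
	// Made-up samples: usage drops as migrated data is reclaimed.
	samples := []uint64{70, 52, 35, 35}
	i := 0
	fake := func() (uint64, error) {
		v := samples[i]
		if i < len(samples)-1 {
			i++
		}
		return v, nil
	}
	used, err := waitForStableUsage(fake, 10*time.Millisecond, 10)
	fmt.Println(used, err)
}
```

In a real test the interval would more likely be seconds or minutes, since reclamation is said to take "a few minutes" on an idle system.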
On 5/25/22, 2:54 PM, "daos@daos.groups.io on behalf of Tuffli, Chuck" <daos@daos.groups.io on behalf of chuck.tuffli@...> wrote:
# dmg pool query kiddie
Pool 700bf1b6-38b8-467e-9f91-7131138210ba, ntarget=32, disabled=0, leader=0, version=18
Pool space info:
- Target(VOS) count:32
- Storage tier 0 (SCM):
Total size: 60 GB
Free: 60 GB, min:1.9 GB, max:1.9 GB, mean:1.9 GB
- Storage tier 1 (NVMe):
Total size: 940 GB
Free: 870 GB, min:27 GB, max:27 GB, mean:27 GB
Rebuild done, 83 objs, 0 recs
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Nabarro, Tom <tom.nabarro@...>
Hello Chuck,
Could you please run `dmg pool query` on the pool and show the results? This will give you a bit more info on pool usage.
Regards, Tom
|
|||
|
|||
Re: DPI_SPACE query after extending pool
Tuffli, Chuck
# dmg pool query kiddie
Pool 700bf1b6-38b8-467e-9f91-7131138210ba, ntarget=32, disabled=0, leader=0, version=18
Pool space info:
- Target(VOS) count:32
- Storage tier 0 (SCM):
Total size: 60 GB
Free: 60 GB, min:1.9 GB, max:1.9 GB, mean:1.9 GB
- Storage tier 1 (NVMe):
Total size: 940 GB
Free: 870 GB, min:27 GB, max:27 GB, mean:27 GB
Rebuild done, 83 objs, 0 recs
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Nabarro, Tom <tom.nabarro@...>
Sent: Tuesday, May 24, 2022 2:24 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DPI_SPACE query after extending pool
Hello Chuck,
Could you please run `dmg pool query` on the pool and show the results? This will give you a bit more info on pool usage.
Regards, Tom
|
|||
|
|||
New to DAOS: Storage Pools
As I was going through the documentation on storage pools, I saw the diagram showing storage pools 1, 2, and 3, each a combination of SCM (PMEM) or NVMe, but I couldn’t find any detail on whether those pools have a tier priority, which I assume they do, or on whether they all have to be DAOS storage containers or whether one could be an archive tier via S3 (cloud, perhaps?) or some NAS. I know anything is possible; I'm just looking at whether it would be practical, given front-end performance requirements balanced against long-term retention on a potentially lower-cost tier of storage.
|
|||
|
|||
Re: DAOS 2.0 installation instructions?
Lofstead, Gerald F II
Thanks! Those were well hidden. Let us take a look and we’ll follow up with more questions if we have any.
Best,
Jay
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Lombardi, Johann
Sent: Wednesday, May 25, 2022 6:42 AM
To: daos@daos.groups.io
Subject: [EXTERNAL] Re: [daos] DAOS 2.0 installation instructions?
Adrian, I don’t think this link is correct 😊
Jay, we have the quickstart guides for RHEL/clones and SuSE (see https://docs.daos.io/v2.0/QSG/setup_rhel/) and the more detailed admin guide (see https://docs.daos.io/v2.0/admin/predeployment_check/). Have you already come across those links? Please let us know how we can help.
Cheers, Johann
|
|||
|
|||
Re: DAOS 2.0 installation instructions?
Lombardi, Johann
Adrian, I don’t think this link is correct 😊
Jay, we have the quickstart guides for RHEL/clones and SuSE (see https://docs.daos.io/v2.0/QSG/setup_rhel/) and the more detailed admin guide (see https://docs.daos.io/v2.0/admin/predeployment_check/). Have you already come across those links? Please let us know how we can help.
Cheers, Johann
From: <daos@daos.groups.io> on behalf of JACKSON Adrian <a.jackson@...>
Hi Jay,
Have you seen the documentation(s) at https://docs.daos.io/v2.0/QSG/ ? Do you want something beyond that?
cheers
adrianj -- Tel: +44 131 6506470 skype: remoteadrianj The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.
|
|||
|
|||
Re: DAOS 2.0 installation instructions?
JACKSON Adrian
Hi Jay,
Have you seen the documentation(s) at https://docs.daos.io/v2.0/QSG/ ? Do you want something beyond that?
cheers
adrianj
--
Tel: +44 131 6506470 skype: remoteadrianj
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.
|
|||
|
|||
DAOS 2.0 installation instructions?
Lofstead, Gerald F II
The documentation on the site is minimally helpful. Is there a good guide for installing DAOS 2.0?
Our context: We have 4 nodes with Optane PMEM devices and we want to make a mini-install that uses those devices for the storage and metadata. We also want to do another against NVMe devices to see what kind of storage devices we want to buy for larger scale deployments.
Thanks
Jay
|
|||
|
|||
Re: DPI_SPACE query after extending pool
Hello Chuck,
Could you please run `dmg pool query` on the pool and show the results? This will give you a bit more info on pool usage.
Regards, Tom
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Tuffli, Chuck
Sent: Tuesday, May 24, 2022 9:05 PM
To: daos@daos.groups.io
Subject: [daos] DPI_SPACE query after extending pool
|
|||
|
|||
DPI_SPACE query after extending pool
Tuffli, Chuck
I've been experimenting with extending a pool but don't quite understand the results. Any insights would be most appreciated.
The cluster is running with DAOS v2.0.2 and consists of a client and a pair of servers/storage nodes. To simulate adding a server to the cluster, I created a pool by specifying the ranks associated with one of the servers. I.e.:
# dmg system query --verbose
Rank UUID                                 Control Address Fault Domain State  Reason
---- ----                                 --------------- ------------ -----  ------
0    654345f9-249c-48b1-b6dc-ec08dbf2aded x.150.0.3:10001 /d006        Joined
1    b384771a-ddbc-491a-8807-8d86544d7c2f x.150.0.4:10001 /d010        Joined
2    01c672cf-3365-476f-87ec-41a15a44e946 x.150.0.4:10001 /d010        Joined
3    93a8d382-b970-408a-9c21-e01c35265e77 x.150.0.3:10001 /d006        Joined
# dmg pool create --ranks=0,3 --size=500G kiddie
I used the pool extend command to simulate adding a server:
# dmg pool extend --ranks=1,2 kiddie
Extend command succeeded
My application queried the pool size before and after the extension using daos_pool_query( ... DPI_SPACE ...). The numbers below are the info.pi_space.ps_space values for (DAOS_MEDIA_SCM, DAOS_MEDIA_NVME).
before extend:
s_total(30000021504,470000000000) s_free(29994849224,434968621056)
after extend:
s_total(60000043008,940000000000) s_free(59994841512,869937672192)
The total pool size doubled (good), but the used space (i.e., s_total - s_free) also doubled. Naively, I expected the used space to remain the same, as the pool has a redundancy factor of zero. Doing some arithmetic on the above works out to the used space being 35.036 GB before the expansion and 70.068 GB after. Note that, for the moment, I'm choosing to ignore that the used size is several orders of magnitude bigger than the data written (~600 KB).
Where did I goof in this methodology? TIA.
--chuck
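For anyone wanting to reproduce the arithmetic in the message above, here is a minimal sketch that recomputes used space as s_total - s_free per media type and sums the two. The `spaceSample` type and `usedBytes` helper are illustrative names only, not part of the DAOS API; the numbers are the ones quoted in the message.

```go
package main

import "fmt"

// spaceSample holds total and free bytes for one media type, mirroring the
// s_total / s_free pairs quoted in the message (illustrative type only).
type spaceSample struct {
	total, free uint64
}

// usedBytes sums (total - free) across the given media types.
func usedBytes(samples ...spaceSample) uint64 {
	var sum uint64
	for _, s := range samples {
		sum += s.total - s.free
	}
	return sum
}

func main() {
	// Order within each call: SCM, then NVMe.
	before := usedBytes(
		spaceSample{30000021504, 29994849224},
		spaceSample{470000000000, 434968621056},
	)
	after := usedBytes(
		spaceSample{60000043008, 59994841512},
		spaceSample{940000000000, 869937672192},
	)
	fmt.Println(before) // 35036551224
	fmt.Println(after)  // 70067529304
}
```

These totals match the roughly 35.036 GB and 70.068 GB figures above. Nearly all of the used space is on NVMe in both cases (35,031,378,944 bytes before and 70,062,327,808 bytes after), while SCM usage is only about 5 MB.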
|
|||
|
|||
Re: Storage Usage
zp_8483@...
We should check whether '/usr/sbin' is already in the PATH environment variable before appending it.
// ExecReq executes the supplied Request by starting a child process
// to service the request. Returns a Response if successful.
func ExecReq(parent context.Context, log logging.Logger, binPath string, req *Request) (res *Response, err error) {
	if req == nil {
		return nil, errors.New("nil request")
	}
	ctx, killChild := context.WithCancel(parent)
	defer killChild()
	child := exec.CommandContext(ctx, binPath)
	child.Stderr = &cmdLogger{
		logFn:  log.Error,
		prefix: binPath,
	}
	toChild, err := child.StdinPipe()
	if err != nil {
		return nil, err
	}
	fromChild, err := child.StdoutPipe()
	if err != nil {
		return nil, err
	}
	conn := NewStdioConn("server", binPath, fromChild, toChild)
	// ensure that /usr/sbin is in $PATH
	os.Setenv("PATH", os.Getenv("PATH")+":/usr/sbin")
	child.Env = os.Environ()
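One possible shape for the check suggested above is sketched here. `appendIfMissing` is a hypothetical helper, not something that exists in the DAOS source, and it assumes a Unix-style, colon-separated PATH.

```go
package main

import (
	"fmt"
	"strings"
)

// appendIfMissing returns pathEnv with dir appended unless dir is already
// one of its colon-separated entries. Hypothetical helper; assumes a
// Unix-style PATH.
func appendIfMissing(pathEnv, dir string) string {
	if pathEnv == "" {
		return dir
	}
	for _, entry := range strings.Split(pathEnv, ":") {
		if entry == dir {
			return pathEnv
		}
	}
	return pathEnv + ":" + dir
}

func main() {
	fmt.Println(appendIfMissing("/usr/bin:/bin", "/usr/sbin"))      // /usr/bin:/bin:/usr/sbin
	fmt.Println(appendIfMissing("/usr/sbin:/usr/bin", "/usr/sbin")) // /usr/sbin:/usr/bin
}
```

With a helper like this, the os.Setenv call in the snippet above could pass appendIfMissing(os.Getenv("PATH"), "/usr/sbin") as the new value, so repeated calls would no longer grow PATH with duplicate entries.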
|
|||
|
|||
Re: Slack invite?
Etienne Menguy
Hey,
As far as I remember, you receive an invitation when you subscribe to the mailing list.
Etienne
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of pavelg via groups.io
Sent: Thursday, May 12, 2022 10:56
To: daos@daos.groups.io
Subject: [daos] Slack invite?
Hi all,
|
|||
|
|||
I can not improve my client's performance by increasing client's thread
shadow_vector@...
Hi everyone:
Recently I ran some tests using the DAOS array interface, and I found that the client's performance does not improve when I increase the number of client threads, even though the server has not reached a bottleneck. I checked the cores used by the client, and all of them are busy. Is there anything I need to pay attention to when using the interface?
Best regards!
|
|||
|
|||
DAOS Community Update / May'22
Lombardi, Johann
Hi there,
Please find below the DAOS community newsletter for May 2022.
Past Events
DAOS Use at Argonne Kevin Harms (Argonne) Johann Lombardi (Intel) Mohamad Chaarawi (Intel) https://energyhpc.rice.edu/program/
Accelerating Data-driven Workflows with DAOS Johann Lombardi (Intel)
Upcoming Events
DAOS Next Generation Storage https://www.exascaleproject.org/event/ecp-community-bof-days-2022/ (registration is open) Kevin Harms (ANL) Mohamad Chaarawi (Intel) Johann Lombardi (Intel)
Advanced Storage and Memory Hierarchy in AI and HPC with DAOS Storage Andrey Kudryavtsev (Intel)
Accelerating AI with DAOS Storage Johann Lombardi (Intel)
Accelerating HPC and AI with DAOS Storage https://app.swapcard.com/widget/event/isc-high-performance-2022/planning/UGxhbm5pbmdfODYxMTYx Kevin Harms (ANL) Adrian Jackson (EPCC) Michael Hennecke (Intel) Mohamad Chaarawi (Intel) Johann Lombardi (Intel)
DAOS Features for Next Generation Platforms https://app.swapcard.com/widget/event/isc-high-performance-2022/planning/UGxhbm5pbmdfODYxMjIy Mohamad Chaarawi (Intel)
DAOS demonstration, Fireside chats, …
DAOS: Nextgen Storage Stack for HPC and AI https://sites.google.com/view/essa-2022/ Johann Lombardi (Intel)
Release
R&D
News
See https://events.linuxfoundation.org/sodacode/ for more information.
|
|||
|
|||
Re: Is there any problem at blobstore load err
Niu, Yawei
Right, that’s probably something that could be improved in the future. Maybe we could move the blobstore creation from the I/O engine (on first start) to the control plane (on storage format?). Thanks for pointing it out.
Thanks -Niu
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of shadow_vector@... <shadow_vector@...> Hi everyone:
|
|||
|
|||
DAOS BoF at ISC'22
Lombardi, Johann
Hi there,
The DAOS BoF at ISC’22 (Hamburg) will be held on May 30th from 4pm to 5pm in Hall E. Please save the date! The format of the BoF will be short lightning talks followed by discussions. If you would like to participate in the lightning talks, please reach out to me, Kevin or Michael.
Cheers, Johann
|
|||
|
|||
Is there any problem at blobstore load err
shadow_vector@...
Hi everyone:
In the create_bio_bdev function, if load_blobstore fails, DAOS initializes a new blobstore. I'm a little worried about this. Is it possible for an error to occur when creating the bdev or loading the blobstore even though the device is fine and holds an old blobstore created by DAOS? If that happens, the blobstore would be re-initialized and all data on the device would be removed. Is this a problem?
Best regards!
|
|||
|
|||
Re: dfuse mount error in centos 7
Wang, Shilong
Yes, libioil works in conjunction with DFuse and allows the interception of POSIX I/O calls.
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Groot
Sent: Wednesday, April 13, 2022 10:42 AM
To: daos@daos.groups.io
Subject: Re: [daos] dfuse mount error in centos 7
OK, thanks. And I want to know if I must mount the dfuse before using the libioil interception library?
|
|||
|
|||
Re: The file removal performance is very low by mdtest
Wang, Shilong
Try to explicitly set the object class for files and directories – the defaults are not providing the best performance for mdtest-like workloads: --dfs.oclass=S1 --dfs.dir_oclass=SX
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Groot
Sent: Wednesday, April 13, 2022 11:15 AM
To: daos@daos.groups.io
Subject: [daos] The file removal performance is very low by mdtest
I installed DAOS and used mdtest to run some tests with -a DFS (one server and one client with 20 cores; the server uses 256G of DCPM and no NVMe SSD). However, I find that the file removal performance is very low, and the result is shown as follows. Is this expected? If not, how can I tune the performance?
access_points: ['server-1']
port: 10001
transport_config:
  allow_insecure: false
  client_cert_dir: /etc/daos/certs/clients
  ca_cert: /etc/daos/certs/daosCA.crt
  cert: /etc/daos/certs/server.crt
  key: /etc/daos/certs/server.key
provider: ofi+verbs;ofi_rxm
control_log_mask: INFO
control_log_file: /tmp/daos_server.log
helper_log_file: /tmp/daos_admin.log
telemetry_port: 9191
engines:
- targets: 20
  nr_xs_helpers: 20
  fabric_iface: ib0
  fabric_iface_port: 31316
  log_mask: ERR
  log_file: /tmp/daos_engine_0.log
  scm_mount: /mnt/daos0
  scm_class: dcpm
  scm_list: [/dev/pmem0]
|
|||
|
|||
Re: object classes (was: The file removal performance is very low by mdtest)
Hennecke, Michael
Hi,
Without data protection, SX (striping across all storage targets) and S1 (using only a single storage target) are your two main options. With replication or erasure coding, there are more object classes that could be used. The object page on github has some background:
https://github.com/daos-stack/daos/tree/release/2.0/src/object
Note that this page isn’t 100% current with the latest object layout changes in 2.0, but it’s a good starting point. We have a gap in our user documentation around this. I’m working to create some more user-level documentation around this topic right now.
Best, Michael
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Groot
Sent: Thursday, 14 April 2022 09:42
To: daos@daos.groups.io
Subject: Re: [daos] The file removal performance is very low by mdtest
Thanks. Where can I find a description of --dfs.oclass and --dfs.dir_oclass? Do they accept values other than S1 and SX?
|
|||
|