
bio chunk assert failed when hot plug

ZeHan Wang
 

Hey, I encountered a coredump while running continuous hot-unplug tests recently. First, I unplugged two NVMe disks from the same rank, then plugged them back in. Then, I unplugged one disk from each of two different ranks and plugged them back in. Finally, when I unplugged one disk from each of two different nodes, the coredump occurred. I observed some strange behavior: the contents of the chunk were completely corrupted, the BIOD type was fetch rather than update, and biod->bd_result was -2007. How can I reproduce this scenario, and what could be causing this issue? Is there a solution?






Re: Hi, when I use EC, data write amplification occurs, and I'm not sure whether this is normal

yougeng789@...
 

Thank you for your reply! There may have been a problem with my verification; when I changed the verification method, I got the result I expected. I wasn't sure whether a space amplification of about 3.4x is normal for the EC_2P2GX class, and now I am.


Re: Hi, when I use EC, data write amplification occurs, and I'm not sure whether this is normal

Liu, Xuezhao
 

Hi,

 

To be fair, you should compare EC_2P2 with RP_3, as both can tolerate losing two shards (RP_2 can only lose one shard, so it should be compared with EC_2P1).

EC data storage efficiency also depends on the EC cell size and the I/O size.

DAOS 2.4's default EC cell size is 64KB (if iod_size is 1). For EC_2P2:

For a full-stripe write, for example 128KB, the data is stored on the 4 shards with 64KB each, so the overhead is about 2x (actually a little higher due to some overhead).

For a partial write, for example 64KB, the data is stored the same way as RP_3, which is about 3x.

BTW, you can use the "ec_cell_sz" property to set the cell size when creating a pool/container.
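
For example, it can be set at container creation time. A minimal sketch, assuming a pool labelled "tank" and a container labelled "mycont" (the exact flags can vary between DAOS releases, so please double-check with "daos container create --help"):

$ daos container create tank mycont --type POSIX --properties ec_cell_sz:131072

$ daos container get-prop tank mycont | grep -i cell

With EC_2P2, a full stripe is 2 data cells, so choosing a cell size such that your typical I/O covers a full stripe keeps you near the ~2x case rather than the ~3x partial-write case described above.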

 

Thanks,

Xuezhao

 

From: <daos@daos.groups.io> on behalf of "yougeng789@..." <yougeng789@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, June 8, 2023 at 15:33
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Hi, when I use EC, data write amplification occurs, and I'm not sure whether this is normal

 

When I use the RP_2GX class, the space used by the system is 2.2 times the actual size of the data, but when I use the EC_2P2GX class, the space used is about 3.4 times the actual data size, which is quite different from the theoretical overhead of EC.
Now I'm not sure whether this is normal.


Re: vos_iterate unexpectedly re-executes fill_rec

Niu, Yawei
 

Hi,

 

As the code comment mentions, we avoid reading NVMe data in fill_rec() since it doesn't support yield, so fill_rec() won't call copy_data_cb() on NVMe data. (You can see there is an assert on the media type before copy_data_cb() is called.)

 

Thanks

-Niu

 

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of 王云鹏 <13419073430@...>
Date: Wednesday, June 7, 2023 at 4:21 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: [daos] vos_iterate unexpectedly re-executes fill_rec

Hi, recently I have encountered some problems during reintegration. It seems that fill_rec will yield, which then causes vos_iterate to retry. Is there a solution for this?

 

 

The following is (probably) where the thread yields:

fill_rec ====> copy_data_cb ====> vos_iter_copy ====> recx_iter_copy ====> bio_read ====> iod_map_iovs ====> iod_fifo_in ====> ABT_cond_wait

 

 

 

 


vos_iterate unexpectedly re-executes fill_rec

王云鹏
 

Hi, recently I have encountered some problems during reintegration. It seems that fill_rec will yield, which then causes vos_iterate to retry. Is there a solution for this?


The following is (probably) where the thread yields:
fill_rec ====> copy_data_cb ====> vos_iter_copy ====> recx_iter_copy ====> bio_read ====> iod_map_iovs ====> iod_fifo_in ====> ABT_cond_wait





Re: Delay releasing space #chat

Zhen, Liang
 

Hi Landen, you can enable it by using the macro CONT_DESTROY_SYNC_WAIT, but it is for debugging and is not tested by CI.

 

Liang

 

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of landen.tian@... <landen.tian@...>
Date: Friday, May 26, 2023 at 10:12 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: [daos] Delay releasing space #chat

When I create a container, write tons of data, and then delete the container, I find the space isn't released right away; after a few minutes, DAOS finally releases the space.
I guess space release works like GC:
it is delayed.
Is there a command to release the space right away?


Delay releasing space #chat

landen.tian@...
 

When I create a container, write tons of data, and then delete the container, I find the space isn't released right away; after a few minutes, DAOS finally releases the space.
I guess space release works like GC, i.e. it is delayed.
Is there a command/way to release the space right away?


Re: I couldn't start daos_server and didn't make sure if it's related to the internet port #rocky

Jacque, Kris
 

Hi there,

 

These messages shouldn’t be causing any errors in your connectivity. They are debug-level messages generated while scanning the network hardware topology. Ignoring some virtual devices or subdevices is typical on any system, so I don’t think you have anything to worry about there.

 

If daos_server failed to start, the cause likely appears in the final error message you got on exit. If you could provide more details about the failure, it would be helpful. It seems likely something is wrong in your config file.
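
One quick way to capture that, if it helps: run the server in the foreground against your config file and look at the last lines it prints before exiting (just a sketch; the config path below is only an example, adjust it to wherever your daos_server.yml actually lives):

$ daos_server start -o /etc/daos/daos_server.yml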

 

Thanks,

Kris

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of yougeng789@...
Sent: Thursday, April 6, 2023 9:28 PM
To: daos@daos.groups.io
Subject: [daos] I couldn't start daos_server and didn't make sure if it's related to the internet port #rocky

 

For the fabric_iface setting in the configuration file daos_server.yml, I did not use a bond, and the log always displays network-card-related errors, as follows:


I couldn't start daos_server and didn't make sure if it's related to the internet port #rocky

yougeng789@...
 

For the fabric_iface setting in the configuration file daos_server.yml, I did not use a bond, and the log always displays network-card-related errors, as follows:




Re: Question about daos read bandwidth

Zhen, Liang
 

The current DAOS does not have any cache. I think it's probably because you are reading from a data extent which has no data; in that case, DAOS will not transfer anything over the network, so you might see higher bandwidth than the capability of the network/storage.
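
One sanity check, if you happen to be using ior: write the data in the same run before reading it back, so the read phase only touches extents that really contain data. A rough sketch, assuming the DFS backend and a pool/container labelled "tank"/"mycont" (flags are illustrative, please check your ior build):

$ ior -a DFS --dfs.pool=tank --dfs.cont=mycont -w -r -t 128k -b 1g

If the read phase still exceeds the SSD bandwidth with real data in place, that would rule out the empty-extent explanation.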

 

Liang

 

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of 段世博 <duanshibo.d@...>
Date: Wednesday, April 5, 2023 at 10:32 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: [daos] Question about daos read bandwidth


Does DAOS have a cache for 128KB reads, or is there a prefetch operation? When testing with an SPDK bdev, I found that the bandwidth of DAOS sequential reads exceeds the bandwidth of the SSD devices.


Question about daos read bandwidth

段世博
 

Does DAOS have a cache for 128KB reads, or is there a prefetch operation? When testing with an SPDK bdev, I found that the bandwidth of DAOS sequential reads exceeds the bandwidth of the SSD devices.


Re: Questions about data placement

Johann
 

You can monitor the output of pool query that reports the space usage on PMEM and SSD separately.

That being said, we don't have a metric reporting the total amount of data migrated by aggregation for each pool. We should add that, since it can be helpful to differentiate the bandwidth used by regular I/O vs. aggregation when analyzing performance issues.
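
For what it's worth, the per-tier numbers are visible today via pool query (a minimal example, assuming a pool labelled "tank"):

$ dmg pool query tank

The pool space section of the output reports SCM (PMEM) and NVMe (SSD) usage separately, so sampling it before and after a workload gives a rough picture of how much data ended up on each tier, though not of aggregation traffic specifically.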

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of 段世博 <duanshibo.d@...>
Reply to: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday 8 March 2023 at 08:04
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Questions about data placement

 

Is there a way to know how much data has been migrated from PMEM to SSD?



Re: Questions about data placement

段世博
 

Is there a way to know how much data has been migrated from PMEM to SSD?


Re: Questions about data placement

Johann
 

Extents smaller than 4KiB that cannot be aggregated with other contiguous extents remain in PMEM and are not migrated to SSDs.

As for overwrites, extents that are eventually not readable any longer (i.e., completely overwritten or truncated and no snapshots) are deleted in the background. This is true for both extents on SSDs and PMEM.

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of 段世博 <duanshibo.d@...>
Reply to: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday 7 March 2023 at 16:24
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Questions about data placement

 

Thank you very much for your answer!
I have another question: will all the data written to PMEM eventually be written to SSD? For example, under a Zipfian workload, if data still in PMEM is overwritten by new writes, will the old data still be written to SSD?



Re: Questions about data placement

段世博
 

Thank you very much for your answer!
I have another question: will all the data written to PMEM eventually be written to SSD? For example, under a Zipfian workload, if data still in PMEM is overwritten by new writes, will the old data still be written to SSD?


Re: Questions about data placement

Johann
 

By default, any contiguous extents strictly smaller than 4KiB are written to SCM and the ones bigger than or equal to 4KiB are written to SSDs.

The 4KiB threshold is configurable starting with DAOS v2.0 at the pool level via the “policy” property.

 

$ dmg pool get-prop test | grep placement

Tier placement policy (policy)                   type=io_size

$ dmg pool set-prop test policy:type=io_size/th1=16384

pool set-prop succeeded

$ dmg pool get-prop test | grep placement

Tier placement policy (policy)                   type=io_size/th1=16384

 

HTH

 

Cheers,

Johann



Re: DAOS 2.3.103 #chat #docker #installation #2.3.103 #rocky #ubuntu

Macdonald, Mjmac
 

In this case, the core problem is that there is an empty storage tier in the configuration, and the config parser doesn’t handle this correctly. Created https://daosio.atlassian.net/browse/DAOS-12826 to address the defect. Once the empty storage tier is removed, the command should fail with a more sensible error when the system does not have any PMem modules installed.
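
For reference, a storage section without an empty tier might look roughly like the sketch below for a DRAM-backed setup (field names and values are illustrative and can differ slightly between releases, so please compare against the example daos_server.yml shipped with your version):

engines:
  - targets: 4
    fabric_iface: eth0
    fabric_iface_port: 31316
    storage:
      - class: ram
        scm_mount: /mnt/daos
        scm_size: 16
      - class: nvme
        bdev_list: ["0000:81:00.0"]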

 

mjmac

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Lombardi, Johann
Sent: Monday, 6 March, 2023 05:26
To: daos@daos.groups.io; Nabarro, Tom <tom.nabarro@...>
Subject: Re: [daos] DAOS 2.3.103 #chat #docker #installation #2.3.103 #rocky #ubuntu

 

Hi there,

 

scm prepare is only required when using Optane PMEM. Since you use DRAM, you don’t need to run scm prepare.

That being said, it would be great for scm prepare to fail nicely in this case, @Nabarro, Tom?

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "salma.salem@..." <salma.salem@...>
Reply to: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Monday 6 March 2023 at 11:16
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] DAOS 2.3.103 #chat #docker #installation #2.3.103 #rocky #ubuntu

 

I'm trying to set up daos 2.3.103 using the EL8 Dockerfile but I ended up with this error when trying to run scm prepare


this is what my configuration file looks like




I also tried this with the ubuntu file and stopped at the same point but my focus now is to have the rocky version working.
Has anyone encountered this error before?



Re: DAOS 2.3.103 #chat #docker #installation #2.3.103 #rocky #ubuntu

Johann
 

Hi there,

 

scm prepare is only required when using Optane PMEM. Since you use DRAM, you don’t need to run scm prepare.

That being said, it would be great for scm prepare to fail nicely in this case, @Nabarro, Tom?

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "salma.salem@..." <salma.salem@...>
Reply to: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Monday 6 March 2023 at 11:16
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] DAOS 2.3.103 #chat #docker #installation #2.3.103 #rocky #ubuntu

 

I'm trying to set up daos 2.3.103 using the EL8 Dockerfile but I ended up with this error when trying to run scm prepare


this is what my configuration file looks like




I also tried this with the ubuntu file and stopped at the same point but my focus now is to have the rocky version working.
Has anyone encountered this error before?



DAOS 2.3.103 #chat #docker #installation #2.3.103 #rocky #ubuntu

salma.salem@...
 

I'm trying to set up daos 2.3.103 using the EL8 Dockerfile but I ended up with this error when trying to run scm prepare


this is what my configuration file looks like




I also tried this with the ubuntu file and stopped at the same point but my focus now is to have the rocky version working.
Has anyone encountered this error before?