bio chunk assert failed when hot plug
ZeHan Wang
Hey, I encountered a coredump while running continuous hot-unplug tests recently. First, I unplugged two NVMe disks from the same rank, then plugged them back in. Then, I unplugged one disk from each of two different ranks and plugged them back in. Finally, when I unplugged one disk from each of two different nodes, the coredump occurred. I observed some strange phenomena: the contents of the chunk were completely corrupted, the BIOD type was fetch rather than update, and biod->bd_result was -2007. How can I reproduce this scenario, what could be causing it, and is there a solution?
|
|
Re: Hi, when I use EC, data write amplification occurs, and I'm not sure whether this is normal
yougeng789@...
Thank you for your reply! There may have been a problem with my verification; when I changed the verification method, I got the result I expected. I wasn't sure whether a data expansion of about 3.4 times is normal for the EC_2P2GX object class, and now I am sure.
|
|
Re: Hi, when I use EC, data write amplification occurs, and I'm not sure whether this is normal
Liu, Xuezhao
Hi,
To be fair, you should compare EC_2P2 with RP_3, since both can tolerate the loss of two shards (RP_2 can only lose one shard, so it is comparable to EC_2P1). EC storage efficiency also depends on the EC cell size and the I/O size. DAOS 2.4's default EC cell size is 64KB (if iod_size is 1). For EC_2P2, a full-stripe write of, for example, 128KB is stored on the 4 shards with 64KB each, so it is about 2 times (actually a little higher because of some overhead). A partial write of, for example, 64KB is stored the same way as RP_3, which is about 3 times. BTW, you can use the "ec_cell_sz" property to set the cell size when creating a pool/container.
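As a rough back-of-the-envelope illustration of those two cases (a plain C sketch, not DAOS code; it only assumes the 64KB default cell size and the EC_2P2 layout described above):

/*
 * Rough write-amplification arithmetic for EC_2P2 (2 data + 2 parity shards)
 * with the 64KB default cell size mentioned above. Illustration only, not
 * DAOS code; real numbers are slightly higher because of metadata overhead.
 */
#include <stdio.h>

int main(void)
{
	const double cell_kb = 64.0;   /* default EC cell size (iod_size == 1)   */
	const int    k = 2, p = 2;     /* EC_2P2: 2 data shards, 2 parity shards */

	/* Full-stripe write (e.g. 128KB): k data cells + p parity cells stored. */
	double full_in  = k * cell_kb;
	double full_out = (k + p) * cell_kb;
	printf("full-stripe write: ~%.1fx\n", full_out / full_in);   /* ~2.0x */

	/* Partial write (e.g. 64KB): stored like RP_3, i.e. three copies. */
	double part_in  = cell_kb;
	double part_out = 3 * cell_kb;
	printf("partial write:     ~%.1fx\n", part_out / part_in);   /* ~3.0x */

	return 0;
}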
Thanks, Xuezhao
From: <daos@daos.groups.io> on behalf of "yougeng789@..." <yougeng789@...>
When I use the RP_2GX object class, the space used by the system is 2.2 times the actual size of the data, but when I use the EC_2P2GX object class, the space used is about 3.4 times the actual data size, which is quite different from the theoretical EC overhead. |
|
Re: vos_iterate unexpect repeat executed func fill_rec
Niu, Yawei
Hi,
As the code comment mentions, we avoid reading NVMe data in fill_rec() since it doesn't support yielding, so fill_rec() won't call copy_data_cb() on NVMe data. (You can see there is an assert on the media type before copy_data_cb() is called.)
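For illustration, here is a simplified, self-contained sketch of that pattern (hypothetical names, not the actual DAOS source):

#include <assert.h>
#include <stdio.h>

enum media_type { MEDIA_SCM, MEDIA_NVME };

struct record {
	enum media_type media;
	const char     *data;
};

/* Stand-in for deferring the record so it is fetched later instead. */
static int defer_record(const struct record *rec)
{
	(void)rec;
	printf("NVMe-backed record: deferred, no inline copy (no yield)\n");
	return 0;
}

/* Stand-in for the inline copy of SCM-resident data. */
static int copy_record(const struct record *rec)
{
	assert(rec->media == MEDIA_SCM); /* the media-type assert mentioned above */
	printf("SCM-backed record: copied inline (%s)\n", rec->data);
	return 0;
}

static int fill_record(const struct record *rec)
{
	/* Reading NVMe data here could block on the NVMe I/O path and yield,
	 * which would invalidate the VOS iterator and force vos_iterate() to
	 * retry, so NVMe-backed data is never copied inline. */
	if (rec->media == MEDIA_NVME)
		return defer_record(rec);
	return copy_record(rec);
}

int main(void)
{
	struct record scm  = { MEDIA_SCM,  "hello" };
	struct record nvme = { MEDIA_NVME, NULL };

	fill_record(&scm);
	fill_record(&nvme);
	return 0;
}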
Thanks -Niu
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of
王云鹏 <13419073430@...> Hi, recently I have encountered some problems during reintegration. It seems that fill_rec will yield, which then causes vos_iterate to retry. Is there a solution for this?
The following is where the thread yields (probably): fill_rec ====> copy_data_cb ====> vos_iter_copy ====> recx_iter_copy ====> bio_read ====> iod_map_iovs ====> iod_fifo_in ====> ABT_cond_wait
|
|
vos_iterate unexpect repeat executed func fill_rec
王云鹏
Hi, recently I have encountered some problems during reintegration. It seems that fill_rec will yield, which then causes vos_iterate to retry. Is there a solution for this? The following is where the thread yields (probably): fill_rec ====> copy_data_cb ====> vos_iter_copy ====> recx_iter_copy ====> bio_read ====> iod_map_iovs ====> iod_fifo_in ====> ABT_cond_wait |
|
Re: Delay releasing space
#chat
Zhen, Liang
Hi Landen, you can make container destroy wait for the space to be released by building with the macro CONT_DESTROY_SYNC_WAIT defined, but it is for debugging only and not tested by CI.
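For illustration, this is roughly what such a compile-time switch implies (hypothetical code, not the DAOS implementation); the synchronous wait only exists in builds where the macro is defined, e.g. via -DCONT_DESTROY_SYNC_WAIT:

#include <stdio.h>

/* Hypothetical sketch of a debug-only compile-time switch; not DAOS code. */
static void destroy_container(void)
{
	printf("container metadata removed\n");
#ifdef CONT_DESTROY_SYNC_WAIT
	/* Debug build: block until background GC has reclaimed the space. */
	printf("waiting for space to be reclaimed before returning\n");
#else
	/* Default: return immediately; space is reclaimed asynchronously by
	 * background garbage collection a little later. */
	printf("returning now; space will be freed in the background\n");
#endif
}

int main(void)
{
	destroy_container();
	return 0;
}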
Liang
From:
daos@daos.groups.io <daos@daos.groups.io> on behalf of landen.tian@... <landen.tian@...> When I create a container, write tons of data, and then delete the container, I find the space isn't released right away. After a few minutes, DAOS finally releases the space. |
|
Delay releasing space
#chat
landen.tian@...
When I create a container, write tons of data, and then delete the container, I find the space isn't released right away. After a few minutes, DAOS finally releases the space.
I guess space release is delayed, like GC. Is there a command/way to release the space right away? |
|
Re: I couldn't start daos_server and I'm not sure whether it's related to the network port
#rocky
Jacque, Kris
Hi there,
These messages shouldn’t be causing any errors in your connectivity. They are debug-level messages generated while scanning the network hardware topology. Ignoring some virtual devices or subdevices is typical on any system, so I don’t think you have anything to worry about there.
If daos_server failed to start, the cause likely appears in the final error message you got on exit. If you could provide more details about the failure, it would be helpful. It seems likely something is wrong in your config file.
Thanks, Kris
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of
yougeng789@...
Sent: Thursday, April 6, 2023 9:28 PM To: daos@daos.groups.io Subject: [daos] I couldn't start daos_server and didn't make sure if it's related to the internet port #rocky
In the fabric_iface setting of the daos_server.yml configuration file, I did not use a bond, and the log always displays network-card-related errors, as follows |
|
I couldn't start daos_server and I'm not sure whether it's related to the network port
#rocky
yougeng789@...
In the fabric_iface setting of the daos_server.yml configuration file, I did not use a bond, and the log always displays network-card-related errors, as follows |
|
Re: Question about daos read bandwidth
Zhen, Liang
The current DAOS does not have any cache. I think it's probably because you are reading from data extents which have no data; in that case, DAOS will not transfer anything over the network, so you may see higher bandwidth than the capability of the network/storage.
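A toy calculation of that effect (plain C with made-up numbers, not DAOS code):

/* Toy illustration: bytes read from unwritten (hole) extents are never
 * transferred, so bandwidth computed from the requested size can exceed
 * what the network/SSDs actually delivered. */
#include <stdio.h>

int main(void)
{
	double requested_gb  = 100.0; /* what the benchmark believes it read       */
	double hole_fraction = 0.5;   /* share of extents that were never written  */
	double elapsed_s     = 10.0;

	double moved_gb = requested_gb * (1.0 - hole_fraction);

	printf("apparent bandwidth:  %.1f GB/s\n", requested_gb / elapsed_s); /* 10.0 */
	printf("data actually moved: %.1f GB/s\n", moved_gb / elapsed_s);     /*  5.0 */
	return 0;
}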
Liang
From:
daos@daos.groups.io <daos@daos.groups.io> on behalf of
段世博 <duanshibo.d@...> Will daos have a cache for 128KB reads, or is there a prefetch operation? When I use spdk bdev for testing, I find that the bandwidth of DAOS sequential reads exceeds the bandwidth of the SSD devices. |
|
Question about daos read bandwidth
段世博
Will daos have a cache for 128KB reads, or is there a prefetch operation? When I use spdk bdev for testing, I find that the bandwidth of DAOS sequential reads exceeds the bandwidth of the SSD devices.
|
|
Re: Questions about data placement
Johann
You can monitor the output of pool query, which reports the space usage on PMEM and SSD separately. That being said, we don't have a metric reporting the total amount of data migrated by aggregation for each pool. We should add one, since it would help differentiate the bandwidth used by regular I/O vs. aggregation when analyzing performance issues.
Cheers, Johann
From: <daos@daos.groups.io> on behalf of
段世博 <duanshibo.d@...>
Is there a way to know how much data has been migrated from PMEM to SSD? |
|
Re: Questions about data placement
段世博
Is there a way to know how much data has been migrated from PMEM to SSD?
|
|
Re: Questions about data placement
Johann
Extents smaller than 4KiB that cannot be aggregated with other contiguous extents remain in PMEM and are not migrated to SSDs. As for overwrites, extents that are eventually not readable any longer (i.e., completely overwritten or truncated and no snapshots) are deleted in the background. This is true for both extents on SSDs and PMEM.
Cheers, Johann
From: <daos@daos.groups.io> on behalf of
段世博 <duanshibo.d@...>
Thank you very much for your answer! |
|
Re: Questions about data placement
段世博
Thank you very much for your answer!
I have another question: will all the data written to PMEM eventually be written to SSD? For example, under a Zipfian workload, if data still in PMEM is overwritten by new writes, will the old data still be written to SSD? |
|
Re: Questions about data placement
Johann
By default, any contiguous extent strictly smaller than 4KiB is written to SCM, and extents greater than or equal to 4KiB are written to SSDs. The 4KiB threshold is configurable starting with DAOS v2.0 at the pool level via the "policy" property.
$ dmg pool get-prop test | grep placement
Tier placement policy (policy) type=io_size
$ dmg pool set-prop test policy:type=io_size/th1=16384
pool set-prop succeeded
$ dmg pool get-prop test | grep placement
Tier placement policy (policy) type=io_size/th1=16384
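As a rough illustration of that policy (hypothetical code, not the DAOS implementation), the tier decision is essentially a size comparison against the configured threshold:

#include <stddef.h>
#include <stdio.h>

enum tier { TIER_SCM, TIER_NVME };

/* Hypothetical sketch: extents strictly smaller than the threshold go to
 * SCM/PMEM, everything else goes to NVMe SSDs. */
static enum tier place_extent(size_t io_size, size_t th1)
{
	return io_size < th1 ? TIER_SCM : TIER_NVME;
}

int main(void)
{
	size_t th1 = 16384; /* th1=16384 as set with dmg pool set-prop above */

	printf("8KiB extent  -> %s\n", place_extent(8192, th1)  == TIER_SCM ? "SCM" : "NVMe");
	printf("32KiB extent -> %s\n", place_extent(32768, th1) == TIER_SCM ? "SCM" : "NVMe");
	return 0;
}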
HTH
Cheers, Johann |
|
Re: DAOS 2.3.103
#rocky
#ubuntu
#chat
#docker
#installation
Macdonald, Mjmac
In this case, the core problem is that there is an empty storage tier in the configuration, and the config parser doesn’t handle this correctly. Created https://daosio.atlassian.net/browse/DAOS-12826 to address the defect. Once the empty storage tier is removed, the command should fail with a more sensible error when the system does not have any PMem modules installed.
mjmac
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of
Lombardi, Johann
Sent: Monday, 6 March, 2023 05:26 To: daos@daos.groups.io; Nabarro, Tom <tom.nabarro@...> Subject: Re: [daos] DAOS 2.3.103 #chat #docker #installation #2.3.103 #rocky #ubuntu
Hi there,
scm prepare is only required when using Optane PMEM. Since you use DRAM, you don’t need to run scm prepare. That being said, it would be great for scm prepare to fail nicely in this case, @Nabarro, Tom?
Cheers, Johann
From: <daos@daos.groups.io> on behalf of "salma.salem@..."
<salma.salem@...>
I'm trying to set up daos 2.3.103 using the EL8 Dockerfile, but I ended up with this error when trying to run scm prepare. |
|
Re: DAOS 2.3.103
#rocky
#ubuntu
#chat
#docker
#installation
Johann
Hi there,
scm prepare is only required when using Optane PMEM. Since you use DRAM, you don’t need to run scm prepare. That being said, it would be great for scm prepare to fail nicely in this case, @Nabarro, Tom?
Cheers, Johann
From: <daos@daos.groups.io> on behalf of "salma.salem@..." <salma.salem@...>
I'm trying to set up daos 2.3.103 using the EL8 Dockerfile, but I ended up with this error when trying to run scm prepare. |
|
DAOS 2.3.103
#rocky
#ubuntu
#chat
#docker
#installation
salma.salem@...
I'm trying to set up daos 2.3.103 using the EL8 Dockerfile, but I ended up with this error when trying to run scm prepare.
This is what my configuration file looks like. I also tried this with the Ubuntu file and stopped at the same point, but my focus now is to have the Rocky version working. Has anyone encountered this error before? |
|