Re: current DAOS master deadlocks in daos_test when using verbs;ofi_rxm
Oganezov, Alexander A
Thanks for the info, Kevan.
We will update it locally, and once it passes internal testing we will update build.config.
From: firstname.lastname@example.org <email@example.com>
On Behalf Of Kevan Rehm
There is a bug in the version of ofi that CaRT picks up through its build.config file. A new pthread was added in verbs;ofi_rxm to handle memory unmap events, so that the NIC can be notified when the user unmaps memory that is registered with the NIC. See https://github.com/ofiwg/libfabric/issues/5580 for details on how the deadlock occurs. It happens every time in the Array test section.
Sean Hefty suggested updating ofi to https://github.com/ofiwg/libfabric/commit/3d01df7716d099ba222f99865345d6767ae9e686 in order to fix the problem. It was merged on Jan 18, 2020.
I did something slightly different: I downloaded a fresh daos, then quickly ran ‘git rebase master’ in the ofi subdirectory before the compilation began. The problem is definitely fixed; daos_test no longer hangs at the same point.
cart/build.config needs to be updated to this new commit or newer in order to avoid the problem.
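
For anyone who wants to confirm that a local ofi checkout already contains the fix before rebuilding, here is a rough sketch. It assumes the ofi sources are a git clone at a path you supply; the has_commit helper and its parameters are made up for illustration and are not part of daos or cart. It relies on 'git merge-base --is-ancestor', which exits with 0 when the given commit is reachable from the given ref.

```python
import subprocess

# Commit Sean Hefty pointed to as containing the fix (from the message above).
FIX_COMMIT = "3d01df7716d099ba222f99865345d6767ae9e686"


def has_commit(repo_dir, commit=FIX_COMMIT, ref="HEAD"):
    """Return True if `commit` is reachable from `ref` in the clone at repo_dir.

    Hypothetical helper, not part of daos or cart. Any non-zero exit status
    from git (commit is not an ancestor, commit is unknown to this clone,
    or repo_dir is not a git repository) is reported as False.
    """
    result = subprocess.run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", commit, ref],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

Calling has_commit() with the path of the ofi subdirectory in your build tree should return True once that checkout is at the commit above or newer.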