Re: current DAOS master deadlocks in daos_test when using verbs;ofi_rxm


Oganezov, Alexander A
 

Thanks for the info, Kevan.

 

We will update it locally, and once it passes internal testing we will update build.config.

 

~~Alex.

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Kevan Rehm
Sent: Tuesday, January 28, 2020 1:22 PM
To: daos@daos.groups.io
Subject: [daos] current DAOS master deadlocks in daos_test when using verbs;ofi_rxm

 

All,

 

There is a bug in the version of ofi that CaRT picks up through its build.config file.  A new pthread was added in verbs;ofi_rxm to handle memory-unmap events, so that the NIC can be notified when the user unmaps memory that is registered with the NIC.  See https://github.com/ofiwg/libfabric/issues/5580 for details on how the deadlock occurs.  It happens every time in the Array test section.

 

Sean Hefty suggested updating ofi to https://github.com/ofiwg/libfabric/commit/3d01df7716d099ba222f99865345d6767ae9e686 in order to fix the problem.  It was merged on Jan 18, 2020.

 

I did something slightly different: I downloaded a fresh copy of daos, then quickly ran ‘git rebase master’ in the ofi subdirectory before the compilation began.  The problem is definitely fixed; daos_test no longer hangs at the same point.

 

cart/build.config needs to be updated to this new commit or newer in order to avoid the problem.
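
One way to check whether a given ofi checkout already includes the fix is to ask git whether the commit above is an ancestor of the checked-out revision.  The following is only a minimal sketch: the helper name has_fix, its parameters, and the checkout path in the usage note are made up for illustration, and it assumes git is installed and that the directory is a libfabric clone.

```python
import subprocess

# Fix commit suggested in this thread (from the libfabric repository).
FIX_COMMIT = "3d01df7716d099ba222f99865345d6767ae9e686"


def has_fix(repo_dir, ref="HEAD", fix_commit=FIX_COMMIT, run=subprocess.run):
    """Return True if fix_commit is an ancestor of (or equal to) ref.

    has_fix is a hypothetical helper, not part of DAOS or CaRT.
    The run parameter exists only so the git call can be stubbed out.
    """
    result = run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", fix_commit, ref],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if result.returncode == 0:
        return True   # the checkout is at the fix commit or newer
    if result.returncode == 1:
        return False  # the checkout predates the fix
    # Any other status means git could not answer, for example because
    # the commit is unknown or repo_dir is not a git repository.
    raise RuntimeError(
        f"git merge-base failed with exit status {result.returncode}"
    )
```

For example, has_fix("path/to/ofi") would return True for a checkout at the commit above or later, where path/to/ofi stands in for wherever the build places the ofi sources.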

 

Regards, Kevan
