
POSIX file system tests.

Colin Ngam
 

Greetings,

 

Just a quick question.

 

What regression tests are executed to test DAOS POSIX functionality?

 

Thanks.

 

Colin

 


Re: Need advice debugging "No route to host" failures

Kevan Rehm
 

Sean,

If you open a libfabric ticket, please post it back here so that we can follow.

Thanks, Kevan

On 4/17/20, 6:59 PM, "daos@daos.groups.io on behalf of Hefty, Sean" <sean.hefty@...> wrote:

> Interesting finds. I will let Sean comment on expected OFI behavior in case of running
> out of hugepages, but to me it sounds like it should have switched and used regular
> memory in future calls.

Yes, this sounds ideal. I'll look into this.

- Sean


Re: How to figure out the all the daos_io_server are talking?

Nabarro, Tom
 

So when you use ctrl-c we are unceremoniously SIGKILLing the IO servers (graceful shutdown currently doesn’t work for various reasons, and the control plane change to enable it was reverted). The system membership won’t get updated until a query is issued, so when you restart and the rejoin happens, the status is still “Started” and you get the below message. This is a cosmetic error and notifies us that the state wasn’t updated on stop. Correct states will be reported on a system query.
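
For example (using the access point host mentioned later in this thread), re-running the query after the restart is what refreshes the stored state, so the stale Started->Started message should not recur on the next join:

# hypothetical check after restarting the io servers on a host
dmg -l delphi-006 system query --verbose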

 

I’m working on updates in this area to decouple flow-control of instances on the same host in order to lift associated limitations and enable independent start and reformat of instances regardless of state of other instances on the same host.

 

I will fix this error in the process.

 

Tom

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Saturday, April 18, 2020 1:40 AM
To: daos@daos.groups.io
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Hi Tom,

 

So, I can repeat, stop and restart the servers:

 

DEBUG 19:32:47.524108 mgmt_system.go:147: updated system member: rank 2, addr 172.30.222.230:10001, Started->Started

ERROR: unexpected same state in rank 2 update (Started->Started)

DEBUG 19:32:47.609783 mgmt_system.go:147: updated system member: rank 3, addr 172.30.222.230:10001, Started->Started

 

However, if I shut down the server and execute “dmg -l delphi-006 system query --verbose”, it knows that the servers have terminated (it does send a ping out):

DEBUG 19:33:24.566669 ctl_system.go:173: Responding to SystemQuery RPC

DEBUG 19:34:59.186204 mgmt_system.go:147: updated system member: rank 4, addr 172.30.222.236:10001, Stopped->Started

DEBUG 19:34:59.271000 mgmt_system.go:147: updated system member: rank 5, addr 172.30.222.236:10001, Stopped->Started

 

Maybe some heartbeat etc.

 

Thanks.

 

Colin

 

From: <daos@daos.groups.io> on behalf of Colin Ngam <colin.ngam@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, April 17, 2020 at 6:51 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

It is not a 100% hit either ..

 

DEBUG 18:47:46.314720 ctl_system.go:173: Responding to SystemQuery RPC

DEBUG 18:49:02.859993 mgmt_system.go:147: updated system member: rank 4, addr 172.30.222.236:10001, Stopped->Started

DEBUG 18:49:02.945670 mgmt_system.go:147: updated system member: rank 5, addr 172.30.222.236:10001, Stopped->Started

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, April 17, 2020 at 5:23 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

That’s unexpected, I will try to reproduce over the weekend.

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Friday, April 17, 2020 9:05 PM
To: daos@daos.groups.io
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Hi Tom,

 

I have a host that runs ranks 4 and 5. I killed the servers with Control-C and restarted the servers after maybe 3-5 minutes. The access host did not see any server going away. After the restart I see:

 

DEBUG 14:56:53.800937 mgmt_system.go:147: updated system member: rank 4, addr 172.30.222.236:10001, Started->Started

ERROR: unexpected same state in rank 4 update (Started->Started)

DEBUG 14:56:53.888180 mgmt_system.go:147: updated system member: rank 5, addr 172.30.222.236:10001, Started->Started

ERROR: unexpected same state in rank 5 update (Started->Started)

 

Is this good or bad?

 

Today’s master tree.

 

Thanks.

 

Colin

 

From: "Ngam, Colin" <colin.ngam@...>
Date: Wednesday, April 1, 2020 at 10:05 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Hi Tom,

 

Thanks Tom. I will see if I can replicate the issue.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday, April 1, 2020 at 8:14 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

The superblock stores allocated ranks:

[tanabarr@wolf-71 daos_m]$ cat /mnt/daos0/superblock

version: 0

uuid: 8359d5d6-1d8f-430c-90db-205fc07113bf

system: daos_server

rank: 2

validrank: true

ms: false

createms: false

bootstrapms: false

 

On start-up, non-MS daos_server will supply the superblock rank in the join request to the system.

The iosrv/mgmt maintains rank:uuid mappings and should allow the rank to rejoin the system given the expected uuid is supplied along with the rank in the request.
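
As an illustration (the /mnt/daos0 path is the one from the superblock sample above; a second instance’s mount point is only an assumption), you can see exactly what each instance will present in its join request by pulling the rank and uuid out of its superblock:

grep -E '^(rank|uuid):' /mnt/daos0/superblock
grep -E '^(rank|uuid):' /mnt/daos1/superblock   # assuming a second io server instance mounted at /mnt/daos1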

 

I can’t recreate the behavior you are describing at the moment; if I stop then start a non-MS daos_server, it joins with the same rank it was previously assigned. (I assume you are not reformatting.)

 

If you can repeat that behavior, please send me the logs. We should be able to get some relevant information from the mgmt module in the server log, from the function ds_mgmt_join_handler().

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Tuesday, March 31, 2020 4:17 PM
To: daos@daos.groups.io
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Hi,

 

I also noticed the following:

 

My configuration file does not have any “rank” specified, with 2 daos_io_servers per host. They came up and I see them as Ranks 0 .. 3. All fine. I killed Ranks 2 and 3 (same host) with Control-C and restarted daos_server; they came up registering as Ranks 4 and 5. Is the rank persistent? If it is, what’s the association between the rank number and a daos_io_server? Host name, IB GUID, etc.?

 

I will try to replicate and send a trace when the opportunity arises here.

 

Thanks.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "
daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday, March 31, 2020 at 10:07 AM
To: "
daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Thanks for the positive feedback, much appreciated.

 

We are looking to add information such as swim activity and other metrics to the dmg system query capability.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Tuesday, March 31, 2020 4:00 PM
To: daos@daos.groups.io
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Excellent!

 

I have seen messages that show the correct daos_io_servers joined, but one of the daos_io_servers has no swim messages. I assume that if you set DEBUG mode they will all have swim messages flowing. I will check next time around.

 

The dmg command’s feature set has grown nicely.

 

Thanks Tom.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "
daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday, March 31, 2020 at 9:41 AM
To: "
daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

“dmg -l <access_point_host> system query --verbose” will give you details of ranks that have previously joined the system including current state. This is an area of active development.

 

https://daos-stack.github.io/admin/administration/ “system operations“ section

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Tuesday, March 31, 2020 12:32 AM
To: daos@daos.groups.io
Subject: [daos] How to figure out the all the daos_io_server are talking?

 

Hi,

 

A daos_io_server may say it is up and yet not be communicating with the others. I set the DEBUG flag and watch for swim messages.

 

Is there a way to figure out that all the daos_io_servers are healthy and functional?

 

Thanks.

 

Colin



Re: Need advice debugging "No route to host" failures

Nabarro, Tom
 

In my daos_server.yml file the nr_hugepages parameter is now set to 8192. This resulted in 16384 pages being allocated, because each SPDK run (2 daos_io_servers) allocated 8192 pages. The comments in daos_server.yml are a little fuzzy and could be read either way; did this behave correctly, should the total number of pages allocated be double the value of nr_hugepages?

 

>> Yes, the value specified in the server config file will be allocated for each IO server process.

 

Originally I had nr_hugepages set to 4096 in the daos_server.yml file. Using sysctl I had set vm.nr_hugepages to 16384, and /proc/meminfo showed I had 16384 huge pages. But when I started the daos server, SPDK released half of my huge pages; I was left with only 8192, or 4096 per server. If I configure the machine for 16384, SPDK should not lower it, correct? It should raise the number of huge pages if the config file says so, but it should never lower the value set for the machine if it is greater than what is set in nr_hugepages.

 

>> It makes sense that if you set 4096 and had 2 IO servers configured, then SPDK setup would allocate 8192 pages and reset what you had previously set with sysctl. We can look at altering the behavior if necessary. The control plane is currently a bit selfish and sometimes assumes it is the sole user of persistent memory; we need to work at making it more cooperative.
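
To make the arithmetic concrete (same numbers as in this thread; a sketch of the resulting state, not something traced through the control-plane code):

# daos_server.yml sets nr_hugepages per io server, so with 2 io servers:
#   nr_hugepages: 8192   ->   2 x 8192 = 16384 pages allocated in total,
#   overwriting whatever vm.nr_hugepages was set to beforehand.
sysctl vm.nr_hugepages               # value left behind after SPDK setup
grep HugePages_Total /proc/meminfo   # should agree with the above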

I had to do the math to figure out how many hugepages my clients needed, divide that by two, then add that to nr_hugepages in the yml file to ensure that SPDK would allocate enough pages both for the two servers and my clients. That seems complicated.

 

>> I think so far we have assumed the clients would be run on a separate host from the servers.

 

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Kevan Rehm
Sent: Friday, April 17, 2020 10:25 PM
To: daos@daos.groups.io
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Alex,

 

Okay, I figured out what was happening here. I have a couple of questions; I wonder if you think this is mostly a configuration error, or a bug. Or at least maybe there is a way to get the problem reported more clearly so that I don’t have to debug libfabric to figure it out…

 

First the memory failure. I was running the 36 clients on one of the server nodes. The libfabric function ofi_bufpool_grow() is coded so that the first time it needs memory it tries to get a huge page. If that succeeds, it will always use hugepages in subsequent calls. If it fails, it will instead use regular memory in all future calls. Note that if libfabric runs out of hugepages, it does NOT then switch and use regular memory in future calls.

 

In my case there were huge pages defined, but between the 36 clients they used up all the available hugepages not used by the servers; they still needed more, and so libfabric died with “Cannot allocate memory”. Do you think that is acceptable, or should it have switched to using regular memory when the hugepages were gone?
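
A quick, generic way to watch for this (nothing DAOS-specific, just the standard kernel counters) is to keep an eye on the free hugepage count while the clients start up:

watch -n1 'grep -E "HugePages_(Total|Free)" /proc/meminfo'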

 

In my daos_server.yml file the nr_hugepages parameter is now set to 8192. This resulted in 16384 pages being allocated, because each SPDK run (2 daos_io_servers) allocated 8192 pages. The comments in daos_server.yml are a little fuzzy and could be read either way; did this behave correctly, should the total number of pages allocated be double the value of nr_hugepages?

 

Originally I had nr_hugepages set to 4096 in the daos_server.yml file. Using sysctl I had set vm.nr_hugepages to 16384, and /proc/meminfo showed I had 16384 huge pages. But when I started the daos server, SPDK released half of my huge pages; I was left with only 8192, or 4096 per server. If I configure the machine for 16384, SPDK should not lower it, correct? It should raise the number of huge pages if the config file says so, but it should never lower the value set for the machine if it is greater than what is set in nr_hugepages.

 

I had to do the math to figure out how many hugepages my clients needed, divide that by two, then add that to nr_hugepages in the yml file to ensure that SPDK would allocate enough pages both for the two servers and my clients.   That seems complicated.
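
For what it’s worth, here is that calculation spelled out with made-up client numbers (the 4096 is purely illustrative):

# Suppose the local clients need roughly 4096 huge pages in total.
# nr_hugepages is per io server and this node runs 2 io servers,
# so fold half of the client requirement into the config value:
echo $(( 8192 + 4096 / 2 ))   # -> 10240 for nr_hugepages in daos_server.yml
                              #    2 x 10240 = 20480 pages: 16384 for the servers, 4096 left for clients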

 

Your thoughts would be appreciated,

 

Thanks, Kevan

 

From: <daos@daos.groups.io> on behalf of "Oganezov, Alexander A" <alexander.a.oganezov@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, April 17, 2020 at 2:51 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan,

 

In our scenarios/cases so far we do not see any error before “No route to host”, so it sounds like you might be hitting a different issue there.

Can you provide more information as to what your clients are and which params are being used?

 

Based on your description, yes, it does not look like FI_UNIVERSE_SIZE would make any difference there, given the default size of 256, which you do not exceed in your case.

 

Also, to verify: when you see those errors you also have Xuezhao’s patch, from below, correct?

 

As to allocation failures – we have not had any reports of that yet using the latest DAOS (which includes Xuezhao’s patch).

 

Thanks,

~~Alex.

 

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Kevan Rehm
Sent: Friday, April 17, 2020 12:27 PM
To: daos@daos.groups.io
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Alex,

 

I see “No route to host” a lot, including right at this moment, with only 36 clients talking to 2 daos_io_servers.   Your FI_UNIVERSE_SIZE variable probably wouldn’t have an effect in a config this small, correct?

 

The way this always seems to happen is that some other libfabric error occurs first (e.g. see below), which causes libfabric to close the connection with the server and remove the server’s address from the libfabric AV table. Mercury doesn’t give up when that happens; it keeps retrying even though the retry cannot possibly succeed, because the client no longer has the server’s address.

 

I have been meaning to track down this issue, but other issues keep getting in my way. I know there’s a spot in libfabric where it shuts down the connection but then returns FI_EAGAIN, and that is the beginning of at least one of the “No route to host” loops in Mercury. I am chasing another one at the moment that looks a little different; see below.

 

I think we need some code in Mercury to detect when the connection to the server has been shut down and cancel the affected requests immediately, so that the logs don’t get cluttered with extraneous downstream errors.

 

This current failure is actually a memory allocation failure in libfabric. You can see from the bolded messages that if Mercury were to retry this request, or any other request, it would be guaranteed to get “No route to host”. Does this look familiar to you? If not, I will start inserting debug messages into libfabric.

 

Thanks for keeping us in the loop,

 

Kevan

 

libfabric:140355:verbs:core:ofi_check_ep_type():624<info> Supported: FI_EP_DGRAM

libfabric:140355:verbs:core:ofi_check_ep_type():624<info> Requested: FI_EP_MSG

libfabric:140355:core:mr:ofi_intercept_mmap():368<debug> intercepted mmap start (nil) len 4194304

libfabric:140355:core:core:ofi_bufpool_grow():87<debug> Allocation failed: Cannot allocate memory

libfabric:140355:verbs:domain:vrb_ep_close():383<info> EP 0x3ed0520 is being closed

libfabric:140355:ofi_rxm:ep_ctrl:rxm_cmap_del_handle():159<debug> marking connection handle: 0x3e18c80 for deletion

libfabric:140355:ofi_rxm:ep_ctrl:rxm_cmap_del_handle():162<debug> [CM] handle: 0x3e18c80 RXM_CMAP_IDLE -> RXM_CMAP_SHUTDOWN

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # NA -- Error -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:3877

 # na_ofi_msg_send_unexpected(): fi_tsend() unexpected failed, rc: -12 (Cannot allocate memory)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury_core.c:1985

 # hg_core_forward_na(): Could not post send for input buffer (NA_PROTOCOL_ERROR)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG # NA -- Debug -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:4626

 # na_ofi_cancel(): Canceling operation ID 0x1d92ca0

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG # NA -- Debug -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:4667

 # na_ofi_cancel(): fi_cancel() rc: 0 (Success)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury_core.c:4393

 # HG_Core_forward(): Could not forward buffer

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury.c:1919

 # HG_Forward(): Could not forward call (HG_PROTOCOL_ERROR)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  src/cart/src/cart/crt_hg.c:1208 crt_hg_req_send(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] HG_Forward failed, hg_ret: 11

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG src/cart/src/cart/crt_hg.c:1230 crt_hg_req_send(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] decref to 5.

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_rpc.c:1034 crt_req_send_immediately(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] crt_hg_req_send failed, rc: -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_rpc.c:517 crt_req_hg_addr_lookup_cb(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] crt_req_send_internal() failed, rc -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_context.c:302 crt_rpc_complete(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] RPC failed; rc: -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] object ERR  src/object/cli_shard.c:263 dc_rw_cb() RPC 1 failed: -1020

 

 

From: <daos@daos.groups.io> on behalf of "Oganezov, Alexander A" <alexander.a.oganezov@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, April 17, 2020 at 10:44 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan & others,

 

As we’ve been debugging a few other “no route to host” failures here, one thing that, it turns out, also needs to be set on large-scale systems is the FI_UNIVERSE_SIZE environment variable, which in RXM-based providers (verbs included) dictates the internal size of the address vector table for peers.

 

With Xuezhao’s patch below (already part of the latest 0.9 DAOS) and the FI_UNIVERSE_SIZE=2048 setting, we were able to avoid “No route to host” issues on all known failures so far.

 

This issue is also being tracked in libfabric as https://github.com/ofiwg/libfabric/issues/5843

 

Thanks,

~~Alex.

 

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Liu, Xuezhao
Sent: Thursday, April 9, 2020 7:24 AM
To: daos@daos.groups.io
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan,

 

Great, thanks for verifying and the helpful logs.

 

On the server side, what should be the proper behavior if a client tries to connect twice using the same IPoIB port?   Should the server reject the second connection attempt?   Close the first connection and then open the second?

The server should not reject the 2nd connection attempt; the current handling (with the patch) is that it will check its local cache (a hash table) and reuse the connection if possible.

 

Xuezhao

 

From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 10:19 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Xuezhao,

 

Yes, it works, thank you very much!    I have been trying to figure that one out for several days now! 😊

 

Kevan

 

From: <daos@daos.groups.io> on behalf of "Liu, Xuezhao" <xuezhao.liu@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 8:34 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan,

 

The log you posted is very helpful. It helped to find a potential bug in the mercury code for address management; please find the attached small patch.

You need to patch the mercury source code and rebuild it. Commonly I do that by:

cd daos_m_path/_build.external/mercury; git am 0001-NA-OFI-fix-a-potential-bug-in-na_ofi_addr_ht_lookup.patch; cd ../mercury.build; make clean; make install

 

Could you please run a test to see if it resolves the problem?

 

Thanks,

Xuezhao

 

From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 5:50 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Need advice debugging "No route to host" failures

 

Greetings,

 

We have been unsuccessful over the last few days debugging “No route to host” failures that we are constantly seeing when running “ior -a DAOS”.  The errors only occur in the log of the daos_io_server on IB1, never on the IB0 server.   Client processes hang when this occurs.  The problem does not appear to be related to our recent verbs;ofi_rxm issues which we think are now resolved.   Any advice you could provide would be greatly appreciated.

 

Our configuration is one dual-daos_io_server node and one client node. The libfabric transport is verbs;ofi_rxm; both nodes have two interfaces. The server node’s PMEM is wiped and reformatted before each test run. We use mpirun to start 16 ior client processes on the client node. The ior program has been slightly modified so that processes with even-numbered ranks use IB0 while odd-numbered ranks use IB1. Here is an example command:

 

mpirun --allow-run-as-root -np 16 $X_OPTS /root/ior/ior/src/ior -a DAOS --daos.pool $POOL --daos.cont $CONT --daos.svcl $SVCL -w -r -t 64M -b 1G
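
(For reference, the same even/odd split could also be done without patching ior, e.g. with a small wrapper script launched by mpirun instead of ior itself. This is purely a sketch; the wrapper name is made up, and OFI_INTERFACE is the client environment variable I am assuming here.)

#!/bin/bash
# ior-wrapper.sh (hypothetical): pick the IB interface from the MPI rank parity
if [ $(( OMPI_COMM_WORLD_RANK % 2 )) -eq 0 ]; then
    export OFI_INTERFACE=ib0
else
    export OFI_INTERFACE=ib1
fi
exec /root/ior/ior/src/ior "$@"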

 

When we reduce the number of client processes, the problem occurs less frequently.   With only two processes on the client node we haven’t seen the problem.

 

What happens is that some client processes which connect to the IB1 server actually connect twice, using the same source IPoIB port number. Those that do are guaranteed to generate “No route to host” errors in the IB1 server log and then hang. For example, the following grep of the IB1 server log shows that the connecting processes that were assigned libfabric fi_addrs 2 and 3 are going to fail. If you repeat this same grep on the IB0 server log, you never see duplicate connections:

 

# grep '<debug> av_insert ' *1.log

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.220:20000

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 0

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:43782

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 1

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:37665

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 2

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:37665

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 2

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.1.222:48826

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 3

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.1.222:48826

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 3

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:39600

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 4

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:56137

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 5

 

Here are the initial failure messages for fi_addr 2 in the IB1 server log.   The messages with “krehm” in them were added by me for debugging purposes:

 

04/08-19:18:08.43 delphi-002 DAOS[18490/18530] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7fecab64eba0, fi_addr=2, uri=(null))

libfabric:18490:ofi_rxm:ep_ctrl:rxm_ep_prepare_tx():851<warn> krehm: rxm_cmap_acquire_handle() returned NULL, fi_addr: 2

libfabric:18490:ofi_rxm:ep_data:rxm_ep_rma_common():80<warn> krehm: rxm_ep_prepare_tx() returned -113, fi_addr: 2

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # NA -- Error -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4462

 # na_ofi_get(): fi_readmsg() failed, rc: -113 (No route to host)

04/08-19:18:08.43 delphi-002 DAOS[18490/18554] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7fec1b64e820, fi_addr=1, uri=(null))

libfabric:18490:core:mr:ofi_intercept_mmap():368<debug> intercepted mmap start (nil) len 4194304

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:793

 # hg_bulk_transfer_pieces(): Could not transfer data (NA_PROTOCOL_ERROR)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:952

 # hg_bulk_transfer(): Could not transfer data pieces

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:1793

 # HG_Bulk_transfer_id(): Could not start transfer of bulk data

 

My suspicion is that the server’s cleanup of the first connection from fi_addr 2 throws away the libfabric address of that client, so then when the server tries to initiate an RMA Read request using the second client connection, it has no libfabric address for the client and fails.

 

The failure sequence for fi_addr 3 is identical:

 

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7feab365f550, fi_addr=3, uri=(null))

libfabric:18490:ofi_rxm:ep_ctrl:rxm_ep_prepare_tx():851<warn> krehm: rxm_cmap_acquire_handle() returned NULL, fi_addr: 3

libfabric:18490:ofi_rxm:ep_data:rxm_ep_rma_common():80<warn> krehm: rxm_ep_prepare_tx() returned -113, fi_addr: 3

04/08-19:18:08.43 delphi-002 DAOS[18490/18554] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:2707

 # na_ofi_cq_process_recv_unexpected_event(): unexpected recv msg completion event with tag=140657172791930, len=4294967364

04/08-19:18:08.43 delphi-002 DAOS[18490/18578] object ERR  src/object/srv_obj.c:384 obj_bulk_transfer() crt_bulk_transfer 0 error (-1020).

04/08-19:18:08.43 delphi-002 DAOS[18490/18578] object ERR  src/object/srv_obj.c:1062 obj_local_rw() 1153784440846024704.5704104871060330004.13 data transfer failed, dma 1 rc DER_HG(-1020)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # NA -- Error -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4462

 # na_ofi_get(): fi_readmsg() failed, rc: -113 (No route to host)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:793

 # hg_bulk_transfer_pieces(): Could not transfer data (NA_PROTOCOL_ERROR)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:952

 # hg_bulk_transfer(): Could not transfer data pieces

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:1793

 # HG_Bulk_transfer_id(): Could not start transfer of bulk data

 

Could someone suggest ways that we might narrow down the problem?

 

Thanks, Kevan



Re: How to figure out the all the daos_io_server are talking?

Colin Ngam
 

Hi Tom,

 

So, I can repeat, stop and restart the servers:

 

DEBUG 19:32:47.524108 mgmt_system.go:147: updated system member: rank 2, addr 172.30.222.230:10001, Started->Started

ERROR: unexpected same state in rank 2 update (Started->Started)

DEBUG 19:32:47.609783 mgmt_system.go:147: updated system member: rank 3, addr 172.30.222.230:10001, Started->Started

 

However, if I shut down the server and execute “dmg -l delphi-006 system query --verbose”, it knows that the servers have terminated (it does send a ping out):

DEBUG 19:33:24.566669 ctl_system.go:173: Responding to SystemQuery RPC

DEBUG 19:34:59.186204 mgmt_system.go:147: updated system member: rank 4, addr 172.30.222.236:10001, Stopped->Started

DEBUG 19:34:59.271000 mgmt_system.go:147: updated system member: rank 5, addr 172.30.222.236:10001, Stopped->Started

 

Maybe some heartbeat etc.

 

Thanks.

 

Colin

 



Re: Need advice debugging "No route to host" failures

Hefty, Sean <sean.hefty@...>
 

> Interesting finds. I will let Sean comment on expected OFI behavior in case of running
> out of hugepages, but to me it sounds like it should have switched and used regular
> memory in future calls.

Yes, this sounds ideal. I'll look into this.

- Sean


Re: How to figure out the all the daos_io_server are talking?

Colin Ngam
 

It is not a 100% hit either ..

 

DEBUG 18:47:46.314720 ctl_system.go:173: Responding to SystemQuery RPC

DEBUG 18:49:02.859993 mgmt_system.go:147: updated system member: rank 4, addr 172.30.222.236:10001, Stopped->Started

DEBUG 18:49:02.945670 mgmt_system.go:147: updated system member: rank 5, addr 172.30.222.236:10001, Stopped->Started

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, April 17, 2020 at 5:23 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

That’s unexpected, I will try to reproduce over the weekend.

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Friday, April 17, 2020 9:05 PM
To: daos@daos.groups.io
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Hi Tom,

 

I have a host that is rank 4 and 5. I killed the servers Control-C and restarted the serves after maybe 3-5 minutes. The access-host did not see any server going away. After restart I see:

 

DEBUG 14:56:53.800937 mgmt_system.go:147: updated system member: rank 4, addr 172.30.222.236:10001, Started->Started

ERROR: unexpected same state in rank 4 update (Started->Started)

DEBUG 14:56:53.888180 mgmt_system.go:147: updated system member: rank 5, addr 172.30.222.236:10001, Started->Started

ERROR: unexpected same state in rank 5 update (Started->Started)

 

Is this good or bad?

 

Today’s master tree.

 

Thanks.

 

Colin

 

From: "Ngam, Colin" <colin.ngam@...>
Date: Wednesday, April 1, 2020 at 10:05 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Hi Tom,

 

Thanks Tom. I will see if I can replicate the issue.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday, April 1, 2020 at 8:14 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

The superblock stores allocated ranks:

[tanabarr@wolf-71 daos_m]$ cat /mnt/daos0/superblock

version: 0

uuid: 8359d5d6-1d8f-430c-90db-205fc07113bf

system: daos_server

rank: 2

validrank: true

ms: false

createms: false

bootstrapms: false

 

On start-up, non-MS daos_server will supply the superblock rank in the join request to the system.

The iosrv/mgmt maintains rank:uuid mappings and should allow the rank to rejoin the system given the expected uuid is supplied along with the rank in the request.

 

I can’t recreate the behavior you are describing at the moment, if I stop then start a non-MS daos_server, they join with the same rank they were previously assigned. (I assume you are not reformatting)

 

if you can repeat that behavior, please send me the logs. We should be able to get some relevant information from the mgmt module in the server log from the function ds_mgmt_join_handler().

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Tuesday, March 31, 2020 4:17 PM
To: daos@daos.groups.io
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Hi,

 

I also noticed the following:

 

My configuration file does not have any “rank” specified. 2 daos_io_server per host. They came up and I see them as Rank 0 .. 3. All fine. I killed Rank 2 and 3 (same host) Control-C, restart daos_server, they came up registering as Rank 4 and 5. Is Rank persistent? If it is, what’s the association between Rank Number and daos_io_server? Host name IB GUID etc?

 

I will try to replicate and send trace when the opportunity arise here.

 

Thanks.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "
daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday, March 31, 2020 at 10:07 AM
To: "
daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Thanks for the positive feedback, much appreciated.

 

We are looking to add information such as swim activity and other metrics to the dmg system query capability.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Tuesday, March 31, 2020 4:00 PM
To:
daos@daos.groups.io
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Excellent!

 

I have seen messages that shows correct daos_io_server joined but 1 of the daos_io_server has no swim messages. I assume that if you set DEBUG mode they will all have swim messages flowing .. I will check nice time around.

 

The dmg command has feature grown nicely.

 

Thanks Tom.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "
daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday, March 31, 2020 at 9:41 AM
To: "
daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

“dmg -l <access_point_host> system query --verbose” will give you details of ranks that have previously joined the system including current state. This is an area of active development.

 

https://daos-stack.github.io/admin/administration/ “system operations“ section

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Tuesday, March 31, 2020 12:32 AM
To:
daos@daos.groups.io
Subject: [daos] How to figure out the all the daos_io_server are talking?

 

Hi,

 

The daos_io_server say is it up and may not be  communicating with others. I set the DEBUG flag and watch for swim messages.

 

Is there a way to figure out that all the daos_io_server are healthy and functional?

 

Thanks.

 

Colin

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

 

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: Need advice debugging "No route to host" failures

Oganezov, Alexander A
 

Hi Kevan,

 

Interesting finds. I will let Sean comment on expected OFI behavior in case of running out of hugepages, but to me it sounds like it should have switched and used regular memory in future calls.

 

As for the nr_hugepages param, I would need to defer to some other folks who are familiar with SPDK and DAOS’s behavior in this case.

 

 

Thanks,

~~Alex.

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Kevan Rehm
Sent: Friday, April 17, 2020 2:25 PM
To: daos@daos.groups.io
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Alex,

 

Okay, I figured out what was happening here.   I have a couple of questions; I wonder if you think this is mostly a configuration error, or a bug.   Or at least maybe there is a way to make the problem get reported more clearly so that I don’t have to debug libfabric to figure it out…

 

First the memory failure.   I was running the 36 clients on one of the server nodes.  The libfabric function ofi_bufpool_grow() is coded to try the first time it needs memory to get a huge page.  If that succeeds, it will always use hugepages in subsequent calls.   If that fails, it will instead use regular memory in all future calls.   Note that if libfabric runs out of hugepages, it does NOT then switch and use regular memory in future calls.

 

In my case there were huge pages defined, but between the 36 clients they used up all the available hugepages not used by the servers, they still needed more, and so libfabric died with “Cannot allocate memory”.   Do you think that is acceptable, or should it have switched to using regular memory when the hugepages were gone?

 

In my daos_server.yml file the nr_hugepages parameter is now set to 8192.   This resulted in 16384 pages being allocated, because each SPDK run (2 daos_io_servers) allocated 8192 pages.   The comments in daos_server.yml are a little fuzzy, they could be read either way, did this behave correctly, should the total number of pages allocated be double the value of nr_hugepages?

 

Originally I had nr_hugepages set to 4096 in the daos_server.yml file.  Using sysctl I had set vm.nr_hugepages to 16384, and /proc/meminfo showed I had 16384 huge pages.   But when I started the daos server, SPDK released half of my huge pages, I was left with only 8192, or 4096 per server.   If I configure the machine for 16384, SPDK should not lower it, correct?   It should raise the number of huge pages if the config file says so, but it should never lower the value set for the machine if it is greater than what is set in nr_hugepages.

 

I had to do the math to figure out how many hugepages my clients needed, divide that by two, then add that to nr_hugepages in the yml file to ensure that SPDK would allocate enough pages both for the two servers and my clients.   That seems complicated.

 

Your thoughts would be appreciated,

 

Thanks, Kevan

 

From: <daos@daos.groups.io> on behalf of "Oganezov, Alexander A" <alexander.a.oganezov@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, April 17, 2020 at 2:51 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan,

 

In our scenarios/cases so far we do not see any error before “No route to host”, so it sounds like you might be hitting a different issue there.

Can you provide more information as to what your clients are and which params are being used?

 

Based on your description, yes, it does not look like FI_UNIVERSE_SIZE would make any difference there given the default size of 256 which you do not cross in your case.

 

Also, to verify, when you see those errors you also have the Xuezhaos patch, from below, correct?

 

As to allocation failures – we have not had any reports of that yet using latest daos (which includes xuezhaos patch).

 

Thanks,

~~Alex.

 

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Kevan Rehm
Sent: Friday, April 17, 2020 12:27 PM
To: daos@daos.groups.io
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Alex,

 

I see “No route to host” a lot, including right at this moment, with only 36 clients talking to 2 daos_io_servers.   Your FI_UNIVERSE_SIZE variable probably wouldn’t have an effect in a config this small, correct?

 

The way this always seems to happen is that some other libfabric error occurs first,  e.g. see below, which causes libfabric to close the connection with the server and remove the server’s address from the libfabric AV table.  Mercury doesn’t give up when that happens, it keeps retrying when the retry cannot possibly succeed because the client no longer has the server’s address.   

 

I have been meaning to track down this issue, but other issues keep getting in my way.   I know there’s a spot in libfabric where it shuts down the connection, but then returns FI_EAGAIN, and that is the beginning of at least one of the “No route to host” loops in Mercury.   I am chasing another one at the moment that looks a little different, see below

 

I think we need some code in Mercury to detect when the connection to the server has been shut down, and cancel the affected requests immediately, so that the logs don’t clutter with extraneous downstream errors.

 

This current failure is actually a memory allocation failure in libfabric.   You can see from the bolded messages that if Mercury would retry this request or any other request, it is guaranteed to get “No route to host”.  Does this look familiar to you?   If not, I will start inserting debug messages into libfabric.

 

Thanks for keeping us in the loop,

 

Kevan

 

libfabric:140355:verbs:core:ofi_check_ep_type():624<info> Supported: FI_EP_DGRAM

libfabric:140355:verbs:core:ofi_check_ep_type():624<info> Requested: FI_EP_MSG

libfabric:140355:core:mr:ofi_intercept_mmap():368<debug> intercepted mmap start (nil) len 4194304

libfabric:140355:core:core:ofi_bufpool_grow():87<debug> Allocation failed: Cannot allocate memory

libfabric:140355:verbs:domain:vrb_ep_close():383<info> EP 0x3ed0520 is being closed

libfabric:140355:ofi_rxm:ep_ctrl:rxm_cmap_del_handle():159<debug> marking connection handle: 0x3e18c80 for deletion

libfabric:140355:ofi_rxm:ep_ctrl:rxm_cmap_del_handle():162<debug> [CM] handle: 0x3e18c80 RXM_CMAP_IDLE -> RXM_CMAP_SHUTDOWN

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # NA -- Error -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:3877

 # na_ofi_msg_send_unexpected(): fi_tsend() unexpected failed, rc: -12 (Cannot allocate memory)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury_core.c:1985

 # hg_core_forward_na(): Could not post send for input buffer (NA_PROTOCOL_ERROR)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG # NA -- Debug -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:4626

 # na_ofi_cancel(): Canceling operation ID 0x1d92ca0

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG # NA -- Debug -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:4667

 # na_ofi_cancel(): fi_cancel() rc: 0 (Success)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury_core.c:4393

 # HG_Core_forward(): Could not forward buffer

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury.c:1919

 # HG_Forward(): Could not forward call (HG_PROTOCOL_ERROR)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  src/cart/src/cart/crt_hg.c:1208 crt_hg_req_send(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] HG_Forward failed, hg_ret: 11

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG src/cart/src/cart/crt_hg.c:1230 crt_hg_req_send(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] decref to 5.

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_rpc.c:1034 crt_req_send_immediately(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] crt_hg_req_send failed, rc: -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_rpc.c:517 crt_req_hg_addr_lookup_cb(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] crt_req_send_internal() failed, rc -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_context.c:302 crt_rpc_complete(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] RPC failed; rc: -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] object ERR  src/object/cli_shard.c:263 dc_rw_cb() RPC 1 failed: -1020

 

 

From: <daos@daos.groups.io> on behalf of "Oganezov, Alexander A" <alexander.a.oganezov@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, April 17, 2020 at 10:44 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan & others,

 

As we’ve been debugging a few other “no route to host” failures here, one thing that, it turns out, also needs to be set on large-scale systems is the FI_UNIVERSE_SIZE environment variable, which in RXM-based providers (verbs included) dictates the internal size of the address vector table for peers.

 

With Xuezhao's patch below (already part of the latest 0.9 DAOS) and the FI_UNIVERSE_SIZE=2048 setting, we were able to avoid “No route to host” issues on all known failures so far.
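For anyone else who wants to try this: it is just an environment variable read by libfabric, set in the environment of the servers and clients before they start. 2048 is the value we used; size it to the number of peers you expect (the value here is illustrative):

export FI_UNIVERSE_SIZE=2048   # sizes the RXM address-vector table for the expected peer count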

 

This issue is also being tracked in libfabric as https://github.com/ofiwg/libfabric/issues/5843

 

Thanks,

~~Alex.

 

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Liu, Xuezhao
Sent: Thursday, April 9, 2020 7:24 AM
To: daos@daos.groups.io
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan,

 

Great, thanks for verifying and the helpful logs.

 

On the server side, what should be the proper behavior if a client tries to connect twice using the same IPoIB port?   Should the server reject the second connection attempt?   Close the first connection and then open the second?

The server should not reject the 2nd connection attempt; the current handling (with the patch) is that it checks its local cache (a hash table) and reuses the existing connection if possible.

 

Xuezhao

 

From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 10:19 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Xuezhao,

 

Yes, it works, thank you very much!    I have been trying to figure that one out for several days now! 😊

 

Kevan

 

From: <daos@daos.groups.io> on behalf of "Liu, Xuezhao" <xuezhao.liu@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 8:34 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan,

 

The log you posted is very helpful. It helped find a potential bug in the Mercury code for address management; please find attached a small patch.

You need to patch the Mercury source code and rebuild it. Commonly I do that by:

cd daos_m_path/_build.external/mercury; git am 0001-NA-OFI-fix-a-potential-bug-in-na_ofi_addr_ht_lookup.patch; cd ../mercury.build; make clean; make install

 

Could you please run a test to see if it resolves the problem?

 

Thanks,

Xuezhao

 

From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 5:50 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Need advice debugging "No route to host" failures

 

Greetings,

 

We have been unsuccessful over the last few days debugging “No route to host” failures that we are constantly seeing when running “ior -a DAOS”.  The errors only occur in the log of the daos_io_server on IB1, never on the IB0 server.   Client processes hang when this occurs.  The problem does not appear to be related to our recent verbs;ofi_rxm issues which we think are now resolved.   Any advice you could provide would be greatly appreciated.

 

Our configuration is one dual-daos_io_server node and one client node.  The libfabric transport is verbs;ofi_rxm, and both nodes have two interfaces.  The server node’s PMEM is wiped and reformatted before each test run.  We use mpirun to start 16 ior client processes on the client node.  The ior program has been slightly modified so that processes with even-numbered ranks use IB0 while odd-numbered ranks use IB1.  Here is an example command:

 

mpirun --allow-run-as-root -np 16 $X_OPTS /root/ior/ior/src/ior -a DAOS --daos.pool $POOL --daos.cont $CONT --daos.svcl $SVCL -w -r -t 64M -b 1G

 

When we reduce the number of client processes, the problem occurs less frequently.   With only two processes on the client node we haven’t seen the problem.

 

What happens is that some client processes which connect to the IB1 server actually connect twice using the same source IPoIB port number.  Those that do are guaranteed to generate “No route to host” errors in the IB1 server log and then hang.  For example, the following grep of the IB1 server log shows that the connecting processes that were assigned libfabric fi_addrs 2 and 3 are going to fail.  If you repeat this same grep on the IB0 server log, you never see duplicate connections:

 

# grep '<debug> av_insert ' *1.log

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.220:20000

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 0

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:43782

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 1

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:37665

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 2

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:37665

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 2

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.1.222:48826

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 3

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.1.222:48826

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 3

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:39600

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 4

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:56137

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 5

 

Here are the initial failure messages for fi_addr 2 in the IB1 server log.   The messages with “krehm” in them were added by me for debugging purposes:

 

04/08-19:18:08.43 delphi-002 DAOS[18490/18530] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7fecab64eba0, fi_addr=2, uri=(null))

libfabric:18490:ofi_rxm:ep_ctrl:rxm_ep_prepare_tx():851<warn> krehm: rxm_cmap_acquire_handle() returned NULL, fi_addr: 2

libfabric:18490:ofi_rxm:ep_data:rxm_ep_rma_common():80<warn> krehm: rxm_ep_prepare_tx() returned -113, fi_addr: 2

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # NA -- Error -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4462

 # na_ofi_get(): fi_readmsg() failed, rc: -113 (No route to host)

04/08-19:18:08.43 delphi-002 DAOS[18490/18554] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7fec1b64e820, fi_addr=1, uri=(null))

libfabric:18490:core:mr:ofi_intercept_mmap():368<debug> intercepted mmap start (nil) len 4194304

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:793

 # hg_bulk_transfer_pieces(): Could not transfer data (NA_PROTOCOL_ERROR)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:952

 # hg_bulk_transfer(): Could not transfer data pieces

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:1793

 # HG_Bulk_transfer_id(): Could not start transfer of bulk data

 

My suspicion is that the server’s cleanup of the first connection from fi_addr 2 throws away the libfabric address of that client, so when the server then tries to initiate an RMA read request using the second client connection, it has no libfabric address for the client and fails.

 

The failure sequence for fi_addr 3 is identical:

 

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7feab365f550, fi_addr=3, uri=(null))

libfabric:18490:ofi_rxm:ep_ctrl:rxm_ep_prepare_tx():851<warn> krehm: rxm_cmap_acquire_handle() returned NULL, fi_addr: 3

libfabric:18490:ofi_rxm:ep_data:rxm_ep_rma_common():80<warn> krehm: rxm_ep_prepare_tx() returned -113, fi_addr: 3

04/08-19:18:08.43 delphi-002 DAOS[18490/18554] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:2707

 # na_ofi_cq_process_recv_unexpected_event(): unexpected recv msg completion event with tag=140657172791930, len=4294967364

04/08-19:18:08.43 delphi-002 DAOS[18490/18578] object ERR  src/object/srv_obj.c:384 obj_bulk_transfer() crt_bulk_transfer 0 error (-1020).

04/08-19:18:08.43 delphi-002 DAOS[18490/18578] object ERR  src/object/srv_obj.c:1062 obj_local_rw() 1153784440846024704.5704104871060330004.13 data transfer failed, dma 1 rc DER_HG(-1020)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # NA -- Error -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4462

 # na_ofi_get(): fi_readmsg() failed, rc: -113 (No route to host)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:793

 # hg_bulk_transfer_pieces(): Could not transfer data (NA_PROTOCOL_ERROR)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:952

 # hg_bulk_transfer(): Could not transfer data pieces

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:1793

 # HG_Bulk_transfer_id(): Could not start transfer of bulk data

 

Could someone suggest ways that we might narrow down the problem?

 

Thanks, Kevan


Re: How to figure out the all the daos_io_server are talking?

Nabarro, Tom
 

That’s unexpected; I will try to reproduce over the weekend.

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Friday, April 17, 2020 9:05 PM
To: daos@daos.groups.io
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Hi Tom,

 

I have a host that is rank 4 and 5. I killed the servers with Control-C and restarted the servers after maybe 3-5 minutes. The access-host did not see any server going away. After restart I see:

 

DEBUG 14:56:53.800937 mgmt_system.go:147: updated system member: rank 4, addr 172.30.222.236:10001, Started->Started

ERROR: unexpected same state in rank 4 update (Started->Started)

DEBUG 14:56:53.888180 mgmt_system.go:147: updated system member: rank 5, addr 172.30.222.236:10001, Started->Started

ERROR: unexpected same state in rank 5 update (Started->Started)

 

Is this good or bad?

 

Today’s master tree.

 

Thanks.

 

Colin

 

From: "Ngam, Colin" <colin.ngam@...>
Date: Wednesday, April 1, 2020 at 10:05 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Hi Tom,

 

Thanks Tom. I will see if I can replicate the issue.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday, April 1, 2020 at 8:14 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

The superblock stores allocated ranks:

[tanabarr@wolf-71 daos_m]$ cat /mnt/daos0/superblock

version: 0

uuid: 8359d5d6-1d8f-430c-90db-205fc07113bf

system: daos_server

rank: 2

validrank: true

ms: false

createms: false

bootstrapms: false

 

On start-up, non-MS daos_server will supply the superblock rank in the join request to the system.

The iosrv/mgmt maintains rank:uuid mappings and should allow the rank to rejoin the system given the expected uuid is supplied along with the rank in the request.

 

I can’t recreate the behavior you are describing at the moment: if I stop then start a non-MS daos_server, it rejoins with the same rank it was previously assigned. (I assume you are not reformatting.)

 

If you can repeat that behavior, please send me the logs. We should be able to get some relevant information from the mgmt module in the server log, from the function ds_mgmt_join_handler().
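Something like the following is usually enough to pull out the relevant lines (the log path is only an example; use whatever your config points the server log at):

grep -n 'ds_mgmt_join_handler' /tmp/daos_server.log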

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Tuesday, March 31, 2020 4:17 PM
To: daos@daos.groups.io
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Hi,

 

I also noticed the following:

 

My configuration file does not have any “rank” specified. 2 daos_io_servers per host. They came up and I see them as Rank 0 .. 3. All fine. I killed Rank 2 and 3 (same host) with Control-C, restarted daos_server, and they came up registering as Rank 4 and 5. Is rank persistent? If it is, what’s the association between rank number and daos_io_server? Host name, IB GUID, etc.?

 

I will try to replicate and send a trace when the opportunity arises here.

 

Thanks.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday, March 31, 2020 at 10:07 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Thanks for the positive feedback, much appreciated.

 

We are looking to add information such as swim activity and other metrics to the dmg system query capability.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Tuesday, March 31, 2020 4:00 PM
To: daos@daos.groups.io
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Excellent!

 

I have seen messages that show the correct daos_io_server joined, but one of the daos_io_servers has no swim messages. I assume that if you set DEBUG mode they will all have swim messages flowing; I will check next time around.

 

The dmg command's feature set has grown nicely.

 

Thanks Tom.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday, March 31, 2020 at 9:41 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

“dmg -l <access_point_host> system query --verbose” will give you details of ranks that have previously joined the system including current state. This is an area of active development.

 

https://daos-stack.github.io/admin/administration/ “system operations“ section

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Tuesday, March 31, 2020 12:32 AM
To: daos@daos.groups.io
Subject: [daos] How to figure out the all the daos_io_server are talking?

 

Hi,

 

A daos_io_server says it is up but may not be communicating with the others. I set the DEBUG flag and watch for swim messages.
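For reference, this is roughly how I watch for that today (the log path is just an example; point it at your io server logs):

grep -ci swim /tmp/daos_io_server_*.log   # a zero count for one of the servers is the suspicious case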

 

Is there a way to figure out whether all the daos_io_servers are healthy and functional?

 

Thanks.

 

Colin



Re: Need advice debugging "No route to host" failures

Farrell, Patrick Arthur <patrick.farrell@...>
 

Note also that this problem can happen on a pure client, so I'd just add that changes to the server config file & handling are not enough.  (I think that client was running a server previously, but the server was definitely not running when I saw this issue.)

-Patrick



Re: Need advice debugging "No route to host" failures

Kevan Rehm
 

Alex,

 

Okay, I figured out what was happening here.  I have a couple of questions; I wonder if you think this is mostly a configuration error or a bug.  Or at least maybe there is a way to get the problem reported more clearly, so that I don’t have to debug libfabric to figure it out…

 

First the memory failure.  I was running the 36 clients on one of the server nodes.  The libfabric function ofi_bufpool_grow() is coded so that the first time it needs memory it tries to get a huge page.  If that succeeds, it will always use hugepages in subsequent calls; if that fails, it will instead use regular memory in all future calls.  Note that if libfabric later runs out of hugepages, it does NOT then switch and use regular memory in future calls.

 

In my case there were huge pages defined, but among the 36 clients they used up all the available hugepages not taken by the servers and still needed more, so libfabric died with “Cannot allocate memory”.  Do you think that is acceptable, or should it have switched to using regular memory when the hugepages were gone?
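For what it's worth, this is how I watched the pages disappear while the clients started up; nothing DAOS-specific, just the kernel counters:

grep -i huge /proc/meminfo        # HugePages_Total / HugePages_Free / Hugepagesize
cat /proc/sys/vm/nr_hugepages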

 

In my daos_server.yml file the nr_hugepages parameter is now set to 8192.  This resulted in 16384 pages being allocated, because each SPDK run (2 daos_io_servers) allocated 8192 pages.  The comments in daos_server.yml are a little fuzzy and could be read either way: did this behave correctly, should the total number of pages allocated be double the value of nr_hugepages?

 

Originally I had nr_hugepages set to 4096 in the daos_server.yml file.  Using sysctl I had set vm.nr_hugepages to 16384, and /proc/meminfo showed I had 16384 huge pages.  But when I started the daos server, SPDK released half of my huge pages; I was left with only 8192, or 4096 per server.  If I configure the machine for 16384, SPDK should not lower it, correct?  It should raise the number of huge pages if the config file says so, but it should never lower the machine's value when it is greater than what is set in nr_hugepages.
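For completeness, this is what I had set on the node before starting daos_server (16384 is my value; illustrative only):

sysctl -w vm.nr_hugepages=16384
grep HugePages_Total /proc/meminfo   # confirm the kernel actually reserved them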

 

I had to do the math to figure out how many hugepages my clients needed, divide that by two, then add that to nr_hugepages in the yml file to ensure that SPDK would allocate enough pages both for the two servers and my clients.   That seems complicated.
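Written out, the arithmetic I ended up doing looks roughly like this (the numbers are just my case, and it works because each of the two io servers allocates nr_hugepages on its own):

pages_per_server=4096        # hugepages one daos_io_server needs for SPDK
pages_for_clients=8192       # hugepages all co-located ior clients need together
echo $(( pages_per_server + pages_for_clients / 2 ))   # -> 8192, the value to put in nr_hugepages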

 

Your thoughts would be appreciated,

 

Thanks, Kevan

 



Re: How to figure out the all the daos_io_server are talking?

Colin Ngam
 

Hi Tom,

 

I have a host that is rank 4 and 5. I killed the servers with Control-C and restarted the servers after maybe 3-5 minutes. The access host did not see any server going away. After the restart I see:

 

DEBUG 14:56:53.800937 mgmt_system.go:147: updated system member: rank 4, addr 172.30.222.236:10001, Started->Started

ERROR: unexpected same state in rank 4 update (Started->Started)

DEBUG 14:56:53.888180 mgmt_system.go:147: updated system member: rank 5, addr 172.30.222.236:10001, Started->Started

ERROR: unexpected same state in rank 5 update (Started->Started)

 

Is this good or bad?

 

Today’s master tree.

 

Thanks.

 

Colin

 

From: "Ngam, Colin" <colin.ngam@...>
Date: Wednesday, April 1, 2020 at 10:05 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Hi Tom,

 

Thanks Tom. I will see if I can replicate the issue.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday, April 1, 2020 at 8:14 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

The superblock stores allocated ranks:

[tanabarr@wolf-71 daos_m]$ cat /mnt/daos0/superblock

version: 0

uuid: 8359d5d6-1d8f-430c-90db-205fc07113bf

system: daos_server

rank: 2

validrank: true

ms: false

createms: false

bootstrapms: false

 

On start-up, non-MS daos_server will supply the superblock rank in the join request to the system.

The iosrv/mgmt maintains rank:uuid mappings and should allow the rank to rejoin the system given the expected uuid is supplied along with the rank in the request.
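
As a very rough illustration of that check (the real handler is ds_mgmt_join_handler(); everything below is invented, not the DAOS implementation):

/* Hedged sketch of the rank:uuid rejoin check described above.  This is
 * not the DAOS implementation; the structures and names here are invented
 * just to illustrate the idea: a rank may rejoin only if the uuid in its
 * join request matches the uuid recorded for that rank, otherwise a fresh
 * rank is assigned. */
#include <stdio.h>
#include <string.h>

#define MAX_RANKS 8

struct member { int in_use; char uuid[37]; };

static struct member members[MAX_RANKS];
static int next_rank;

/* Returns the rank to use for this server, or -1 if the rejoin is rejected
 * because the supplied uuid does not match the one stored for the rank. */
static int join_request(int sb_rank, const char *sb_uuid)
{
    if (sb_rank >= 0 && sb_rank < MAX_RANKS && members[sb_rank].in_use) {
        if (strcmp(members[sb_rank].uuid, sb_uuid) == 0)
            return sb_rank;                 /* rejoin with the same rank */
        return -1;                          /* rank belongs to another uuid */
    }
    /* No valid superblock rank supplied: hand out a fresh rank. */
    if (next_rank >= MAX_RANKS)
        return -1;
    members[next_rank].in_use = 1;
    snprintf(members[next_rank].uuid, sizeof(members[next_rank].uuid),
             "%s", sb_uuid);
    return next_rank++;
}

int main(void)
{
    const char *uuid = "8359d5d6-1d8f-430c-90db-205fc07113bf";
    int rank = join_request(-1, uuid);      /* first start, no superblock */

    printf("assigned rank %d\n", rank);
    printf("rejoin gets rank %d\n", join_request(rank, uuid)); /* restart */
    return 0;
}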

 

I can’t recreate the behavior you are describing at the moment; if I stop and then start a non-MS daos_server, it joins with the same rank it was previously assigned. (I assume you are not reformatting.)

 

If you can repeat that behavior, please send me the logs. We should be able to get some relevant information from the mgmt module in the server log, from the function ds_mgmt_join_handler().

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Tuesday, March 31, 2020 4:17 PM
To: daos@daos.groups.io
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Hi,

 

I also noticed the following:

 

My configuration file does not have any “rank” specified, and there are 2 daos_io_servers per host. They came up and I see them as Rank 0 .. 3. All fine. I killed Rank 2 and 3 (same host) with Control-C and restarted daos_server; they came up registering as Rank 4 and 5. Is the rank persistent? If it is, what is the association between the rank number and a daos_io_server? Host name, IB GUID, etc.?

 

I will try to replicate and send a trace when the opportunity arises here.

 

Thanks.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "
daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday, March 31, 2020 at 10:07 AM
To: "
daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Thanks for the positive feedback, much appreciated.

 

We are looking to add information such as swim activity and other metrics to the dmg system query capability.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Tuesday, March 31, 2020 4:00 PM
To: daos@daos.groups.io
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

Excellent!

 

I have seen messages that show the correct daos_io_servers joined, but one of the daos_io_servers has no swim messages. I assume that if you set DEBUG mode they will all have swim messages flowing. I will check next time around.

 

The dmg command's feature set has grown nicely.

 

Thanks Tom.

 

Colin

 

From: <daos@daos.groups.io> on behalf of "Nabarro, Tom" <tom.nabarro@...>
Reply-To: "
daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday, March 31, 2020 at 9:41 AM
To: "
daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] How to figure out the all the daos_io_server are talking?

 

“dmg -l <access_point_host> system query --verbose” will give you details of ranks that have previously joined the system including current state. This is an area of active development.

 

https://daos-stack.github.io/admin/administration/ “system operations“ section

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Colin Ngam
Sent: Tuesday, March 31, 2020 12:32 AM
To: daos@daos.groups.io
Subject: [daos] How to figure out the all the daos_io_server are talking?

 

Hi,

 

The daos_io_server says it is up but may not be communicating with the others. I set the DEBUG flag and watch for swim messages.

 

Is there a way to figure out whether all the daos_io_server instances are healthy and functional?

 

Thanks.

 

Colin

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.



Re: Need advice debugging "No route to host" failures

Oganezov, Alexander A
 

Hi Kevan,

 

In our scenarios/cases so far we do not see any error before “No route to host”, so it sounds like you might be hitting a different issue there.

Can you provide more information as to what your clients are and which params are being used?

 

Based on your description, yes, it does not look like FI_UNIVERSE_SIZE would make any difference there, given the default size of 256, which you do not cross in your case.

 

Also, to verify: when you see those errors, you also have Xuezhao’s patch from below, correct?

 

As to allocation failures, we have not had any reports of that yet using the latest DAOS (which includes Xuezhao’s patch).

 

Thanks,

~~Alex.

 

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Kevan Rehm
Sent: Friday, April 17, 2020 12:27 PM
To: daos@daos.groups.io
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Alex,

 

I see “No route to host” a lot, including right at this moment, with only 36 clients talking to 2 daos_io_servers.   Your FI_UNIVERSE_SIZE variable probably wouldn’t have an effect in a config this small, correct?

 

The way this always seems to happen is that some other libfabric error occurs first (e.g. see below), which causes libfabric to close the connection with the server and remove the server’s address from the libfabric AV table. Mercury doesn’t give up when that happens; it keeps retrying even though the retry cannot possibly succeed, because the client no longer has the server’s address.

 

I have been meaning to track down this issue, but other issues keep getting in my way. I know there’s a spot in libfabric where it shuts down the connection but then returns FI_EAGAIN, and that is the beginning of at least one of the “No route to host” loops in Mercury. I am chasing another one at the moment that looks a little different; see below.

 

I think we need some code in Mercury to detect when the connection to the server has been shut down, and cancel the affected requests immediately, so that the logs don’t clutter with extraneous downstream errors.
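
Roughly the shape of what I have in mind, as a sketch only; none of this is mercury code and all of the names are invented:

/* Rough sketch only, not mercury code.  The idea: once the provider
 * reports that the connection behind an fi_addr is gone, stop retrying
 * and complete every operation queued against that address with an error,
 * instead of reposting it and collecting -113 "No route to host" failures.
 * All of the names and the peer_shutdown[] table are invented. */
#include <stdio.h>
#include <errno.h>
#include <stdbool.h>

struct op {
    unsigned long fi_addr;
    bool          done;
    int           rc;
};

/* Stand-in for the per-address state the NA layer would have to track. */
static bool peer_shutdown[8];

/* Stand-in for the provider call (fi_readmsg()/fi_tsend()).  A negative
 * errno means the peer is unreachable; -EAGAIN would ask for a retry. */
static int provider_post(struct op *op)
{
    return peer_shutdown[op->fi_addr] ? -ENOTCONN : 0;
}

static void post_or_cancel(struct op *op)
{
    int rc = provider_post(op);

    if (rc == -EAGAIN && !peer_shutdown[op->fi_addr])
        return;         /* genuine backpressure: leave queued, retry later */

    if (rc < 0 || peer_shutdown[op->fi_addr]) {
        /* connection is gone: fail the op now rather than retrying forever */
        op->done = true;
        op->rc   = rc ? rc : -ENOTCONN;
        fprintf(stderr, "cancelled op to fi_addr %lu, rc %d\n",
                op->fi_addr, op->rc);
        return;
    }
    op->done = true;    /* posted successfully */
    op->rc   = 0;
}

int main(void)
{
    struct op ok  = { .fi_addr = 1, .done = false, .rc = 0 };
    struct op bad = { .fi_addr = 2, .done = false, .rc = 0 };

    peer_shutdown[2] = true;    /* pretend fi_addr 2 was torn down */
    post_or_cancel(&ok);
    post_or_cancel(&bad);
    return 0;
}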

 

This current failure is actually a memory allocation failure in libfabric. You can see from the bolded messages that if Mercury were to retry this request, or any other request, it is guaranteed to get “No route to host”. Does this look familiar to you? If not, I will start inserting debug messages into libfabric.

 

Thanks for keeping us in the loop,

 

Kevan

 

libfabric:140355:verbs:core:ofi_check_ep_type():624<info> Supported: FI_EP_DGRAM

libfabric:140355:verbs:core:ofi_check_ep_type():624<info> Requested: FI_EP_MSG

libfabric:140355:core:mr:ofi_intercept_mmap():368<debug> intercepted mmap start (nil) len 4194304

libfabric:140355:core:core:ofi_bufpool_grow():87<debug> Allocation failed: Cannot allocate memory

libfabric:140355:verbs:domain:vrb_ep_close():383<info> EP 0x3ed0520 is being closed

libfabric:140355:ofi_rxm:ep_ctrl:rxm_cmap_del_handle():159<debug> marking connection handle: 0x3e18c80 for deletion

libfabric:140355:ofi_rxm:ep_ctrl:rxm_cmap_del_handle():162<debug> [CM] handle: 0x3e18c80 RXM_CMAP_IDLE -> RXM_CMAP_SHUTDOWN

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # NA -- Error -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:3877

 # na_ofi_msg_send_unexpected(): fi_tsend() unexpected failed, rc: -12 (Cannot allocate memory)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury_core.c:1985

 # hg_core_forward_na(): Could not post send for input buffer (NA_PROTOCOL_ERROR)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG # NA -- Debug -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:4626

 # na_ofi_cancel(): Canceling operation ID 0x1d92ca0

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG # NA -- Debug -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:4667

 # na_ofi_cancel(): fi_cancel() rc: 0 (Success)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury_core.c:4393

 # HG_Core_forward(): Could not forward buffer

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury.c:1919

 # HG_Forward(): Could not forward call (HG_PROTOCOL_ERROR)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  src/cart/src/cart/crt_hg.c:1208 crt_hg_req_send(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] HG_Forward failed, hg_ret: 11

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG src/cart/src/cart/crt_hg.c:1230 crt_hg_req_send(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] decref to 5.

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_rpc.c:1034 crt_req_send_immediately(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] crt_hg_req_send failed, rc: -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_rpc.c:517 crt_req_hg_addr_lookup_cb(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] crt_req_send_internal() failed, rc -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_context.c:302 crt_rpc_complete(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] RPC failed; rc: -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] object ERR  src/object/cli_shard.c:263 dc_rw_cb() RPC 1 failed: -1020

 

 

From: <daos@daos.groups.io> on behalf of "Oganezov, Alexander A" <alexander.a.oganezov@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, April 17, 2020 at 10:44 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan & others,

 

As we’ve been debugging a few other “no route to host” failures here, one thing that, it turns out, also needs to be set on large-scale systems is the FI_UNIVERSE_SIZE environment variable, which in RXM-based providers (verbs included) dictates the internal size of the address vector table for peers.

 

With Xuezhao’s patch below (already part of the latest 0.9 DAOS) and the FI_UNIVERSE_SIZE=2048 setting, we were able to avoid “No route to host” issues in all known failures so far.
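
For reference, a minimal sketch of setting the variable from inside a process before the provider initializes, in case it cannot simply be exported in the launch environment; this uses plain POSIX setenv(), with the value mirroring the 2048 above:

/* Hedged sketch: one way to make sure FI_UNIVERSE_SIZE is in the process
 * environment before libfabric is initialized, in case it cannot simply be
 * exported in the launch environment.  setenv() is plain POSIX; the value
 * 2048 just mirrors the setting mentioned above. */
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    /* Must run before the first fi_getinfo()/crt_init() call, since the
     * provider reads the variable when it sizes its peer tables.  The last
     * argument 0 means an already-exported value is left untouched. */
    if (setenv("FI_UNIVERSE_SIZE", "2048", 0) != 0) {
        perror("setenv FI_UNIVERSE_SIZE");
        return 1;
    }
    printf("FI_UNIVERSE_SIZE=%s\n", getenv("FI_UNIVERSE_SIZE"));
    return 0;
}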

 

This issue is also being tracked in libfabric as https://github.com/ofiwg/libfabric/issues/5843

 

Thanks,

~~Alex.

 

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Liu, Xuezhao
Sent: Thursday, April 9, 2020 7:24 AM
To: daos@daos.groups.io
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan,

 

Great, thanks for verifying and the helpful logs.

 

On the server side, what should be the proper behavior if a client tries to connect twice using the same IPoIB port?   Should the server reject the second connection attempt?   Close the first connection and then open the second?

The server should not reject the 2nd connection attempt; the current handling (with the patch) is that it checks its local cache (a hash table) and reuses the existing connection if possible.

 

Xuezhao

 

From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 10:19 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Xuezhao,

 

Yes, it works, thank you very much!    I have been trying to figure that one out for several days now! 😊

 

Kevan

 

From: <daos@daos.groups.io> on behalf of "Liu, Xuezhao" <xuezhao.liu@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 8:34 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan,

 

The log you posted is very helpful. It helped to find a potential bug in the mercury code for address management; please find the attached small patch.

You need to patch the mercury source code and rebuild it. Commonly I do that by:

cd daos_m_path/_build.external/mercury; git am 0001-NA-OFI-fix-a-potential-bug-in-na_ofi_addr_ht_lookup.patch; cd ../mercury.build; make clean; make install

 

Could you please run a test to see if it resolves the problem?

 

Thanks,

Xuezhao

 

From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 5:50 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Need advice debugging "No route to host" failures

 

Greetings,

 

We have been unsuccessful over the last few days debugging “No route to host” failures that we are constantly seeing when running “ior -a DAOS”.  The errors only occur in the log of the daos_io_server on IB1, never on the IB0 server.   Client processes hang when this occurs.  The problem does not appear to be related to our recent verbs;ofi_rxm issues which we think are now resolved.   Any advice you could provide would be greatly appreciated.

 

Our configuration is one dual-daos_io_server node and one client node. The libfabric transport is verbs;ofi_rxm, and both nodes have two interfaces. The server node’s PMEM is wiped and reformatted before each test run. We use mpirun to start 16 ior client processes on the client node. The ior program has been slightly modified so that processes with even-numbered ranks use IB0 while odd-numbered ranks use IB1 (a rough sketch of that selection follows the example command below). Here is an example command:

 

mpirun --allow-run-as-root -np 16 $X_OPTS /root/ior/ior/src/ior -a DAOS --daos.pool $POOL --daos.cont $CONT --daos.svcl $SVCL -w -r -t 64M -b 1G
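
The even/odd selection is roughly the following, as a sketch only; it is not the actual ior change, and OFI_INTERFACE is used purely as an example of an interface-selection variable:

/* Hedged sketch of the even/odd split described above, not the actual ior
 * modification.  It assumes the interface can be chosen per process through
 * an environment variable; OFI_INTERFACE is used here only as an example
 * name, set before the DAOS/CaRT client initializes. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* even ranks use ib0, odd ranks use ib1 */
    const char *iface = (rank % 2 == 0) ? "ib0" : "ib1";
    setenv("OFI_INTERFACE", iface, 1);

    printf("rank %d using %s\n", rank, iface);
    /* ... the normal ior / DAOS initialization would follow here ... */

    MPI_Finalize();
    return 0;
}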

 

When we reduce the number of client processes, the problem occurs less frequently.   With only two processes on the client node we haven’t seen the problem.

 

What happens is that some client processes which connect to the IB1 server actually connect twice using the same source IPoIB port number.   Those that do are guaranteed to generate “No route to host” errors in the IB1 server log and then hang.   For example, the following grep of the IB1 server log shows that the connecting processes that were assigned to libfabric fi_addrs 2 and 3 are going to fail.  If you would repeat this same grep on the IB0 server log, you would never see duplicate connections:

 

# grep '<debug> av_insert ' *1.log

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.220:20000

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 0

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:43782

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 1

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:37665

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 2

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:37665

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 2

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.1.222:48826

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 3

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.1.222:48826

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 3

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:39600

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 4

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:56137

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 5

 

Here are the initial failure messages for fi_addr 2 in the IB1 server log.   The messages with “krehm” in them were added by me for debugging purposes:

 

04/08-19:18:08.43 delphi-002 DAOS[18490/18530] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7fecab64eba0, fi_addr=2, uri=(null))

libfabric:18490:ofi_rxm:ep_ctrl:rxm_ep_prepare_tx():851<warn> krehm: rxm_cmap_acquire_handle() returned NULL, fi_addr: 2

libfabric:18490:ofi_rxm:ep_data:rxm_ep_rma_common():80<warn> krehm: rxm_ep_prepare_tx() returned -113, fi_addr: 2

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # NA -- Error -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4462

 # na_ofi_get(): fi_readmsg() failed, rc: -113 (No route to host)

04/08-19:18:08.43 delphi-002 DAOS[18490/18554] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7fec1b64e820, fi_addr=1, uri=(null))

libfabric:18490:core:mr:ofi_intercept_mmap():368<debug> intercepted mmap start (nil) len 4194304

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:793

 # hg_bulk_transfer_pieces(): Could not transfer data (NA_PROTOCOL_ERROR)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:952

 # hg_bulk_transfer(): Could not transfer data pieces

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:1793

 # HG_Bulk_transfer_id(): Could not start transfer of bulk data

 

My suspicion is that the server’s cleanup of the first connection from fi_addr 2 throws away the libfabric address of that client, so then when the server tries to initiate a RMA Read request using the second client connection, it has no libfabric address for the client and fails.

 

The failure sequence for fi_addr 3 is identical:

 

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7feab365f550, fi_addr=3, uri=(null))

libfabric:18490:ofi_rxm:ep_ctrl:rxm_ep_prepare_tx():851<warn> krehm: rxm_cmap_acquire_handle() returned NULL, fi_addr: 3

libfabric:18490:ofi_rxm:ep_data:rxm_ep_rma_common():80<warn> krehm: rxm_ep_prepare_tx() returned -113, fi_addr: 3

04/08-19:18:08.43 delphi-002 DAOS[18490/18554] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:2707

 # na_ofi_cq_process_recv_unexpected_event(): unexpected recv msg completion event with tag=140657172791930, len=4294967364

04/08-19:18:08.43 delphi-002 DAOS[18490/18578] object ERR  src/object/srv_obj.c:384 obj_bulk_transfer() crt_bulk_transfer 0 error (-1020).

04/08-19:18:08.43 delphi-002 DAOS[18490/18578] object ERR  src/object/srv_obj.c:1062 obj_local_rw() 1153784440846024704.5704104871060330004.13 data transfer failed, dma 1 rc DER_HG(-1020)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # NA -- Error -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4462

 # na_ofi_get(): fi_readmsg() failed, rc: -113 (No route to host)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:793

 # hg_bulk_transfer_pieces(): Could not transfer data (NA_PROTOCOL_ERROR)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:952

 # hg_bulk_transfer(): Could not transfer data pieces

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:1793

 # HG_Bulk_transfer_id(): Could not start transfer of bulk data

 

Could someone suggest ways that we might narrow down the problem?

 

Thanks, Kevan


Re: Need advice debugging "No route to host" failures

Farrell, Patrick Arthur <patrick.farrell@...>
 

Note, if it helps, that this ENOMEM is strongly connected to the number of processes and does not seem to be connected to, e.g., transfer size. More processes make this much more likely; we start seeing it at around 30-36 processes on a node.

-Patrick

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Sent: Friday, April 17, 2020 2:27 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures
 

Alex,

 

I see “No route to host” a lot, including right at this moment, with only 36 clients talking to 2 daos_io_servers.   Your FI_UNIVERSE_SIZE variable probably wouldn’t have an effect in a config this small, correct?

 

The way this always seems to happen is that some other libfabric error occurs first,  e.g. see below, which causes libfabric to close the connection with the server and remove the server’s address from the libfabric AV table.  Mercury doesn’t give up when that happens, it keeps retrying when the retry cannot possibly succeed because the client no longer has the server’s address.   

 

I have been meaning to track down this issue, but other issues keep getting in my way.   I know there’s a spot in libfabric where it shuts down the connection, but then returns FI_EAGAIN, and that is the beginning of at least one of the “No route to host” loops in Mercury.   I am chasing another one at the moment that looks a little different, see below

 

I think we need some code in Mercury to detect when the connection to the server has been shut down, and cancel the affected requests immediately, so that the logs don’t clutter with extraneous downstream errors.

 

This current failure is actually a memory allocation failure in libfabric.   You can see from the bolded messages that if Mercury would retry this request or any other request, it is guaranteed to get “No route to host”.  Does this look familiar to you?   If not, I will start inserting debug messages into libfabric.

 

Thanks for keeping us in the loop,

 

Kevan

 

libfabric:140355:verbs:core:ofi_check_ep_type():624<info> Supported: FI_EP_DGRAM

libfabric:140355:verbs:core:ofi_check_ep_type():624<info> Requested: FI_EP_MSG

libfabric:140355:core:mr:ofi_intercept_mmap():368<debug> intercepted mmap start (nil) len 4194304

libfabric:140355:core:core:ofi_bufpool_grow():87<debug> Allocation failed: Cannot allocate memory

libfabric:140355:verbs:domain:vrb_ep_close():383<info> EP 0x3ed0520 is being closed

libfabric:140355:ofi_rxm:ep_ctrl:rxm_cmap_del_handle():159<debug> marking connection handle: 0x3e18c80 for deletion

libfabric:140355:ofi_rxm:ep_ctrl:rxm_cmap_del_handle():162<debug> [CM] handle: 0x3e18c80 RXM_CMAP_IDLE -> RXM_CMAP_SHUTDOWN

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # NA -- Error -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:3877

 # na_ofi_msg_send_unexpected(): fi_tsend() unexpected failed, rc: -12 (Cannot allocate memory)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury_core.c:1985

 # hg_core_forward_na(): Could not post send for input buffer (NA_PROTOCOL_ERROR)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG # NA -- Debug -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:4626

 # na_ofi_cancel(): Canceling operation ID 0x1d92ca0

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG # NA -- Debug -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:4667

 # na_ofi_cancel(): fi_cancel() rc: 0 (Success)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury_core.c:4393

 # HG_Core_forward(): Could not forward buffer

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury.c:1919

 # HG_Forward(): Could not forward call (HG_PROTOCOL_ERROR)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  src/cart/src/cart/crt_hg.c:1208 crt_hg_req_send(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] HG_Forward failed, hg_ret: 11

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG src/cart/src/cart/crt_hg.c:1230 crt_hg_req_send(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] decref to 5.

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_rpc.c:1034 crt_req_send_immediately(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] crt_hg_req_send failed, rc: -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_rpc.c:517 crt_req_hg_addr_lookup_cb(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] crt_req_send_internal() failed, rc -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_context.c:302 crt_rpc_complete(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] RPC failed; rc: -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] object ERR  src/object/cli_shard.c:263 dc_rw_cb() RPC 1 failed: -1020

 

 

From: <daos@daos.groups.io> on behalf of "Oganezov, Alexander A" <alexander.a.oganezov@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, April 17, 2020 at 10:44 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan & others,

 

As we’ve been debugging few other “no route to host” failures here, one thing that also turns out needs to be set on large-scale systems is FI_UNIVERSE_SIZE envariable, which in RXM-based providers (verbs included) dictates internal size of address vector table for peers.

 

With Xuezhaos patch below (already part of latest 0.9 DAOS) and FI_UNIVERSE_SIZE=2048  setting we were able to avoid “No route to host” issues on all known failures so far.

 

This issue is also being tracked in libfabric as https://github.com/ofiwg/libfabric/issues/5843

 

Thanks,

~~Alex.

 

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Liu, Xuezhao
Sent: Thursday, April 9, 2020 7:24 AM
To: daos@daos.groups.io
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan,

 

Great, thanks for verifying and the helpful logs.

 

On the server side, what should be the proper behavior if a client tries to connect twice using the same IPoIB port?   Should the server reject the second connection attempt?   Close the first connection and then open the second?

Server should not reject the 2nd connection attempt, the current handling (with the patch) is it will check its local cache (a hash-table) and reuse it if possible.

 

Xuezhao

 

From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 10:19 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Xuezhao,

 

Yes, it works, thank you very much!    I have been trying to figure that one out for several days now! 😊

 

Kevan

 

From: <daos@daos.groups.io> on behalf of "Liu, Xuezhao" <xuezhao.liu@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 8:34 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan,

 

The log you posted is very helpful. It helped to find a potential bug in mercury code for address management, please find attached small patch.

It needs to patch mercury source code and rebuild it. Commonly I do that by:

cd daos_m_path/_build.external/mercury; git am 0001-NA-OFI-fix-a-potential-bug-in-na_ofi_addr_ht_lookup.patch; cd ../ mercury.build; make clean; make install

 

Could you please have a test to see if it can resolve the problem?

 

Thanks,

Xuezhao

 

From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 5:50 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Need advice debugging "No route to host" failures

 

Greetings,

 

We have been unsuccessful over the last few days debugging “No route to host” failures that we are constantly seeing when running “ior -a DAOS”.  The errors only occur in the log of the daos_io_server on IB1, never on the IB0 server.   Client processes hang when this occurs.  The problem does not appear to be related to our recent verbs;ofi_rxm issues which we think are now resolved.   Any advice you could provide would be greatly appreciated.

 

Our configuration is one dual-daos_io_server node and one client node.   The libfabric transport is verbs;ofi_rxm, both nodes have two interfaces.   The server node’s PMEM is wiped and reformatted before each test run.  We use mpirun to start 16 ior client processes on the client node.  The ior program has been slightly modified so that processes with even-numbered ranks use IB0 while odd-numbered ranks use IB1.   Here is an example command:

 

mpirun --allow-run-as-root -np 16 $X_OPTS /root/ior/ior/src/ior -a DAOS --daos.pool $POOL --daos.cont $CONT --daos.svcl $SVCL -w -r -t 64M -b 1G

 

When we reduce the number of client processes, the problem occurs less frequently.   With only two processes on the client node we haven’t seen the problem.

 

What happens is that some client processes which connect to the IB1 server actually connect twice using the same source IPoIB port number.   Those that do are guaranteed to generate “No route to host” errors in the IB1 server log and then hang.   For example, the following grep of the IB1 server log shows that the connecting processes that were assigned to libfabric fi_addrs 2 and 3 are going to fail.  If you would repeat this same grep on the IB0 server log, you would never see duplicate connections:

 

# grep '<debug> av_insert ' *1.log

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.220:20000

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 0

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:43782

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 1

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:37665

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 2

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:37665

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 2

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.1.222:48826

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 3

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.1.222:48826

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 3

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:39600

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 4

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:56137

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 5

 

Here are the initial failure messages for fi_addr 2 in the IB1 server log.   The messages with “krehm” in them were added by me for debugging purposes:

 

04/08-19:18:08.43 delphi-002 DAOS[18490/18530] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7fecab64eba0, fi_addr=2, uri=(null))

libfabric:18490:ofi_rxm:ep_ctrl:rxm_ep_prepare_tx():851<warn> krehm: rxm_cmap_acquire_handle() returned NULL, fi_addr: 2

libfabric:18490:ofi_rxm:ep_data:rxm_ep_rma_common():80<warn> krehm: rxm_ep_prepare_tx() returned -113, fi_addr: 2

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # NA -- Error -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4462

 # na_ofi_get(): fi_readmsg() failed, rc: -113 (No route to host)

04/08-19:18:08.43 delphi-002 DAOS[18490/18554] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7fec1b64e820, fi_addr=1, uri=(null))

libfabric:18490:core:mr:ofi_intercept_mmap():368<debug> intercepted mmap start (nil) len 4194304

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:793

 # hg_bulk_transfer_pieces(): Could not transfer data (NA_PROTOCOL_ERROR)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:952

 # hg_bulk_transfer(): Could not transfer data pieces

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:1793

 # HG_Bulk_transfer_id(): Could not start transfer of bulk data

 

My suspicion is that the server’s cleanup of the first connection from fi_addr 2 throws away the libfabric address of that client, so then when the server tries to initiate a RMA Read request using the second client connection, it has no libfabric address for the client and fails.

 

The failure sequence for fi_addr 3 is identical:

 

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7feab365f550, fi_addr=3, uri=(null))

libfabric:18490:ofi_rxm:ep_ctrl:rxm_ep_prepare_tx():851<warn> krehm: rxm_cmap_acquire_handle() returned NULL, fi_addr: 3

libfabric:18490:ofi_rxm:ep_data:rxm_ep_rma_common():80<warn> krehm: rxm_ep_prepare_tx() returned -113, fi_addr: 3

04/08-19:18:08.43 delphi-002 DAOS[18490/18554] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:2707

 # na_ofi_cq_process_recv_unexpected_event(): unexpected recv msg completion event with tag=140657172791930, len=4294967364

04/08-19:18:08.43 delphi-002 DAOS[18490/18578] object ERR  src/object/srv_obj.c:384 obj_bulk_transfer() crt_bulk_transfer 0 error (-1020).

04/08-19:18:08.43 delphi-002 DAOS[18490/18578] object ERR  src/object/srv_obj.c:1062 obj_local_rw() 1153784440846024704.5704104871060330004.13 data transfer failed, dma 1 rc DER_HG(-1020)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # NA -- Error -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4462

 # na_ofi_get(): fi_readmsg() failed, rc: -113 (No route to host)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:793

 # hg_bulk_transfer_pieces(): Could not transfer data (NA_PROTOCOL_ERROR)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:952

 # hg_bulk_transfer(): Could not transfer data pieces

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:1793

 # HG_Bulk_transfer_id(): Could not start transfer of bulk data

 

Could someone suggest ways that we might narrow down the problem?

 

Thanks, Kevan


Re: Need advice debugging "No route to host" failures

Kevan Rehm
 

Alex,

 

I see “No route to host” a lot, including right at this moment, with only 36 clients talking to 2 daos_io_servers.   Your FI_UNIVERSE_SIZE variable probably wouldn’t have an effect in a config this small, correct?

 

The way this always seems to happen is that some other libfabric error occurs first,  e.g. see below, which causes libfabric to close the connection with the server and remove the server’s address from the libfabric AV table.  Mercury doesn’t give up when that happens, it keeps retrying when the retry cannot possibly succeed because the client no longer has the server’s address.   

 

I have been meaning to track down this issue, but other issues keep getting in my way.   I know there’s a spot in libfabric where it shuts down the connection, but then returns FI_EAGAIN, and that is the beginning of at least one of the “No route to host” loops in Mercury.   I am chasing another one at the moment that looks a little different, see below

 

I think we need some code in Mercury to detect when the connection to the server has been shut down, and cancel the affected requests immediately, so that the logs don’t clutter with extraneous downstream errors.

 

This current failure is actually a memory allocation failure in libfabric.   You can see from the bolded messages that if Mercury would retry this request or any other request, it is guaranteed to get “No route to host”.  Does this look familiar to you?   If not, I will start inserting debug messages into libfabric.

 

Thanks for keeping us in the loop,

 

Kevan

 

libfabric:140355:verbs:core:ofi_check_ep_type():624<info> Supported: FI_EP_DGRAM

libfabric:140355:verbs:core:ofi_check_ep_type():624<info> Requested: FI_EP_MSG

libfabric:140355:core:mr:ofi_intercept_mmap():368<debug> intercepted mmap start (nil) len 4194304

libfabric:140355:core:core:ofi_bufpool_grow():87<debug> Allocation failed: Cannot allocate memory

libfabric:140355:verbs:domain:vrb_ep_close():383<info> EP 0x3ed0520 is being closed

libfabric:140355:ofi_rxm:ep_ctrl:rxm_cmap_del_handle():159<debug> marking connection handle: 0x3e18c80 for deletion

libfabric:140355:ofi_rxm:ep_ctrl:rxm_cmap_del_handle():162<debug> [CM] handle: 0x3e18c80 RXM_CMAP_IDLE -> RXM_CMAP_SHUTDOWN

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # NA -- Error -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:3877

 # na_ofi_msg_send_unexpected(): fi_tsend() unexpected failed, rc: -12 (Cannot allocate memory)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury_core.c:1985

 # hg_core_forward_na(): Could not post send for input buffer (NA_PROTOCOL_ERROR)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG # NA -- Debug -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:4626

 # na_ofi_cancel(): Canceling operation ID 0x1d92ca0

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG # NA -- Debug -- /delphi/krehm/daos/_build.external/mercury/src/na/na_ofi.c:4667

 # na_ofi_cancel(): fi_cancel() rc: 0 (Success)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury_core.c:4393

 # HG_Core_forward(): Could not forward buffer

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  # HG -- Error -- /delphi/krehm/daos/_build.external/mercury/src/mercury.c:1919

 # HG_Forward(): Could not forward call (HG_PROTOCOL_ERROR)

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   ERR  src/cart/src/cart/crt_hg.c:1208 crt_hg_req_send(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] HG_Forward failed, hg_ret: 11

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] hg   DBUG src/cart/src/cart/crt_hg.c:1230 crt_hg_req_send(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] decref to 5.

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_rpc.c:1034 crt_req_send_immediately(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] crt_hg_req_send failed, rc: -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_rpc.c:517 crt_req_hg_addr_lookup_cb(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] crt_req_send_internal() failed, rc -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] rpc  ERR  src/cart/src/cart/crt_context.c:302 crt_rpc_complete(0x3ed0040)  [opc=0x4010001 xid=0x5c rank:tag=0:6] RPC failed; rc: -1020

04/17-14:02:59.70 delphi-004 DAOS[140355/140355] object ERR  src/object/cli_shard.c:263 dc_rw_cb() RPC 1 failed: -1020

 

 

From: <daos@daos.groups.io> on behalf of "Oganezov, Alexander A" <alexander.a.oganezov@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, April 17, 2020 at 10:44 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan & others,

 

As we’ve been debugging few other “no route to host” failures here, one thing that also turns out needs to be set on large-scale systems is FI_UNIVERSE_SIZE envariable, which in RXM-based providers (verbs included) dictates internal size of address vector table for peers.

 

With Xuezhaos patch below (already part of latest 0.9 DAOS) and FI_UNIVERSE_SIZE=2048  setting we were able to avoid “No route to host” issues on all known failures so far.

 

This issue is also being tracked in libfabric as https://github.com/ofiwg/libfabric/issues/5843

 

Thanks,

~~Alex.

 

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Liu, Xuezhao
Sent: Thursday, April 9, 2020 7:24 AM
To: daos@daos.groups.io
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan,

 

Great, thanks for verifying and the helpful logs.

 

On the server side, what should be the proper behavior if a client tries to connect twice using the same IPoIB port?   Should the server reject the second connection attempt?   Close the first connection and then open the second?

Server should not reject the 2nd connection attempt, the current handling (with the patch) is it will check its local cache (a hash-table) and reuse it if possible.

 

Xuezhao

 

From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 10:19 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Xuezhao,

 

Yes, it works, thank you very much!    I have been trying to figure that one out for several days now! 😊

 

Kevan

 

From: <daos@daos.groups.io> on behalf of "Liu, Xuezhao" <xuezhao.liu@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 8:34 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan,

 

The log you posted is very helpful. It helped to find a potential bug in mercury code for address management, please find attached small patch.

It needs to patch mercury source code and rebuild it. Commonly I do that by:

cd daos_m_path/_build.external/mercury; git am 0001-NA-OFI-fix-a-potential-bug-in-na_ofi_addr_ht_lookup.patch; cd ../ mercury.build; make clean; make install

 

Could you please have a test to see if it can resolve the problem?

 

Thanks,

Xuezhao

 

From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 5:50 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Need advice debugging "No route to host" failures

 

Greetings,

 

We have been unsuccessful over the last few days debugging “No route to host” failures that we are constantly seeing when running “ior -a DAOS”.  The errors only occur in the log of the daos_io_server on IB1, never on the IB0 server.   Client processes hang when this occurs.  The problem does not appear to be related to our recent verbs;ofi_rxm issues which we think are now resolved.   Any advice you could provide would be greatly appreciated.

 

Our configuration is one dual-daos_io_server node and one client node.   The libfabric transport is verbs;ofi_rxm, both nodes have two interfaces.   The server node’s PMEM is wiped and reformatted before each test run.  We use mpirun to start 16 ior client processes on the client node.  The ior program has been slightly modified so that processes with even-numbered ranks use IB0 while odd-numbered ranks use IB1.   Here is an example command:

 

mpirun --allow-run-as-root -np 16 $X_OPTS /root/ior/ior/src/ior -a DAOS --daos.pool $POOL --daos.cont $CONT --daos.svcl $SVCL -w -r -t 64M -b 1G

 

When we reduce the number of client processes, the problem occurs less frequently.   With only two processes on the client node we haven’t seen the problem.

 

What happens is that some client processes which connect to the IB1 server actually connect twice using the same source IPoIB port number.   Those that do are guaranteed to generate “No route to host” errors in the IB1 server log and then hang.   For example, the following grep of the IB1 server log shows that the connecting processes that were assigned to libfabric fi_addrs 2 and 3 are going to fail.  If you would repeat this same grep on the IB0 server log, you would never see duplicate connections:

 

# grep '<debug> av_insert ' *1.log

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.220:20000

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 0

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:43782

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 1

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:37665

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 2

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:37665

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 2

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.1.222:48826

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 3

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.1.222:48826

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 3

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:39600

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 4

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:56137

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 5

 

Here are the initial failure messages for fi_addr 2 in the IB1 server log.   The messages with “krehm” in them were added by me for debugging purposes:

 

04/08-19:18:08.43 delphi-002 DAOS[18490/18530] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7fecab64eba0, fi_addr=2, uri=(null))

libfabric:18490:ofi_rxm:ep_ctrl:rxm_ep_prepare_tx():851<warn> krehm: rxm_cmap_acquire_handle() returned NULL, fi_addr: 2

libfabric:18490:ofi_rxm:ep_data:rxm_ep_rma_common():80<warn> krehm: rxm_ep_prepare_tx() returned -113, fi_addr: 2

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # NA -- Error -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4462

 # na_ofi_get(): fi_readmsg() failed, rc: -113 (No route to host)

04/08-19:18:08.43 delphi-002 DAOS[18490/18554] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7fec1b64e820, fi_addr=1, uri=(null))

libfabric:18490:core:mr:ofi_intercept_mmap():368<debug> intercepted mmap start (nil) len 4194304

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:793

 # hg_bulk_transfer_pieces(): Could not transfer data (NA_PROTOCOL_ERROR)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:952

 # hg_bulk_transfer(): Could not transfer data pieces

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:1793

 # HG_Bulk_transfer_id(): Could not start transfer of bulk data

 

My suspicion is that the server’s cleanup of the first connection from fi_addr 2 throws away the libfabric address of that client, so then when the server tries to initiate a RMA Read request using the second client connection, it has no libfabric address for the client and fails.

 

The failure sequence for fi_addr 3 is identical:

 

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7feab365f550, fi_addr=3, uri=(null))

libfabric:18490:ofi_rxm:ep_ctrl:rxm_ep_prepare_tx():851<warn> krehm: rxm_cmap_acquire_handle() returned NULL, fi_addr: 3

libfabric:18490:ofi_rxm:ep_data:rxm_ep_rma_common():80<warn> krehm: rxm_ep_prepare_tx() returned -113, fi_addr: 3

04/08-19:18:08.43 delphi-002 DAOS[18490/18554] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:2707

 # na_ofi_cq_process_recv_unexpected_event(): unexpected recv msg completion event with tag=140657172791930, len=4294967364

04/08-19:18:08.43 delphi-002 DAOS[18490/18578] object ERR  src/object/srv_obj.c:384 obj_bulk_transfer() crt_bulk_transfer 0 error (-1020).

04/08-19:18:08.43 delphi-002 DAOS[18490/18578] object ERR  src/object/srv_obj.c:1062 obj_local_rw() 1153784440846024704.5704104871060330004.13 data transfer failed, dma 1 rc DER_HG(-1020)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # NA -- Error -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4462

 # na_ofi_get(): fi_readmsg() failed, rc: -113 (No route to host)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:793

 # hg_bulk_transfer_pieces(): Could not transfer data (NA_PROTOCOL_ERROR)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:952

 # hg_bulk_transfer(): Could not transfer data pieces

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:1793

 # HG_Bulk_transfer_id(): Could not start transfer of bulk data

 

Could someone suggest ways that we might narrow down the problem?

 

Thanks, Kevan


Re: Need advice debugging "No route to host" failures

Oganezov, Alexander A
 

Hi Kevan & others,

 

As we’ve been debugging few other “no route to host” failures here, one thing that also turns out needs to be set on large-scale systems is FI_UNIVERSE_SIZE envariable, which in RXM-based providers (verbs included) dictates internal size of address vector table for peers.

 

With Xuezhaos patch below (already part of latest 0.9 DAOS) and FI_UNIVERSE_SIZE=2048  setting we were able to avoid “No route to host” issues on all known failures so far.

 

This issue is also being tracked in libfabric as https://github.com/ofiwg/libfabric/issues/5843

 

Thanks,

~~Alex.

 

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Liu, Xuezhao
Sent: Thursday, April 9, 2020 7:24 AM
To: daos@daos.groups.io
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan,

 

Great, thanks for verifying and the helpful logs.

 

On the server side, what should be the proper behavior if a client tries to connect twice using the same IPoIB port?   Should the server reject the second connection attempt?   Close the first connection and then open the second?

Server should not reject the 2nd connection attempt, the current handling (with the patch) is it will check its local cache (a hash-table) and reuse it if possible.

 

Xuezhao

 

From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 10:19 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Xuezhao,

 

Yes, it works, thank you very much!    I have been trying to figure that one out for several days now! 😊

 

Kevan

 

From: <daos@daos.groups.io> on behalf of "Liu, Xuezhao" <xuezhao.liu@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 8:34 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Need advice debugging "No route to host" failures

 

Hi Kevan,

 

The log you posted is very helpful. It helped to find a potential bug in mercury code for address management, please find attached small patch.

It needs to patch mercury source code and rebuild it. Commonly I do that by:

cd daos_m_path/_build.external/mercury; git am 0001-NA-OFI-fix-a-potential-bug-in-na_ofi_addr_ht_lookup.patch; cd ../ mercury.build; make clean; make install

 

Could you please have a test to see if it can resolve the problem?

 

Thanks,

Xuezhao

 

From: <daos@daos.groups.io> on behalf of Kevan Rehm <kevan.rehm@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Thursday, April 9, 2020 at 5:50 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Need advice debugging "No route to host" failures

 

Greetings,

 

We have been unsuccessful over the last few days debugging “No route to host” failures that we are constantly seeing when running “ior -a DAOS”.  The errors only occur in the log of the daos_io_server on IB1, never on the IB0 server.   Client processes hang when this occurs.  The problem does not appear to be related to our recent verbs;ofi_rxm issues which we think are now resolved.   Any advice you could provide would be greatly appreciated.

 

Our configuration is one dual-daos_io_server node and one client node.   The libfabric transport is verbs;ofi_rxm, both nodes have two interfaces.   The server node’s PMEM is wiped and reformatted before each test run.  We use mpirun to start 16 ior client processes on the client node.  The ior program has been slightly modified so that processes with even-numbered ranks use IB0 while odd-numbered ranks use IB1.   Here is an example command:

 

mpirun --allow-run-as-root -np 16 $X_OPTS /root/ior/ior/src/ior -a DAOS --daos.pool $POOL --daos.cont $CONT --daos.svcl $SVCL -w -r -t 64M -b 1G
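As an aside, the even/odd interface split does not strictly require patching ior; a small wrapper script can achieve the same effect. The sketch below is only an illustration and assumes Open MPI (which exports OMPI_COMM_WORLD_RANK to each process) and that the client selects its fabric interface from the OFI_INTERFACE environment variable:

#!/bin/bash
# ior-split.sh (hypothetical wrapper): even MPI ranks use ib0, odd ranks use ib1
if [ $(( OMPI_COMM_WORLD_RANK % 2 )) -eq 0 ]; then
    export OFI_INTERFACE=ib0
else
    export OFI_INTERFACE=ib1
fi
exec /root/ior/ior/src/ior "$@"

mpirun would then launch ./ior-split.sh with the same ior arguments instead of launching the ior binary directly.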

 

When we reduce the number of client processes, the problem occurs less frequently.   With only two processes on the client node we haven’t seen the problem.

 

What happens is that some client processes which connect to the IB1 server actually connect twice using the same source IPoIB port number.   Those that do are guaranteed to generate “No route to host” errors in the IB1 server log and then hang.   For example, the following grep of the IB1 server log shows that the connecting processes assigned to libfabric fi_addrs 2 and 3 are going to fail.  If you repeat this same grep on the IB0 server log, you never see duplicate connections:

 

# grep '<debug> av_insert ' *1.log

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.220:20000

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 0

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:43782

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 1

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:37665

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 2

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:37665

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 2

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.1.222:48826

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 3

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.1.222:48826

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 3

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:39600

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 4

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():609<debug> av_insert addr: fi_sockaddr_in://10.0.0.222:56137

libfabric:18490:ofi_rxm:av:ip_av_insert_addr():612<debug> av_insert fi_addr: 5
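A quick way to spot duplicate inserts like the two above automatically (assuming the same log file naming and message format) is to pipe the same grep through sort and uniq:

# grep 'av_insert addr' *1.log | sort | uniq -d

Any line that prints corresponds to an address that was inserted more than once.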

 

Here are the initial failure messages for fi_addr 2 in the IB1 server log.   The messages with “krehm” in them were added by me for debugging purposes:

 

04/08-19:18:08.43 delphi-002 DAOS[18490/18530] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7fecab64eba0, fi_addr=2, uri=(null))

libfabric:18490:ofi_rxm:ep_ctrl:rxm_ep_prepare_tx():851<warn> krehm: rxm_cmap_acquire_handle() returned NULL, fi_addr: 2

libfabric:18490:ofi_rxm:ep_data:rxm_ep_rma_common():80<warn> krehm: rxm_ep_prepare_tx() returned -113, fi_addr: 2

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # NA -- Error -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4462

 # na_ofi_get(): fi_readmsg() failed, rc: -113 (No route to host)

04/08-19:18:08.43 delphi-002 DAOS[18490/18554] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7fec1b64e820, fi_addr=1, uri=(null))

libfabric:18490:core:mr:ofi_intercept_mmap():368<debug> intercepted mmap start (nil) len 4194304

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:793

 # hg_bulk_transfer_pieces(): Could not transfer data (NA_PROTOCOL_ERROR)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:952

 # hg_bulk_transfer(): Could not transfer data pieces

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:1793

 # HG_Bulk_transfer_id(): Could not start transfer of bulk data

 

My suspicion is that the server’s cleanup of the first connection from fi_addr 2 throws away the libfabric address of that client, so when the server later tries to initiate an RMA read request using the second client connection, it has no libfabric address for the client and the request fails.

 

The failure sequence for fi_addr 3 is identical:

 

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4442

 # na_ofi_get(): Posting RMA get (op id=0x7feab365f550, fi_addr=3, uri=(null))

libfabric:18490:ofi_rxm:ep_ctrl:rxm_ep_prepare_tx():851<warn> krehm: rxm_cmap_acquire_handle() returned NULL, fi_addr: 3

libfabric:18490:ofi_rxm:ep_data:rxm_ep_rma_common():80<warn> krehm: rxm_ep_prepare_tx() returned -113, fi_addr: 3

04/08-19:18:08.43 delphi-002 DAOS[18490/18554] hg   DBUG # NA -- Debug -- /root/daos/_build.external/mercury/src/na/na_ofi.c:2707

 # na_ofi_cq_process_recv_unexpected_event(): unexpected recv msg completion event with tag=140657172791930, len=4294967364

04/08-19:18:08.43 delphi-002 DAOS[18490/18578] object ERR  src/object/srv_obj.c:384 obj_bulk_transfer() crt_bulk_transfer 0 error (-1020).

04/08-19:18:08.43 delphi-002 DAOS[18490/18578] object ERR  src/object/srv_obj.c:1062 obj_local_rw() 1153784440846024704.5704104871060330004.13 data transfer failed, dma 1 rc DER_HG(-1020)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # NA -- Error -- /root/daos/_build.external/mercury/src/na/na_ofi.c:4462

 # na_ofi_get(): fi_readmsg() failed, rc: -113 (No route to host)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:793

 # hg_bulk_transfer_pieces(): Could not transfer data (NA_PROTOCOL_ERROR)

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:952

 # hg_bulk_transfer(): Could not transfer data pieces

04/08-19:18:08.43 delphi-002 DAOS[18490/18614] hg   ERR  # HG -- Error -- /root/daos/_build.external/mercury/src/mercury_bulk.c:1793

 # HG_Bulk_transfer_id(): Could not start transfer of bulk data

 

Could someone suggest ways that we might narrow down the problem?

 

Thanks, Kevan


potential removal of daos_server --recreate-superblocks option

Nabarro, Tom
 

Trying to gauge usage of the “--recreate-superblocks” option: does anyone rely on it? If anyone does, could you please describe your use case?

 

It is a developer shortcut to avoid the need to run “dmg storage format” when starting daos_server from scratch.
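For anyone unfamiliar with it, the option just collapses the normal two-step bring-up into one step. Roughly (the host name and config path below are placeholders, and the exact flag placement should be checked against daos_server start --help):

Normal flow, start the server and then format from an admin node:

daos_server start -o /etc/daos/daos_server.yml
dmg -l <server-host> storage format

Developer shortcut under discussion:

daos_server start -o /etc/daos/daos_server.yml --recreate-superblocks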

 

It’s proving to be an inconvenient side door from the standpoint of control-plane server-side development, and I would like to remove it.

 

Tom

 



Re: Broken build - FUSE changes

Patrick Farrell <paf@...>
 

OK, thanks. Sorry to prod; it just wasn't clear what the plan was.  Thanks for the workaround.

Regards,
-Patrick

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Olivier, Jeffrey V <jeffrey.v.olivier@...>
Sent: Wednesday, April 15, 2020 10:02 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] Broken build - FUSE changes
 

Hi Patrick,

 

Yes, it’s a bug.   Whether we revert and fix or just fix it, it is only a temporary issue.

 

Jeff

 

From: <daos@daos.groups.io> on behalf of "Farrell, Patrick Arthur" <patrick.farrell@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday, April 15, 2020 at 8:55 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Broken build - FUSE changes

 

Brian,

 

OK, I'm happy to pass that option.  Is there no concern over the build failure when that option is not passed?

 

-Patrick


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Murrell, Brian <brian.murrell@...>
Sent: Wednesday, April 15, 2020 8:08 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] Broken build - FUSE changes

 

On Tue, 2020-04-14 at 20:04 +0000, Farrell, Patrick Arthur wrote:
> Good afternoon,

Hi.

> The most recent commit (dab7da05cd06b80bd2751a783090ea8a2d4b96b3) has
> broken the build for me, with compile errors in fuse on CentOS
> 7.7.  It refers to adding fuse3-devel to the build requirements, but
> configure doesn't require this (it looks like it was added to the RPM
> instructions, not sure how that interacts with scons).
>
> In any case, installing fuse3-devel didn't help,

You also have to tell scons that you want to use the installed fuse3-devel, rather than have it build fuse locally, with:

$ scons USE_INSTALLED=fuse

By default, scons tries to build everything itself, with no regard to which components are already installed on the system, rather than using them.

Cheers,
b.




Re: Broken build - FUSE changes

Olivier, Jeffrey V
 

Hi Patrick,

 

Yes, it’s a bug.   Whether we revert and fix or just fix it, it is only a temporary issue.

 

Jeff

 

From: <daos@daos.groups.io> on behalf of "Farrell, Patrick Arthur" <patrick.farrell@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday, April 15, 2020 at 8:55 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] Broken build - FUSE changes

 

Brian,

 

OK, I'm happy to pass that option.  Is there no concern over the build failure when that option is not passed?

 

-Patrick


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Murrell, Brian <brian.murrell@...>
Sent: Wednesday, April 15, 2020 8:08 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] Broken build - FUSE changes

 

On Tue, 2020-04-14 at 20:04 +0000, Farrell, Patrick Arthur wrote:
> Good afternoon,

Hi.

> The most recent commit (dab7da05cd06b80bd2751a783090ea8a2d4b96b3) has
> broken the build for me, with compile errors in fuse on CentOS
> 7.7.  It refers to adding fuse3-devel to the build requirements, but
> configure doesn't require this (it looks like it was added to the RPM
> instructions, not sure how that interacts with scons).
>
> In any case, installing fuse3-devel didn't help,

You also have to tell scons that you want to use the installed fuse3-devel, rather than have it build fuse locally, with:

$ scons USE_INSTALLED=fuse

By default, scons tries to build everything itself, with no regard to which components are already installed on the system, rather than using them.

Cheers,
b.




Re: Broken build - FUSE changes

Farrell, Patrick Arthur <patrick.farrell@...>
 

Brian,

OK, I'm happy to pass that option.  Is there no concern over the build failure when that option is not passed?

-Patrick

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Murrell, Brian <brian.murrell@...>
Sent: Wednesday, April 15, 2020 8:08 AM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] Broken build - FUSE changes
 
On Tue, 2020-04-14 at 20:04 +0000, Farrell, Patrick Arthur wrote:
> Good afternoon,

Hi.

> The most recent commit (dab7da05cd06b80bd2751a783090ea8a2d4b96b3) has
> broken the build for me, with compile errors in fuse on CentOS
> 7.7.  It refers to adding fuse3-devel to the build requirements, but
> configure doesn't require this (it looks like it was added to the RPM
> instructions, not sure how that interacts with scons).
>
> In any case, installing fuse3-devel didn't help,

You also have to tell scons that you want to use the installed fuse3-devel, rather than have it build fuse locally, with:

$ scons USE_INSTALLED=fuse

By default, scons tries to build everything itself, with no regard to which components are already installed on the system, rather than using them.

Cheers,
b.