Date   

Re: 'access_points' must contain resolvable addresses complaint

Nabarro, Tom
 

Hello,

 

I would first compare your file with one of the example config files (under the utils/config/examples directory) for clues. Then, to narrow down the problem, check whether it works with IP addresses (perhaps try just one to start with, and use double quotes as in the examples).
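One way to narrow this down further is to check, on the server itself, which entries fail name resolution. This is a minimal sketch, not part of DAOS: the helper `unresolvable` and its injectable `resolve` parameter are hypothetical, and it only tests plain hostname or IP entries (no ports) against the local resolver via socket.getaddrinfo, so it may not match the exact check daos_server performs.

```python
import socket
from typing import Callable, Iterable, List

# Hypothetical helper, not part of DAOS: lists access_points entries that the
# local resolver cannot resolve (entries are plain hostnames or IPs, no ports).
Resolver = Callable[[str], object]


def default_resolver(host: str) -> object:
    # socket.getaddrinfo raises socket.gaierror when a name cannot be resolved.
    return socket.getaddrinfo(host, None)


def unresolvable(points: Iterable[str], resolve: Resolver = default_resolver) -> List[str]:
    """Return the entries of `points` that `resolve` fails to resolve."""
    bad: List[str] = []
    for entry in points:
        host = entry.strip()
        if not host:
            bad.append(entry)  # an empty entry can never be resolved
            continue
        try:
            resolve(host)
        except (socket.gaierror, UnicodeError):
            bad.append(entry)
    return bad
```

For example, unresolvable(['asapo-srv09', 'asapo-srv10']) returns an empty list when both names resolve locally; a non-empty result points at the entries to fix.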

 

Regards,

Tom

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Hannappel, Juergen
Sent: Wednesday, February 16, 2022 9:18 AM
To: daos@daos.groups.io
Subject: [daos] 'access_points' must contain resolvable addresses complaint

 

Hello,
I am trying to set up DAOS on a small 4-node cluster, following the recipe at https://docs.daos.io/v2.0/QSG/setup_centos/
When I try to start the server, it complains:
Feb 16 09:16:11 asapo-srv09.desy.de daos_server[251023]: ERROR: retrieve replicas from config: serverconfig: code = 705 description = "invalid list of access points in configuration"
Feb 16 09:16:11 asapo-srv09.desy.de daos_server[251023]: ERROR: serverconfig: code = 705 resolution = "'access_points' must contain resolvable addresses; fix the configuration and restart the cont>

The config is:
[root@asapo-srv09 tmp]# grep -v ^# /etc/daos/daos_server.yml
access_points: ['asapo-srv09', 'asapo-srv10', 'asapo-srv11', 'asapo-srv12']
provider: ofi+verbs;ofi_rxm
bdev_include: ["01:00.0","02:00.0"]

The node names are resolvable:
[root@asapo-srv09 tmp]# host asapo-srv09
asapo-srv09.desy.de has address 131.169.183.155
[root@asapo-srv09 tmp]# host asapo-srv10
asapo-srv10.desy.de has address 131.169.183.157
... and so on.

The result was the same when I added the domain name to the host names in the config file or used the IP addresses.
Probably a silly mistake on my side; any hints?

Thanks in advance


Re: 'access_points' must contain resolvable addresses complaint

Hennecke, Michael
 

Hi,

 

Does it work with a single host in the access_points list? 

You should add a line with “name: daos_server” above the access_points.

Anything useful in the daos server logfile?

 

Best,

Michael

 


Intel Deutschland GmbH
Registered Address: Am Campeon 10, 85579 Neubiberg, Germany
Tel: +49 89 99 8853-0, www.intel.de
Managing Directors: Christin Eisenschmid, Sharon Heck, Tiffany Doon Silva  
Chairperson of the Supervisory Board: Nicole Lau
Registered Office: Munich
Commercial Register: Amtsgericht Muenchen HRB 186928



Re: Consistency problem in daos

@stephen.tao
 

When an abort is attempted after a commit failure, are the following two situations possible?
1. Peer1 committed successfully but did not receive the abort request because of a network problem. (Should it roll back if it does receive the abort request?)
2. Peer2 failed to commit, received the abort request, and completed the abort.
Would there be any inconsistency at this point?

Thanks for your response, best regards!


Re: High latency in metada write

shadow_vector@...
 

Hi Liang:

Thank you for your attention and the explanation; I understand the reason now. So I think a metadata write through the array update interface should take less time than in the earlier vos_perf test, because less data is written to SCM. Is there anything wrong with my understanding? I look forward to the benchmark results.

Best Regards!


Re: Announcement: DAOS 2.0.1 is generally available

Prantis, Kelsey
 

One small correction of a typo: the text below should say the *2.0.1* release, as the subject line does, not 2.0.2.

 

Kelsey

 

From: <daos@daos.groups.io> on behalf of "Prantis, Kelsey" <kelsey.prantis@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday, February 1, 2022 at 7:14 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Announcement: DAOS 2.0.1 is generally available

 

All,

 

We are pleased to announce that the DAOS 2.0.1 release is now generally available. This maintenance release contains the following updates on top of DAOS 2.0.0:

 

  • DAOS 2.0.1 includes fixes to the EC, VOS and Object services, as well as improvements to the control system and dfuse. It also includes numerous updates to the test and build infrastructure.
  • log4j-core has been updated from 2.16.0 to 2.17.1 (DAOS-8929).
  • libfabric has been updated from 1.14.0~rc3-2 to 1.14.0-1. This also fixes the DAOS 2.0.0 known limitation with MOFED > 5.4-1.0.3.0 described in DAOS-9376.
  • mercury has been updated from 2.1.0~rc4-1 to 2.1.0~rc4-3. This fixes the high CPU utilization issue in DAOS 2.0.0 described in DAOS-9325.
  • spdk has been updated from 21.07-8 to 21.07-11 (minor fixes only).

 

There are a number of resources available for the release:

As always, feel free to use this mailing list for any issues you may find with the release, or our JIRA bug tracking system, available at https://daosio.atlassian.net/jira, or on our Slack channel, available at https://daos-stack.slack.com.

 

Regards,

 

Kelsey Prantis

Senior Software Engineering Manager

Extreme Storage Architecture and Development Division

Intel

 

 



Re: Consistency problem in daos

Lombardi, Johann
 

Hi there,

 

If abort is called either by the application directly (daos_tx_abort()) or by DAOS internally (something went wrong during commit), then all changes associated with the transaction are discarded and consistency is preserved. None of the uncommitted modifications are visible to external readers.
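To illustrate the all-or-nothing behaviour described above, here is a generic two-phase-commit toy model. It is a sketch under stated assumptions, not DAOS's transaction code: the `Participant` class and `run_transaction` function are hypothetical, and a node makes its staged writes visible only after every node has prepared successfully.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Participant:
    """Hypothetical storage node: staged writes become visible only on commit."""
    name: str
    fail_prepare: bool = False
    visible: Dict[str, str] = field(default_factory=dict)
    staged: Optional[Dict[str, str]] = None

    def prepare(self, writes: Dict[str, str]) -> bool:
        if self.fail_prepare:
            return False
        self.staged = dict(writes)  # recorded, but not yet visible to readers
        return True

    def commit(self) -> None:
        if self.staged is not None:
            self.visible.update(self.staged)
            self.staged = None

    def abort(self) -> None:
        self.staged = None  # discard uncommitted changes


def run_transaction(nodes: List[Participant], writes: Dict[str, str]) -> bool:
    """Commit on every node only if every node prepared; otherwise abort everywhere."""
    votes = [node.prepare(writes) for node in nodes]
    if all(votes):
        for node in nodes:
            node.commit()
        return True
    for node in nodes:
        node.abort()
    return False
```

In two-phase commit generally, no node exposes its writes before the commit decision is made, so one node committed while another aborted cannot arise; a participant that misses the decision stays prepared, with its writes still invisible, until it learns the outcome.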

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "stephen.tao@..." <stephen.tao@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Saturday 29 January 2022 at 10:47
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Consistency problem in daos

 

1. Will an abort be triggered in DAOS if a transaction fails in the commit phase? (Reading the code, I found that this happens when the commit fails; at that point some nodes may already have committed successfully.)
2. If an abort is triggered, what does that mean? Will strong consistency be lost?

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.



Re: External API

d.korekovcev@...
 

Thanks! 


Re: High latency in metada write

Zhen, Liang
 

Hi, in the test, which writes 1K to the DAOS server, the engine actually does the following:

  1. Search the dkey in SCM
  2. Create index for the dkey if it does not exist (b+tree stored in SCM)
  3. Do the same for akey
  4. Copy 1K data to SCM
  5. All the above writes to SCM happen in the same PMDK transaction, which has its own cost.

This is why VOS write latency is higher than the latency of a single SCM write. Array write latency should be roughly the same as single-value write latency; we will run some benchmarks and check whether there is any issue.
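The steps listed above can be sketched as a toy in-memory model that records each SCM-touching step of one update. This is hypothetical code, not the VOS implementation: `ToyVOS` and the step labels are illustrative only, whereas the real engine uses persistent trees in SCM inside a PMDK transaction.

```python
from typing import Dict, List


class ToyVOS:
    """Hypothetical stand-in for the VOS update path (not DAOS code).

    Each call to update() returns the ordered list of steps that would touch
    SCM, all grouped inside one transaction.
    """

    def __init__(self) -> None:
        self.tree: Dict[str, Dict[str, bytes]] = {}  # dkey -> {akey -> value}

    def update(self, dkey: str, akey: str, data: bytes) -> List[str]:
        steps = ["begin transaction", "search dkey"]
        akeys = self.tree.get(dkey)
        if akeys is None:
            steps.append("create dkey index")  # new index entry for the dkey
            akeys = self.tree[dkey] = {}
        steps.append("search akey")
        if akey not in akeys:
            steps.append("create akey index")
        steps.append(f"copy {len(data)} bytes")
        akeys[akey] = bytes(data)
        steps.append("commit transaction")
        return steps
```

In this model the first 1K update to a new dkey/akey takes seven steps and a repeat update to existing keys takes five; because each step depends on the previous one, the total exceeds the cost of a single SCM write.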

 

Liang

 

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of shadow_vector@... <shadow_vector@...>
Date: Thursday, January 27, 2022 at 2:25 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] High latency in metada write

Hi Johann

So you mean that although each operation on SCM is fast, several sequential operations on SCM may result in high latency?
Here is the test on my server with vos_perf:

Using the array type, the latency is much higher (worse than in my test). Does "array" here refer to the array in the DAOS interface? Is something wrong?

Another question: DAOS writes the data buffer before the SCM operations in VOS. I think the leader waits until the followers complete all the operations and reply, which results in higher latency. Is there a problem with my understanding?
I'm confused about the array update flow now.


Best Regards!

Jan 



Re: High latency in metada write

Lombardi, Johann
 

Hi,

 

A DAOS update operation actually results in several sequential latency-sensitive operations over SCM to locate the object/dkey/akey (and create the associated trees if those don’t exist already) that you want to update. Once this is resolved, the 4K data buffer will be stored in either SCM or NVMe. DAOS uses SCM internally to accelerate the sequential metadata operations.
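A toy placement rule illustrates the split described here, with metadata in SCM and data in either SCM or NVMe. The 4 KiB threshold and the `select_media` helper are assumptions for illustration only, not the DAOS source; check the DAOS documentation for the actual placement policy and for whether a buffer of exactly 4K lands in SCM or NVMe.

```python
SCM, NVME = "SCM", "NVMe"


def select_media(io_size: int, has_nvme: bool = True, threshold: int = 4096) -> str:
    """Hypothetical rule: small records stay in SCM, larger ones go to NVMe.

    The default threshold of 4096 bytes is an assumption for illustration.
    """
    if io_size < 0:
        raise ValueError("io_size must be non-negative")
    if not has_nvme or io_size < threshold:
        return SCM  # without NVMe, everything has to live in SCM
    return NVME
```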

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "shadow_vector@..." <shadow_vector@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday 26 January 2022 at 03:26
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] High latency in metada write

 

Hi Johann:

Thanks for the information. I checked the latency in the online documentation: about 12 µs for an update, much lower than in my test. But I would expect SCM to be much faster, given that a 4K write takes only about 10 µs on NVMe. Does SCM only solve the write amplification (WA) problem?

Best Regards!




Re: High latency in metada write

Lombardi, Johann
 

Hi there,

 

I don’t think that we have ever run any benchmarks with only 2x Optane DIMMs. Maybe you could start by running vos_perf and compare to the results we have in the online documentation: https://docs.daos.io/v2.0/admin/performance_tuning/#daos_perf-vos_perf

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "shadow_vector@..." <shadow_vector@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday 25 January 2022 at 10:29
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] High latency in metada write

 

Thanks for responding. Can you share the latency from your internal testing?

Best Regards!




Re: External API

Jacque, Kristin
 

When I replied last week I forgot to mention, we do have a control API written in Go, which you could also use for this purpose: https://github.com/daos-stack/daos/tree/master/src/control/lib/control

 

Hopefully either that or calling dmg will be sufficient. 😊 Just avoid calling the gRPC methods directly, as we don’t guarantee those for external developers.
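If calling dmg from code is acceptable, a thin subprocess wrapper is one option. The sketch below uses Python only for illustration (the supported library is the Go control package linked above) and assumes `dmg` is on PATH and already configured to reach the servers. `dmg_storage_format` is a hypothetical helper; take any extra options from `dmg storage format --help` rather than guessing them.

```python
import subprocess
from typing import Callable, List, Sequence


def dmg_storage_format(
    dmg: str = "dmg",
    extra_args: Sequence[str] = (),
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run `dmg storage format` as a subprocess and raise if it fails."""
    cmd: List[str] = [dmg, "storage", "format", *extra_args]
    result = run(cmd, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        raise RuntimeError(
            f"{' '.join(cmd)} failed ({result.returncode}): {result.stderr.strip()}"
        )
    return result
```

The `run` parameter only exists so the call can be replaced in tests; in normal use the default subprocess.run is used.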

 

Kris

 

From: Jacque, Kristin
Sent: Friday, January 21, 2022 6:22 PM
To: daos@daos.groups.io
Subject: RE: [daos] External API

 

You should use dmg storage format. As things stand, we don’t enable users to call the administrative gRPC methods directly. The API is internal and may change.

 

Thanks,

Kris

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of d.korekovcev@...
Sent: Friday, January 21, 2022 1:52 AM
To: daos@daos.groups.io
Subject: [daos] External API

 

After the services have started, I want to format PMem and NVMe from code.

Do I need to exec 'dmg storage format', or can I call 'ctlpb.CtlSvcClient.StorageFormat'?

Can I use dRPC to control daos_server? 

Or will the API change in future versions?



Re: High latency in metada write

shadow_vector@...
 


Here is my HW configuration:

CPU: Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
SCM: 2 SCM DIMMs, 256GB
I use the array interface in this test with the object class set to OC_RP_3GX, 32GB configured for hugepages, and an I/O size of 4K; only write requests were sent.


Thanks for your response.
