
Re: High latency in metadata write

Zhen, Liang
 

Hi, when the test writes 1K to the DAOS server, the engine actually does the following:

  1. Search the dkey in SCM
  2. Create index for the dkey if it does not exist (b+tree stored in SCM)
  3. Do the same for akey
  4. Copy 1K data to SCM
  5. All of the above writes to SCM are in the same PMDK transaction, which has its own cost.

This is why VOS write latency is higher than a single SCM write. Array write latency should be roughly the same as for a single value; we will run some benchmarks and check whether there is an issue.
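The per-update flow above can be sketched as a toy model (plain Python, not VOS code; all names are illustrative) to show that what the client sees as one I/O is several sequential SCM stores inside one transaction:

```python
# Conceptual model of the VOS update steps listed above: walk/create the
# dkey index, then the akey index, then copy the value, all bracketed by
# a single transaction whose begin/commit bookkeeping has its own cost.
scm = {}       # stands in for the SCM-resident b+trees
tx_log = []    # stands in for PMDK transaction bookkeeping

def vos_update(obj, dkey, akey, value):
    tx_log.append("tx_begin")                # transaction start
    dtree = scm.setdefault((obj, dkey), {})  # steps 1-2: find or create dkey index
    tx_log.append("dkey_index")
    atree = dtree.setdefault(akey, {})       # step 3: same for the akey
    tx_log.append("akey_index")
    atree["val"] = value                     # step 4: copy the 1K value into SCM
    tx_log.append("value_copy")
    tx_log.append("tx_commit")               # step 5: commit has its own cost

vos_update("obj0", "dkey0", "akey0", b"x" * 1024)
# One client-visible write produced four SCM operations plus the commit.
assert tx_log == ["tx_begin", "dkey_index", "akey_index", "value_copy", "tx_commit"]
```

The latencies of these sequential steps add up, which is why a VOS update costs several times a raw SCM store.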

 

Liang

 

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of shadow_vector@... <shadow_vector@...>
Date: Thursday, January 27, 2022 at 2:25 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] High latency in metadata write

Hi Johann

So you mean that although a single operation over SCM is fast,
several sequential operations over SCM may result in high latency?
Here is the test on my server with vos_perf:



Using the array type, the latency is much higher (worse than in my test). Does the type "array" here refer to the array in the DAOS interface? Is there something wrong?

Another question: DAOS writes the data buffer before the SCM operation in VOS. I think the leader would wait
until the followers complete all their operations and reply, and this results in higher latency. Is there some problem with my understanding?
I'm confused about the array update flow now.


Best Regards!

Jan 


Re: High latency in metadata write

shadow_vector@...
 

Hi Johann:

So you mean that although a single operation over SCM is fast, several sequential operations over SCM may result in high latency?
Here is the test on my server with vos_perf:


Using the array type, the latency is much higher (worse than in my test). Does the type "array" here refer to the array in the DAOS interface? Is there something wrong?

Another question: DAOS writes the data buffer before the SCM operation in VOS. I think the leader would wait until the followers complete all their operations and reply, and this results in higher latency. Is there some problem with my understanding?
I'm confused about the array update flow now.


Best Regards!

Jan 


Re: High latency in metadata write

Lombardi, Johann
 

Hi,

 

A DAOS update operation actually results in several sequential latency-sensitive operations over SCM to locate the object/dkey/akey (and create the associated trees if those don’t exist already) that you want to update. Once this is resolved, the 4K data buffer will be stored in either SCM or NVMe. DAOS uses SCM internally to accelerate the sequential metadata operations.

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "shadow_vector@..." <shadow_vector@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday 26 January 2022 at 03:26
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] High latency in metadata write

 

Hi Johann:

Thanks for the information. I have checked the latency in the online documentation: about 12 us for an update, much lower than in my test. But I thought SCM would be much faster, while the 4K write latency is about 10 us using NVMe. Does SCM just resolve the write-amplification (WA) problem?

Best Regards!

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: High latency in metadata write

shadow_vector@...
 

Hi Johann:

Thanks for the information. I have checked the latency in the online documentation: about 12 us for an update, much lower than in my test. But I thought SCM would be much faster, while the 4K write latency is about 10 us using NVMe. Does SCM just resolve the write-amplification (WA) problem?

Best Regards!


Re: High latency in metadata write

Lombardi, Johann
 

Hi there,

 

I don’t think that we have ever run any benchmarks with only 2x Optane DIMMs. Maybe you could start by running vos_perf and compare to the results we have in the online documentation: https://docs.daos.io/v2.0/admin/performance_tuning/#daos_perf-vos_perf

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "shadow_vector@..." <shadow_vector@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday 25 January 2022 at 10:29
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] High latency in metadata write

 

Thanks for responding. Can you share the latency from your internal testing?

Best Regards!



Re: High latency in metadata write

shadow_vector@...
 

Thanks for responding. Can you share the latency from your internal testing?

Best Regards!


Re: External API

Jacque, Kristin
 

When I replied last week I forgot to mention, we do have a control API written in Go, which you could also use for this purpose: https://github.com/daos-stack/daos/tree/master/src/control/lib/control

 

Hopefully either that or calling dmg will be sufficient. 😊 Just avoid calling the gRPC methods directly, as we don’t guarantee those for external developers.

 

Kris

 

From: Jacque, Kristin
Sent: Friday, January 21, 2022 6:22 PM
To: daos@daos.groups.io
Subject: RE: [daos] External API

 

You should use dmg storage format. As things stand, we don’t enable users to call the administrative gRPC methods directly. The API is internal and may change.

 

Thanks,

Kris

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of d.korekovcev@...
Sent: Friday, January 21, 2022 1:52 AM
To: daos@daos.groups.io
Subject: [daos] External API

 

After the services start, I want to format pmem and nvme (from code).

Do I need to exec 'dmg storage format', or can I call 'ctlpb.CtlSvcClient.StorageFormat'?

Can I use dRPC to control daos_server?

Or will the API be changed in the next versions?


Re: External API

Jacque, Kristin
 

You should use dmg storage format. As things stand, we don’t enable users to call the administrative gRPC methods directly. The API is internal and may change.

 

Thanks,

Kris

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of d.korekovcev@...
Sent: Friday, January 21, 2022 1:52 AM
To: daos@daos.groups.io
Subject: [daos] External API

 

After the services start, I want to format pmem and nvme (from code).

Do I need to exec 'dmg storage format', or can I call 'ctlpb.CtlSvcClient.StorageFormat'?

Can I use dRPC to control daos_server?

Or will the API be changed in the next versions?


External API

d.korekovcev@...
 

After the services start, I want to format pmem and nvme (from code).

Do I need to exec 'dmg storage format', or can I call 'ctlpb.CtlSvcClient.StorageFormat'?

Can I use dRPC to control daos_server?

Or will the API be changed in the next versions?


Re: High latency in metadata write

shadow_vector@...
 


Here is my HW configuration:

CPU: Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
SCM: 2 modules, 256 GB
I use the array interface in this test and set the object class to OC_RP_3GX. 32 GB is configured for hugepages; the I/O size is 4K, and only write requests were sent.


Thanks for the response.


Re: Post DAOS 2.0 installation comments and questions

Omer
 

It might take a few days till I'm back working on that setup, but I will update. As far as I remember, though, no output whatsoever was printed.


Re: Post DAOS 2.0 installation comments and questions

Pittman, Ashley M
 

Omer,

 

I’d be interested in seeing the stdout/stderr and logging from dfuse where this is happening. As you say, the -f option makes it run in the foreground, so if you’re not passing it, dfuse should background itself and you should get your prompt back. Even when it does go into the background, it waits until the FUSE mount is registered with the kernel before detaching from the terminal, so that it can report any error to the user. There is code in there to delay and then exit after forking, so there is potential for an issue there, although I’ve not known of one before and we do have tests which use both methods.

 

Ashley.

 

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Omer via groups.io <omer.caspi@...>
Date: Monday, 17 January 2022 at 14:32
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] Post DAOS 2.0 installation comments and questions

Hi,

Thanks for your reply.

1. I saw :). Cool.
2. I used to build DAOS from source and install it, so I had a script that did that for me, but yes, with an RPM-based solution this shouldn't be a concern. I fixed my unit file as I suggested to overcome this.
3. I tried "dfuse --pool=pool0 --container=cont0 -m /mnt/dfuse" and after a while "dfuse -f --pool=pool0 --container=cont0 -m /mnt/dfuse". I'm guessing the second version is expected not to return, as the -f suggests, but I didn't get a command prompt back either way.

Omer


Re: Post DAOS 2.0 installation comments and questions

Omer
 

Hi,

Thanks for your reply.

1. I saw :). Cool.
2. I used to build DAOS from source and install it, so I had a script that did that for me, but yes, with an RPM-based solution this shouldn't be a concern. I fixed my unit file as I suggested to overcome this.
3. I tried "dfuse --pool=pool0 --container=cont0 -m /mnt/dfuse" and after a while "dfuse -f --pool=pool0 --container=cont0 -m /mnt/dfuse". I'm guessing the second version is expected not to return, as the -f suggests, but I didn't get a command prompt back either way.

Omer


Re: Post DAOS 2.0 installation comments and questions

Lombardi, Johann
 

Hi there,

 

Thanks for reaching out.

  1. The online doc has been fixed and now specifies a different URL for CentOS7 and 8.
  2. /var/run/daos_agent should be automatically created after enabling/starting the daos_agent systemd service (i.e. “systemctl enable/start daos_agent”).
    We have some instructions in the documentation on how to start the agent manually (w/o systemd), see https://docs.daos.io/v2.0/admin/predeployment_check/#runtime-directory-setup

That being said, this should not be required if you use RPMs/systemd.

  3. Could you please advise what option(s) you pass to the dfuse command line? Can you set “export D_LOG_MASK=debug” and then run dfuse with the -f option and see what it logs?

 

Cheers,

Johann

 

 

From: <daos@daos.groups.io> on behalf of "Omer via groups.io" <omer.caspi@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Wednesday 12 January 2022 at 12:18
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] Post DAOS 2.0 installation comments and questions

 

Some post-DAOS 2.0 installation comments and questions:

1. We use a mix of CentOS 7 and 8 in our setup. However, the DAOS online documentation at "https://docs.daos.io/v2.0/QSG/setup_centos/" states the following URL as the repository to use when installing DAOS:

"https://packages.daos.io/v2.0/CentOS7/packages/x86_64/daos_packages.repo".

Of course, this led to some installation problems, as this is a CentOS 7-specific URL. Fixing the repository URL solved the problem, but only after some confusion about what was going on. This probably needs some clarification in the documentation.

2. The DAOS agent creates a socket in /var/run/daos_agent. This isn't a persistent directory, the service doesn't create it, and it needs to be recreated on every boot for the agent to work properly.
a. This isn't mentioned in the documentation.
b. A systemd unit addition like "ExecStartPre=mkdir -p /var/run/daos_agent" is probably a good idea

3. Not sure why, but mounting a POSIX container using dfuse doesn't return for me. While the process is running, the mount seems to be working, as a `mount` command does list a dfuse mount on the relevant directory, but the dfuse command never returns to the CLI prompt.



Re: High latency in metadata write

Lombardi, Johann
 

Hi there,

 

Could you please tell us more about your HW configuration and the test case (e.g. I/O size, read or write, object class, …)? We definitely see lower latency in our internal testing.

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of "shadow_vector@..." <shadow_vector@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Tuesday 11 January 2022 at 12:26
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] High latency in metadata write

 

Hello everyone:
  Recently, I did some testing with the DAOS array interface and found that, after the NVMe BIO finishes, writing the metadata and releasing resources take a long time. The total latency of a single-channel I/O is about 80 us, and vos_update_end takes about 30 us (about 17 us for the dkey/akey update via SCM). I thought I/O via SCM would be very fast, but in my test it is slower than the NVMe I/O. Does the evtree operation take too long, or are there other reasons?

Best regards!



Post DAOS 2.0 installation comments and questions

Omer
 

Some post-DAOS 2.0 installation comments and questions:

1. We use a mix of CentOS 7 and 8 in our setup. However, the DAOS online documentation at "https://docs.daos.io/v2.0/QSG/setup_centos/" states the following URL as the repository to use when installing DAOS:

"https://packages.daos.io/v2.0/CentOS7/packages/x86_64/daos_packages.repo".

Of course, this led to some installation problems, as this is a CentOS 7-specific URL. Fixing the repository URL solved the problem, but only after some confusion about what was going on. This probably needs some clarification in the documentation.

2. The DAOS agent creates a socket in /var/run/daos_agent. This isn't a persistent directory, the service doesn't create it, and it needs to be recreated on every boot for the agent to work properly.
a. This isn't mentioned in the documentation.
b. A systemd unit addition like "ExecStartPre=mkdir -p /var/run/daos_agent" is probably a good idea

3. Not sure why, but mounting a POSIX container using dfuse doesn't return for me. While the process is running, the mount seems to be working, as a `mount` command does list a dfuse mount on the relevant directory, but the dfuse command never returns to the CLI prompt.
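Regarding point 2, the ExecStartPre workaround can also be handled by systemd itself via the RuntimeDirectory= directive, which creates the directory under /run at service start and removes it on stop. A drop-in override might look like this (the drop-in path and mode are assumptions, not the shipped unit):

```ini
# Hypothetical drop-in: /etc/systemd/system/daos_agent.service.d/runtime-dir.conf
[Service]
# systemd creates /run/daos_agent when the service starts
# and removes it when the service stops.
RuntimeDirectory=daos_agent
RuntimeDirectoryMode=0755
```

After adding the drop-in, `systemctl daemon-reload` followed by a service restart would apply it.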


High latency in metadata write

shadow_vector@...
 

Hello everyone:
  Recently, I did some testing with the DAOS array interface and found that, after the NVMe BIO finishes, writing the metadata and releasing resources take a long time. The total latency of a single-channel I/O is about 80 us, and vos_update_end takes about 30 us (about 17 us for the dkey/akey update via SCM). I thought I/O via SCM would be very fast, but in my test it is slower than the NVMe I/O. Does the evtree operation take too long, or are there other reasons?

Best regards!


Re: Questions about ULT Schedule

Niu, Yawei
 

Hi,

 

In your example, the ULT is tracked in the wait list of the “ABT_future” while waiting. Once it is woken up, it is pushed back to a runnable ULT FIFO list (per ABT pool, maintained internally by Argobots) and is executed once the server scheduler picks it again.
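A minimal conceptual model of that wait-list / runnable-FIFO behavior (plain Python generators standing in for ULTs; this is not Argobots code, and all names are illustrative):

```python
# A blocked ULT sits on the future's own wait list; setting the future
# moves it back onto the scheduler's runnable FIFO, and it resumes only
# when the scheduler pops it again.
from collections import deque

class Future:
    def __init__(self):
        self.done = False
        self.waiters = []          # ULTs blocked on this future (wait list)

class Scheduler:
    def __init__(self):
        self.runnable = deque()    # runnable ULT FIFO (per-pool in Argobots)

    def spawn(self, ult):
        self.runnable.append(ult)

    def set_future(self, fut):     # like ABT_future_set(): wake the waiters
        fut.done = True
        self.runnable.extend(fut.waiters)
        fut.waiters.clear()

    def run(self):
        while self.runnable:
            ult = self.runnable.popleft()
            try:
                yielded = next(ult)
            except StopIteration:
                continue           # ULT finished
            if isinstance(yielded, Future) and not yielded.done:
                yielded.waiters.append(ult)   # park on the future's wait list
            else:
                self.runnable.append(ult)     # still runnable: requeue in FIFO

log = []
sched = Scheduler()
fut = Future()

def leader():
    log.append("leader: waiting on future")
    yield fut                      # like ABT_future_wait()
    log.append("leader: resumed after future set")

def follower():
    log.append("follower: work done, setting future")
    sched.set_future(fut)
    yield fut                      # future already done: stays runnable

sched.spawn(leader())
sched.spawn(follower())
sched.run()
assert log == ["leader: waiting on future",
               "follower: work done, setting future",
               "leader: resumed after future set"]
```

Note the leader does not run the instant the future is set; it only becomes runnable and executes when the scheduler reaches it in the FIFO, which matches the behavior described above.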

 

Thanks

-Niu

 

From: daos@daos.groups.io <daos@daos.groups.io> on behalf of 段世博 <duanshibo.d@...>
Date: Sunday, January 2, 2022 at 9:00 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] Questions about ULT Schedule

I have one more question. When a ULT is suspended during execution, where does it live until it is awakened? For example, dtx_leader_exec_ops_ult hangs while waiting for an ABT_future.

When the future meets its conditions, the ULT is awakened. Where is the awakened ULT stored (the fifo_list or some other list), and when will it continue to execute?


Re: Unable to build libmfu properly

Bohning, Dalton
 

I’m not sure why the libraries aren’t being linked/loaded – maybe someone more knowledgeable on the DAOS team has an idea. The make command does include the libraries, and the libraries are available, so I wouldn’t think there should be an issue.

 

~Dalton Bohning

 

From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of netsurfed
Sent: Thursday, December 30, 2021 5:32 PM
To: daos@daos.groups.io
Subject: Re: [daos] Unable to build libmfu properly

 

Yes, the same errors.

daos_agent@sw2:~/git/mpifileutils/build$ export LD_LIBRARY_PATH=${MY_DAOS_INSTALL_PATH}/lib64:$LD_LIBRARY_PATH
daos_agent@sw2:~/git/mpifileutils/build$ echo $LD_LIBRARY_PATH
/home/daos_agent/git/daos/build/lib64:
daos_agent@sw2:~/git/mpifileutils/build$ make
[ 32%] Built target mfu_o
[ 34%] Built target mfu-static
[ 36%] Built target mfu
[ 37%] Linking C executable dbcast
/usr/bin/ld: ../common/libmfu.so: undefined reference to `d_hash_rec_insert'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `dfs_lookup'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `d_hash_rec_unlinked'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `dfs_stat'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `dfs_readdir'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `d_hash_table_create'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `dfs_obj_anchor_split'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `dfs_release'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `d_hash_rec_find'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `dfs_obj_anchor_set'
collect2: error: ld returned 1 exit status
make[2]: *** [src/dbcast/CMakeFiles/dbcast.dir/build.make:90: src/dbcast/dbcast] Error 1
make[1]: *** [CMakeFiles/Makefile2:539: src/dbcast/CMakeFiles/dbcast.dir/all] Error 2
make: *** [Makefile:130: all] Error 2



Re: Unable to build libmfu properly

netsurfed
 

Yes, the same errors.

daos_agent@sw2:~/git/mpifileutils/build$ export LD_LIBRARY_PATH=${MY_DAOS_INSTALL_PATH}/lib64:$LD_LIBRARY_PATH
daos_agent@sw2:~/git/mpifileutils/build$ echo $LD_LIBRARY_PATH
/home/daos_agent/git/daos/build/lib64:
daos_agent@sw2:~/git/mpifileutils/build$ make
[ 32%] Built target mfu_o
[ 34%] Built target mfu-static
[ 36%] Built target mfu
[ 37%] Linking C executable dbcast
/usr/bin/ld: ../common/libmfu.so: undefined reference to `d_hash_rec_insert'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `dfs_lookup'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `d_hash_rec_unlinked'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `dfs_stat'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `dfs_readdir'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `d_hash_table_create'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `dfs_obj_anchor_split'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `dfs_release'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `d_hash_rec_find'
/usr/bin/ld: ../common/libmfu.so: undefined reference to `dfs_obj_anchor_set'
collect2: error: ld returned 1 exit status
make[2]: *** [src/dbcast/CMakeFiles/dbcast.dir/build.make:90: src/dbcast/dbcast] Error 1
make[1]: *** [CMakeFiles/Makefile2:539: src/dbcast/CMakeFiles/dbcast.dir/all] Error 2
make: *** [Makefile:130: all] Error 2
 
