Re: DKey/AKey/value count/size limitations

Patrick Farrell <paf@...>
 

Johann,

Thanks.

One question for you about moving the documentation to daos.io: currently, that site is the user manual, and the placement readme is internals documentation.

Is the intention to migrate away from the readmes in the source for internals docs?  I ask because the current approach seems to be working well: despite the placement doc being out of date, the readmes in general seem to be updated as the code changes.  It would be a shame to lose that.

- Patrick


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Lombardi, Johann <johann.lombardi@...>
Sent: Monday, November 25, 2019 3:40 AM
To: daos@daos.groups.io <daos@daos.groups.io>; Olivier, Jeffrey V <jeffrey.v.olivier@...>
Subject: Re: [daos] DKey/AKey/value count/size limitations
 

Our developer documentation has some basic text describing redundancy groups:  https://github.com/daos-stack/daos/blob/master/src/placement/README.md#ring-placement-map

That being said, this is in the context of ring placement and we have since moved to jump consistent hash. We need to update the documentation and move it to http://daos.io.
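
For readers unfamiliar with the term, here is a minimal sketch of the published jump consistent hash algorithm (Lamping and Veach). It is not taken from the DAOS placement code, whose details are not shown in this thread; the function names and the `moved_fraction` helper are illustrative assumptions only.

```python
MASK64 = (1 << 64) - 1  # emulate unsigned 64-bit arithmetic in Python


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit integer key to a bucket in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= MASK64
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step from the published algorithm
        key = (key * 2862933555777941757 + 1) & MASK64
        # Jump forward; j always lands strictly beyond b
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


def moved_fraction(num_keys: int, old_buckets: int, new_buckets: int) -> float:
    """Fraction of keys whose bucket changes when the bucket count changes."""
    moved = sum(
        1
        for k in range(num_keys)
        if jump_consistent_hash(k, old_buckets) != jump_consistent_hash(k, new_buckets)
    )
    return moved / num_keys


# Growing from 10 to 11 buckets should relocate only a small share of keys,
# and every relocated key moves to the newly added bucket.
print(moved_fraction(10_000, 10, 11))
```

The property that matters for placement is that adding a bucket only moves keys into the new bucket, so existing data is disturbed as little as possible.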

 

Data stored under a dkey is indeed limited by the size of a pool shard. This might also cause the space usage to be unbalanced across targets if you have a non-widely-striped object with a lot of dkeys. That’s why we have a feature called “progressive layout” in the roadmap to reshard an object dynamically over more and more targets as the number of dkeys increases. This will be based on GIGA+ (https://www.pdl.cmu.edu/PDL-FTP/PDSI/CMU-PDL-08-110.pdf).
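
To make the imbalance concern concrete, the following toy model compares many small dkeys against one "stuffed" dkey. It is a deliberate simplification and rests on assumptions: each dkey maps to exactly one target (redundancy groups and replication are ignored), and the hash-modulo mapping in `target_for_dkey` is hypothetical, not the DAOS placement function.

```python
import hashlib
from collections import Counter


def target_for_dkey(dkey: bytes, num_targets: int) -> int:
    # Hypothetical placement: hash the dkey and reduce modulo the target count.
    digest = hashlib.blake2b(dkey, digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_targets


def usage_per_target(writes, num_targets):
    """Sum the bytes written per target for (dkey, nbytes) pairs."""
    usage = Counter()
    for dkey, nbytes in writes:
        usage[target_for_dkey(dkey, num_targets)] += nbytes
    return [usage[t] for t in range(num_targets)]


NUM_TARGETS = 8

# Same total volume, written two ways.
spread = usage_per_target(
    [(f"dkey-{i}".encode(), 1024) for i in range(10_000)], NUM_TARGETS
)
stuffed = usage_per_target([(b"one-dkey", 1024)] * 10_000, NUM_TARGETS)

print("spread: ", spread)   # bytes land on many targets
print("stuffed:", stuffed)  # all bytes land on a single target
```

In this model the stuffed case fills one target while the others stay empty, which is the situation the progressive layout work described above is meant to address.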

 

Cheers,

Johann

 

From: <daos@daos.groups.io> on behalf of Patrick Farrell <paf@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday 22 November 2019 at 22:15
To: "Olivier, Jeffrey V" <jeffrey.v.olivier@...>, "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] DKey/AKey/value count/size limitations

 

Thanks, Jeff - That seems reasonable.

 

A related question for you:  You used the term redundancy group.  Is the concept of a redundancy group detailed a bit anywhere in the documentation/readmes?  It's used in a number of places, but I have not found a place which describes the exact definition or its role.  I feel like I must be missing something - the documentation is in general very thorough - but I've done a little digging and come up empty on this one.


From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Olivier, Jeffrey V <jeffrey.v.olivier@...>
Sent: Friday, November 22, 2019 3:04 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DKey/AKey/value count/size limitations

 

Hi Patrick,

 

There are no built-in limits on how much data you can associate with a single dkey.  There are practical limitations, though, since data under a dkey maps to a particular redundancy group and the associated storage targets are limited by the amount of local storage allocated to them.

 

We don’t do anything to prevent a client from stuffing everything under a single dkey using the DAOS API and I don’t believe we have any plans to mitigate that problem.   However, it is mitigated somewhat by the fact that a given client will have a storage allocation that will be isolated from clients in other projects that are using different pools.   A client will not be able to exceed the capacity allocated to their pool/project.

 

-Jeff

 

From: <daos@daos.groups.io> on behalf of Patrick Farrell <paf@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, November 22, 2019 at 11:21 AM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: [daos] DKey/AKey/value count/size limitations

 

Good afternoon,

I am wondering about any limitations on the total amount of data that can be associated with a given individual DKey.

Is there a limit on the number of AKeys that can be associated with a given DKey?  Is there a limit on the total size/amount of data that can be associated with a given AKey?

The concern here is the concept of a client "stuffing" a given DKey, adding more and more data to it, and therefore causing an imbalance in space usage between servers, potentially running one target in a pool out of space.
Is there anything today that would prevent that, or is there a future plan to deal with that sort of issue?  (In the context of consistent hashing, space rebalancing between targets in a pool is a challenging problem, so it seems important to stay away from significant imbalance.)

Regards,
Patrick

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: [External] Re: [daos] Does DAOS support infiniband now?

Shengyu SY19 Zhang
 

Hello Joel,

As shown in output.log, there is only one version of libfabric installed on my machine, and I don't have any other software installed that depends on libfabric.
Following your guidance to set FI_LOG_LEVEL=debug, I can see the following messages, which may be helpful:

libfabric:123445:verbs:fabric:fi_ibv_set_default_attr():1263<info> Ignoring provider default value for tx rma_iov_limit as it is greater than the value supported by domain: mlx5_0
libfabric:123445:verbs:fabric:fi_ibv_get_matching_info():1365<info> hints->ep_attr->rx_ctx_cnt != FI_SHARED_CONTEXT. Skipping XRC FI_EP_MSG endpoints
ERROR: daos_io_server:0 libfabric:123445:verbs:core:fi_ibv_check_hints():231<info> Unsupported capabilities
libfabric:123445:verbs:core:fi_ibv_check_hints():232<info> Supported: FI_MSG, FI_RECV, FI_SEND, FI_LOCAL_COMM, FI_REMOTE_COMM
libfabric:123445:verbs:core:fi_ibv_check_hints():232<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_RECV, FI_SEND, FI_REMOTE_READ
ERROR: daos_io_server:0 libfabric:123445:verbs:fabric:fi_ibv_get_rai_id():179<info> rdma_bind_addr: No such device(19)
ERROR: daos_io_server:0 libfabric:123445:verbs:fabric:fi_ibv_get_rai_id():179<info> rdma_bind_addr: No such device(19)
ERROR: daos_io_server:0 libfabric:123445:verbs:fabric:fi_ibv_get_rai_id():179<info> rdma_bind_addr: Invalid argument(22)
ERROR: daos_io_server:0 libfabric:123445:verbs:fabric:fi_ibv_get_rai_id():179<info> rdma_bind_addr: Invalid argument(22)
ERROR: daos_io_server:0 libfabric:123445:verbs:fabric:fi_ibv_get_rai_id():179<info> rdma_bind_addr: Invalid argument(22)
ERROR: daos_io_server:0 libfabric:123445:core:core:ofi_layering_ok():795<info> Need core provider, skipping ofi_rxd
libfabric:123445:core:core:ofi_layering_ok():795<info> Need core provider, skipping ofi_mrail
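
One way to read the "Supported"/"Requested" pair in this output is to compute which requested capabilities are absent from the supported list. The sketch below does that for log text in the format shown above; the function name and the regular expression are assumptions for illustration, and the result only describes the one candidate that libfabric was checking, so it does not identify the root cause by itself.

```python
import re

# Matches lines such as "...<info> Supported: FI_MSG, FI_RECV"
CAPS_RE = re.compile(r"<info>\s*(Supported|Requested):\s*(.+)$")


def missing_capabilities(log_text: str) -> list:
    """Return requested capabilities that the supported list lacks.

    If the log contains several Supported/Requested pairs, the last
    occurrence of each kind is the one that is compared.
    """
    caps = {}
    for line in log_text.splitlines():
        match = CAPS_RE.search(line)
        if match:
            caps[match.group(1)] = [c.strip() for c in match.group(2).split(",")]
    supported = set(caps.get("Supported", []))
    return [c for c in caps.get("Requested", []) if c not in supported]


# The two capability lines copied from the debug output in this message.
SAMPLE_LOG = """\
libfabric:123445:verbs:core:fi_ibv_check_hints():232<info> Supported: FI_MSG, FI_RECV, FI_SEND, FI_LOCAL_COMM, FI_REMOTE_COMM
libfabric:123445:verbs:core:fi_ibv_check_hints():232<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_RECV, FI_SEND, FI_REMOTE_READ
"""

print(missing_capabilities(SAMPLE_LOG))  # ['FI_RMA', 'FI_READ', 'FI_REMOTE_READ']
```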


Regards,
Shengyu.

-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Rosenzweig, Joel B
Sent: Saturday, November 23, 2019 3:20 AM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] Does DAOS support infiniband now?

Hi Shengyu,

The debug output showed me that when daos_server is started via orterun, libfabric is not finding provider support for ofi_rxm at least. I'm still wondering if you have two different versions of libfabric installed on your machine.

Can you run these commands and provide the output?

1) ldd install/bin/daos_server
2) modify your orterun command to run ldd on daos_server. For example, I run this command locally:
orterun --allow-run-as-root --map-by node --mca btl tcp,self --mca oob tcp -np 1 --hostfile /home/jbrosenz/daos/hostfile --enable-recovery --report-uri /tmp/urifile ldd /home/jbrosenz/daos/install/bin/daos_server
3) which fi_info
4) ldd over each version of fi_info found

From the data you provide, I'll be able to tell whether the libfabric used by daos_server when you run it directly in the shell is the same libfabric used by daos_server when it is run via orterun. Your original "daos_server network scan" output showed support for ofi+verbs;ofi_rxm, but your debug output showed that when daos_server was started (via orterun), libfabric could not find support for the very same providers. If two different versions are being used with different configurations, that would explain the failure. If it's a single installation/configuration, that will point the debugging in another direction.
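
The comparison described above can be automated once both ldd outputs are captured as text. The following is a rough sketch under stated assumptions: it expects the usual `name => path (address)` ldd line format, and the sample outputs and paths in it are invented for illustration, not taken from any machine in this thread.

```python
def libfabric_paths(ldd_output: str) -> list:
    """Return the resolved paths of libfabric entries in ldd output."""
    paths = []
    for line in ldd_output.splitlines():
        parts = line.split()
        # Typical line: "libfabric.so.1 => /some/path/libfabric.so.1 (0x...)"
        if len(parts) >= 3 and parts[0].startswith("libfabric.so") and parts[1] == "=>":
            if parts[2:4] == ["not", "found"]:
                paths.append("not found")
            else:
                paths.append(parts[2])
    return paths


def same_libfabric(direct_ldd: str, orterun_ldd: str) -> bool:
    """True if both ldd outputs resolve libfabric to the same path(s)."""
    return set(libfabric_paths(direct_ldd)) == set(libfabric_paths(orterun_ldd))


# Hypothetical outputs: one from running ldd directly, one from ldd via orterun.
DIRECT = """\
	libfabric.so.1 => /home/user/daos/install/lib/libfabric.so.1 (0x00007f0000000000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f0000200000)
"""
VIA_ORTERUN = """\
	libfabric.so.1 => /usr/lib64/libfabric.so.1 (0x00007f0000000000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f0000200000)
"""

print(libfabric_paths(DIRECT))
print(libfabric_paths(VIA_ORTERUN))
print("same libfabric:", same_libfabric(DIRECT, VIA_ORTERUN))
```

A mismatch between the two path lists would support the two-installations explanation; identical lists would rule it out.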

Depending on what you find through 1-4, you might find it helpful to export the environment variable FI_LOG_LEVEL=debug which will instruct libfabric to output a good deal of debug info.

Regards,
Joel

-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Shengyu SY19 Zhang
Sent: Friday, November 22, 2019 12:59 AM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] Does DAOS support infiniband now?

Hello Joel,

Please see those files in attachment.
I have tried two machines: one shows the full provider list in fi_info (verbs and rxm), and the other doesn't show verbs, but on both of them io_server fails to start. I found that the project conflicts with the Mellanox drivers, so I removed them and used the yum packages only; however, it still does not work.


Regards,
Shengyu

-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Rosenzweig, Joel B
Sent: Friday, November 22, 2019 6:35 AM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] Does DAOS support infiniband now?

Hi Shengyu,

Can you share your daos_server.yml so we can see how you enabled the provider? And, can you share the log files daos_control.log and server.log so we can see more context?

Thank you,
Joel

-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Shengyu SY19 Zhang
Sent: Wednesday, November 20, 2019 9:23 PM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] Does DAOS support infiniband now?

Hello,

Thank you for your help, Alex, Joel and Kevin. I have checked the steps that you provided:

Ibstat:
State: Active
Physical state: LinkUp

Ifconfig:
flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2044

fi_info:
verbs:
version: 1.0
ofi_rxm:
version: 1.0
ofi_rxd:
version: 1.0

The network itself is fine, since I can run SPDK NVMe-oF over InfiniBand without problems.
I also specified "ofi+verbs;ofi_rxm", but the same error occurred: the io_server stops after a while and prints the log I provided previously.

I also noticed that whether I specify ofi+verbs, ofi_rxm, or ofi+verbs;ofi_rxm, the log keeps showing: No provider found for "verbs;ofi_rxm" provider on domain "ib0". Is that the cause?

BTW: it works with ofi+sockets.


Regards,
Shengyu.

-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Oganezov, Alexander A
Sent: Thursday, November 21, 2019 7:13 AM
To: daos@daos.groups.io
Subject: [External] Re: [daos] Does DAOS support infiniband now?

Hi Shengyu,

However if I specify either ofi+verbs or ofi_rxm, the same error will happen, and io_server will stop.
na_ofi.c:1609
# na_ofi_domain_open(): No provider found for "verbs;ofi_rxm" provider on domain "ib0"

To use the supported verbs provider, you need to have "ofi+verbs;ofi_rxm" in the provider string.

~~Alex.

-----Original Message-----
From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Rosenzweig, Joel B
Sent: Wednesday, November 20, 2019 7:37 AM
To: daos@daos.groups.io
Subject: Re: [daos] Does DAOS support infiniband now?

Hi Shengyu,

The daos_server network scan uses information provided by libfabric to determine the available devices and providers. It then cross-references that list of devices with device names obtained from hwloc to convert libfabric device names (as necessary) to those you'd find via ifconfig. Therefore, if "daos_server network scan" displays a device and provider, it means that libfabric reports support for that combination. However, as Kevin pointed out, it's possible that the device itself was down, and that could certainly generate an error like the one you encountered. Another possibility is that you have more than one version of libfabric installed in your environment. I have run into this situation in our lab environment. You might check your target system to see if it has more than one libfabric library with different provider support.

Regards,
Joel

-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Harms, Kevin via Groups.Io
Sent: Wednesday, November 20, 2019 10:04 AM
To: daos@daos.groups.io
Subject: Re: [daos] Does DAOS support infiniband now?

Shengyu,

I have tried IB and it works. Verify the libfabric verbs provider is available.

fi_info -l

you should see these:

ofi_rxm:
version: 1.0

verbs:
version: 1.0

See here for details:

https://daos-stack.github.io/admin/deployment/#network-interface-detection-and-selection
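
The check above can also be scripted against saved `fi_info -l` output. This is a small sketch that assumes the output format quoted in this thread (a provider name followed by a colon, then a version line); the format may differ between libfabric versions, and the function names are illustrative only.

```python
REQUIRED = {"verbs", "ofi_rxm"}


def providers_listed(fi_info_output: str) -> set:
    """Collect provider names from text shaped like `fi_info -l` output."""
    providers = set()
    for line in fi_info_output.splitlines():
        stripped = line.strip()
        # Provider lines look like "verbs:"; "version: 1.0" does not end in ":"
        if stripped.endswith(":") and len(stripped) > 1:
            providers.add(stripped[:-1])
    return providers


def missing_providers(fi_info_output: str, required=REQUIRED) -> list:
    """Return the required providers that the output does not list."""
    return sorted(set(required) - providers_listed(fi_info_output))


# Sample shaped like the listings quoted in this thread.
SAMPLE = """\
verbs:
    version: 1.0
ofi_rxm:
    version: 1.0
ofi_rxd:
    version: 1.0
"""

print(missing_providers(SAMPLE))                        # []
print(missing_providers("sockets:\n    version: 1.0"))  # ['ofi_rxm', 'verbs']
```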

You might also want to confirm ib0 is in the UP state:

[root@daos01 ~]# ifconfig ib0
ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 4092
inet 172.25.6.101 netmask 255.255.0.0 broadcast 172.25.255.255

kevin

________________________________________
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Shengyu SY19 Zhang <zhangsy19@...>
Sent: Wednesday, November 20, 2019 2:54 AM
To: daos@daos.groups.io
Subject: [daos] Does DAOS support infiniband now?

Hello,

When I run daos_server network scan, it shows the following:
fabric_iface: ib0
provider: ofi+verbs;ofi_rxm
pinned_numa_node: 1

However if I specify either ofi+verbs or ofi_rxm, the same error will happen, and io_server will stop.
na_ofi.c:1609
# na_ofi_domain_open(): No provider found for "verbs;ofi_rxm" provider on domain "ib0"

ib0 is a Mellanox NIC on an InfiniBand network.

Regards,
Shengyu.


Re: DKey/AKey/value count/size limitations

Lombardi, Johann
 

Our developer documentation has some basic text describing redundancy groups:  https://github.com/daos-stack/daos/blob/master/src/placement/README.md#ring-placement-map

That being said, this is in the context of ring placement and we have since moved to jump consistent hash. We need to update the documentation and move it to http://daos.io.

 

Data stored under a dkey is indeed limited by the size of a pool shard. This might also cause the space usage to be unbalanced across targets if you have a non-widely-striped object with a lot of dkeys. That’s why we have a feature called “progressive layout” in the roadmap to reshard an object dynamically over more and more targets as the number of dkeys increases. This will be based on GIGA+ (https://www.pdl.cmu.edu/PDL-FTP/PDSI/CMU-PDL-08-110.pdf).

 

Cheers,

Johann

 



Re: DKey/AKey/value count/size limitations

Olivier, Jeffrey V
 

Yes, it would ultimately be restricted by the size of a single shard.

 

From: <daos@daos.groups.io> on behalf of Colin Ngam <cngam@...>
Reply-To: "daos@daos.groups.io" <daos@daos.groups.io>
Date: Friday, November 22, 2019 at 2:12 PM
To: "daos@daos.groups.io" <daos@daos.groups.io>
Subject: Re: [daos] DKey/AKey/value count/size limitations

 

Hi Jeff,

 

Does that mean that the data of the Dkey is restricted by the size of a single Shard?

 

Thanks.

 

Colin

 



Re: DKey/AKey/value count/size limitations

Patrick Farrell <paf@...>
 

Thanks, Jeff - That seems reasonable.

A related question for you:  You used the term redundancy group.  Is the concept of a redundancy group detailed a bit anywhere in the documentation/readmes?  It's used in a number of places, but I have not found a place which describes the exact definition or its role.  I feel like I must be missing something - the documentation is in general very thorough - but I've done a little digging and come up empty on this one.




Re: DKey/AKey/value count/size limitations

Colin Ngam
 

Hi Jeff,

 

Does that mean that the data of the Dkey is restricted by the size of a single Shard?

 

Thanks.

 

Colin

 



Re: DKey/AKey/value count/size limitations

Olivier, Jeffrey V
 

Hi Patrick,

 

There are no built-in limits on how much data you can associate with a single dkey.  There are practical limitations, though, since data under a dkey maps to a particular redundancy group and the associated storage targets are limited by the amount of local storage allocated to them.

 

We don’t do anything to prevent a client from stuffing everything under a single dkey using the DAOS API and I don’t believe we have any plans to mitigate that problem.   However, it is mitigated somewhat by the fact that a given client will have a storage allocation that will be isolated from clients in other projects that are using different pools.   A client will not be able to exceed the capacity allocated to their pool/project.

 

-Jeff

 



Re: [External] Re: [daos] Does DAOS support infiniband now?

Rosenzweig, Joel B <joel.b.rosenzweig@...>
 

Hi Shengyu,

The debug output showed me that when daos_server is started via orterun, libfabric is not finding provider support for ofi_rxm at least. I'm still wondering if you have two different versions of libfabric installed on your machine.

Can you run these commands and provide the output?

1) ldd install/bin/daos_server
2) modify your orterun command to run ldd on daos_server. For example, I run this command locally:
orterun --allow-run-as-root --map-by node --mca btl tcp,self --mca oob tcp -np 1 --hostfile /home/jbrosenz/daos/hostfile --enable-recovery --report-uri /tmp/urifile ldd /home/jbrosenz/daos/install/bin/daos_server
3) which fi_info
4) ldd over each version of fi_info found

From the data you provide, I'll be able to tell whether the libfabric used by daos_server when you run it directly in the shell is the same libfabric used by daos_server when it is run via orterun. Your original "daos_server network scan" output showed support for ofi+verbs;ofi_rxm, but your debug output showed that when daos_server was started (via orterun), libfabric could not find support for the very same providers. If two different versions are being used with different configurations, that would explain the failure. If it's a single installation/configuration, that will point the debugging in another direction.

Depending on what you find through 1-4, you might find it helpful to export the environment variable FI_LOG_LEVEL=debug which will instruct libfabric to output a good deal of debug info.

Regards,
Joel

-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Shengyu SY19 Zhang
Sent: Friday, November 22, 2019 12:59 AM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] Does DAOS support infiniband now?

Hello Joel,

Please see those files in attachment.
I have tried two machines, one have full provider shows in fi_info (verbs and rxm), another doesn't show verbs, but they are same can't start io_server. I found the project conflicts with mellanox drivers, therefor I remove it and use yum package only, however still keep not working.


Regards,
Shengyu

-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Rosenzweig, Joel B
Sent: Friday, November 22, 2019 6:35 AM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] Does DAOS support infiniband now?

Hi Shengyu,

Can you share your daos_server.yml so we can see how you enabled the provider? And, can you share the log files daos_control.log and server.log so we can see more context?

Thank you,
Joel

-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Shengyu SY19 Zhang
Sent: Wednesday, November 20, 2019 9:23 PM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] Does DAOS support infiniband now?

Hello,

Thank you for your help Alex, Joel and Kevin, I have checked those steps that you provided:

Ibstat:
State: Active
Physical state: LinkUp

Ifconfig:
flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2044

fi_info:
verbs:
version: 1.0
ofi_rxm:
version: 1.0
ofi_rxd:
version: 1.0

And network is good since I can run SPDK NVMe-oF over Infiniband with good working.
I also specified "ofi+verbs;ofi_rxm", the same error occurred, the ioserver will be stopped after a while, and print log as I provided previously.

And I noticed, whatever I specify ofi+verbs, ofi_rxm, or ofi+verbs;ofi_rxm, the log keep shows No provider found for "verbs;ofi_rxm" provider on domain "ib0", is it the cause?

BTW: it is working under ofi+sockets.


Regards,
Shengyu.

-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Oganezov, Alexander A
Sent: Thursday, November 21, 2019 7:13 AM
To: daos@daos.groups.io
Subject: [External] Re: [daos] Does DAOS support infiniband now?

Hi Shengyu,

> However, if I specify either ofi+verbs or ofi_rxm, the same error occurs and io_server stops.
> na_ofi.c:1609
> # na_ofi_domain_open(): No provider found for "verbs;ofi_rxm" provider on domain "ib0"

To use the supported verbs provider, you need to have "ofi+verbs;ofi_rxm" in the provider string.

~~Alex.

-----Original Message-----
From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Rosenzweig, Joel B
Sent: Wednesday, November 20, 2019 7:37 AM
To: daos@daos.groups.io
Subject: Re: [daos] Does DAOS support infiniband now?

Hi Shengyu,

The daos_server network scan uses information provided by libfabric to determine the available devices and providers. It then cross-references that list of devices with device names obtained from hwloc to convert libfabric device names (as necessary) to those you'd find via ifconfig. Therefore, if "daos_server network scan" displays a device and provider, it means that libfabric provides support for that combination. However, as Kevin pointed out, it's possible that the device itself was down, and that could certainly generate an error like the one you encountered. Another possibility is that you have more than one version of libfabric installed in your environment. I have run into this situation in our lab environment. You might check your target system to see if it has more than one libfabric library with different provider support.

Regards,
Joel
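
A minimal sketch of the check suggested above, looking for more than one libfabric shared library on a system. The directory list is a hypothetical default, not something taken from the thread, and finding several copies only indicates that the situation Joel describes is possible; it does not show which copy io_server actually loads.

```python
import glob
import os

# Hypothetical search locations; adjust them for the target system.
DEFAULT_LIB_DIRS = ("/usr/lib", "/usr/lib64", "/usr/local/lib")


def find_libfabric(lib_dirs=DEFAULT_LIB_DIRS):
    """Return the sorted real paths of every libfabric shared object found."""
    found = set()
    for lib_dir in lib_dirs:
        # "**" with recursive=True also matches files directly in lib_dir.
        pattern = os.path.join(lib_dir, "**", "libfabric.so*")
        for path in glob.glob(pattern, recursive=True):
            # Resolve symlinks so libfabric.so -> libfabric.so.1.x counts once.
            found.add(os.path.realpath(path))
    return sorted(found)
```

Calling find_libfabric() and getting back more than one path would be a reason to compare the provider support of each copy, for example by running the fi_info binary that ships with each installation.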

-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Harms, Kevin via Groups.Io
Sent: Wednesday, November 20, 2019 10:04 AM
To: daos@daos.groups.io
Subject: Re: [daos] Does DAOS support infiniband now?

Shengyu,

I have tried IB and it works. Verify that the libfabric verbs provider is available:

fi_info -l

You should see these:

ofi_rxm:
    version: 1.0

verbs:
    version: 1.0

See here for details:

https://daos-stack.github.io/admin/deployment/#network-interface-detection-and-selection

You might also want to confirm ib0 is in the UP state:

[root@daos01 ~]# ifconfig ib0
ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 4092
inet 172.25.6.101 netmask 255.255.0.0 broadcast 172.25.255.255

kevin
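
The provider check above can be scripted. The sketch below parses text in the style of the `fi_info -l` listing shown in this message and reports which of the required providers are absent. The function names are made up for illustration, and the parser assumes provider names appear on their own unindented lines ending in a colon.

```python
def parse_providers(fi_info_output):
    """Collect provider names from `fi_info -l` style output.

    Provider lines are taken to be the non-indented lines ending in a
    colon; the "version:" lines beneath them are ignored.
    """
    providers = set()
    for line in fi_info_output.splitlines():
        stripped = line.rstrip()
        if stripped and not line[0].isspace() and stripped.endswith(":"):
            providers.add(stripped[:-1])
    return providers


def missing_providers(fi_info_output, required=("verbs", "ofi_rxm")):
    """Return the required providers that are absent, in the order given."""
    present = parse_providers(fi_info_output)
    return [name for name in required if name not in present]


SAMPLE = """\
ofi_rxm:
    version: 1.0
verbs:
    version: 1.0
"""

print(missing_providers(SAMPLE))
print(missing_providers("sockets:\n    version: 1.0\n"))
```

In practice the input string would come from capturing the output of `fi_info -l` on the server node.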

________________________________________
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Shengyu SY19 Zhang <zhangsy19@...>
Sent: Wednesday, November 20, 2019 2:54 AM
To: daos@daos.groups.io
Subject: [daos] Does DAOS support infiniband now?

Hello,

I ran daos_server network scan, and it shows the following:
fabric_iface: ib0
provider: ofi+verbs;ofi_rxm
pinned_numa_node: 1

However, if I specify either ofi+verbs or ofi_rxm, the same error occurs and io_server stops:
na_ofi.c:1609
# na_ofi_domain_open(): No provider found for "verbs;ofi_rxm" provider on domain "ib0"

ib0 is a Mellanox NIC on an InfiniBand network.

Regards,
Shengyu.


DKey/AKey/value count/size limitations

Patrick Farrell <paf@...>
 

Good afternoon,

I am wondering about any limitations on the total amount of data that can be associated with a given individual DKey.

Is there a limit on the number of AKeys that can be associated with a given DKey?  Is there a limit on the total size/amount of data that can be associated with a given AKey?

The concern here is the concept of a client "stuffing" a given DKey, adding more and more data to it, and therefore causing an imbalance in space usage between servers, potentially running one target in a pool out of space.
Is there anything today that would prevent that, or is there a future plan to deal with that sort of issue?  (In the context of consistent hashing, space rebalancing between targets in a pool is a challenging problem, so it seems important to stay away from significant imbalance.)

Regards,
Patrick
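
To make the concern concrete, here is a toy model of the imbalance, assuming that all data stored under one dkey ends up on a single target. The SHA-256 based mapping is only a stand-in for illustration and is not the DAOS placement algorithm; the sizes and target count are arbitrary.

```python
import hashlib
from collections import Counter


def target_for(dkey, num_targets):
    """Map a dkey to a target index with a stable hash (stand-in only)."""
    digest = hashlib.sha256(dkey.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_targets


def usage_per_target(dkey_sizes, num_targets):
    """Sum the bytes stored under each dkey on the target it hashes to."""
    usage = Counter({target: 0 for target in range(num_targets)})
    for dkey, size in dkey_sizes.items():
        usage[target_for(dkey, num_targets)] += size
    return usage


MIB = 1 << 20
# 1000 small dkeys of 1 MiB each, plus one "stuffed" dkey of 10 GiB.
sizes = {f"dkey-{i}": MIB for i in range(1000)}
sizes["stuffed"] = 10 * 1024 * MIB

usage = usage_per_target(sizes, num_targets=8)
for target, used in sorted(usage.items()):
    print(f"target {target}: {used / MIB:.0f} MiB")
```

Under this assumption the small dkeys spread across the targets, while whichever target receives the stuffed dkey carries at least its full 10 GiB, which is the skew the question is about.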


Re: [External] Re: [daos] Does DAOS support infiniband now?

Shengyu SY19 Zhang
 

Hello Joel,

Please see the attached files.
I have tried two machines: one shows the full set of providers in fi_info (verbs and rxm), the other doesn't show verbs, but on both the io_server fails to start. I found that the project conflicts with the Mellanox drivers, so I removed them and used only the yum packages; however, it still does not work.


Regards,
Shengyu


DUG'19 slides

Lombardi, Johann
 

Hi there,

 

All slide decks presented at the user group have been uploaded to the wiki: https://wiki.hpdd.intel.com/display/DC/DUG19

Many thanks to all the presenters!

 

Cheers,

Johann

---------------------------------------------------------------------
Intel Corporation SAS (French simplified joint stock company)
Registered headquarters: "Les Montalets"- 2, rue de Paris,
92196 Meudon Cedex, France
Registration Number:  302 456 199 R.C.S. NANTERRE
Capital: 4,572,000 Euros

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: [External] Re: [daos] Does DAOS support infiniband now?

Rosenzweig, Joel B <joel.b.rosenzweig@...>
 

Hi Shengyu,

Can you share your daos_server.yml so we can see how you enabled the provider? And, can you share the log files daos_control.log and server.log so we can see more context?

Thank you,
Joel


Re: Missing packages in Docker image for `dcpm` deployments

Alex Barcelo
 

So I just pushed the minimal PR #1459 which enables the CentOS and Ubuntu images to be used as the runtime, as in my scenario.

The other Dockerfiles are not used (per the documentation) so I assume that those are indeed only for building and CI purposes.


Re: Missing packages in Docker image for `dcpm` deployments

Nabarro, Tom
 

Yes, I think they probably need to be updated then. I think they were originally created for distro build testing in our CI system, and possibly the packages weren't available on all the target distros.

A PR would be appreciated, and we can get feedback from other team members on build/test/docker concerns.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 

---------------------------------------------------------------------
Intel Corporation (UK) Limited
Registered No. 1134945 (England)
Registered Office: Pipers Way, Swindon SN3 1RJ
VAT No: 860 2173 47

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.


Re: Missing packages in Docker image for `dcpm` deployments

Alex Barcelo
 

Hi Tom,

> The Dockerfiles provided are for build, not runtime.

Well, according to https://daos-stack.github.io/admin/installation/#running-daos-service-in-docker it is also for runtime. The instructions build DAOS and then run everything inside that same container.

Is that right?


Re: Missing packages in Docker image for `dcpm` deployments

Nabarro, Tom
 

Hello Alex,

 

The Dockerfiles provided are for build, not runtime.

The package dependencies you are referring to are enforced in the RPM daos.spec IIRC.

 

Regards,

Tom Nabarro – DCG/ESAD

M: +44 (0)7786 260986

Skype: tom.nabarro

 


Missing packages in Docker image for `dcpm` deployments

Alex Barcelo
 

I was trying to follow the Docker build instructions on a machine that has Optane DC persistent memory modules, and I found that at least the following packages seem to be missing:

  • ndctl
  • ipmctl

If those packages are not installed, the orterun command ends up hanging or failing altogether.

I believe that the proper way to handle that would be to add their installation to the daos/utils/docker/Dockerfile* files. Could this be patched into the repository? May I do a PR? Or am I perhaps looking at my problem from the wrong perspective?
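
A quick way to confirm whether a given container or host has the two tools is to look for them on PATH before starting the server. This is a sketch that assumes each package installs an executable of the same name; the `which` parameter exists only so the lookup can be swapped out for testing.

```python
import shutil


def missing_tools(tools=("ndctl", "ipmctl"), which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing from PATH:", ", ".join(missing))
else:
    print("ndctl and ipmctl are both available")
```

Running this inside the image built from one of the Dockerfiles shows whether the packages still need to be added there.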


Re: [External] Re: [daos] Does DAOS support infiniband now?

Shengyu SY19 Zhang
 

Hello,

Thank you for your help, Alex, Joel and Kevin. I have gone through the steps you suggested:

ibstat:
    State: Active
    Physical state: LinkUp

ifconfig:
    flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2044

fi_info:
    verbs:
        version: 1.0
    ofi_rxm:
        version: 1.0
    ofi_rxd:
        version: 1.0

The network itself is fine, since I can run SPDK NVMe-oF over InfiniBand without problems.
I also specified "ofi+verbs;ofi_rxm", but the same error occurred: the io_server stops after a while and prints the log I provided previously.

I also noticed that whatever I specify (ofi+verbs, ofi_rxm, or ofi+verbs;ofi_rxm), the log keeps showing: No provider found for "verbs;ofi_rxm" provider on domain "ib0". Could that be the cause?

BTW, it works with ofi+sockets.


Regards,
Shengyu.
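
The interface-state check reported in this message can be automated by reading the flags field of the ifconfig output. The sketch below is illustrative only: the "ib0:" prefix in the sample line is assumed, since the pasted output omits the interface name, and the helper names are hypothetical.

```python
import re


def interface_flags(ifconfig_output):
    """Extract the flag names from an ifconfig `flags=NNNN<...>` field."""
    match = re.search(r"flags=\d+<([^>]*)>", ifconfig_output)
    if match is None:
        return set()
    return {flag for flag in match.group(1).split(",") if flag}


def is_up_and_running(ifconfig_output):
    """True when both UP and RUNNING appear in the flags field."""
    return {"UP", "RUNNING"} <= interface_flags(ifconfig_output)


line = "ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2044"
print(is_up_and_running(line))
```

A passing check here only confirms the link state, which matches what the message reports; it says nothing about whether the verbs provider can open the domain.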


SPDK upgrade

Nabarro, Tom
 

Commit b17231aa88d84e5b733b2ea2a97be0038776de71 has landed and upgrades the version of SPDK used by DAOS to v19.04.1. When building from source, please remove your SPDK external build directory (daos/_build.externals/spdk, or the equivalent if you specify a build prefix) in order to pull in the updated version. If problems persist, remove any SPDK libraries from daos/install/lib/.
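
The cleanup described above could be scripted roughly as follows. This is a sketch, not an official tool: the two directory paths come from this announcement, but the `libspdk*` file pattern is an assumption about how the installed libraries are named, and the function names are hypothetical. Deletion is a dry run unless explicitly enabled.

```python
import glob
import os
import shutil


def spdk_cleanup_targets(daos_root, include_installed_libs=False):
    """List the paths the announcement suggests removing.

    The `libspdk*` pattern is an assumption about library naming;
    verify it against your own install tree before deleting anything.
    """
    targets = []
    build_dir = os.path.join(daos_root, "_build.externals", "spdk")
    if os.path.isdir(build_dir):
        targets.append(build_dir)
    if include_installed_libs:
        pattern = os.path.join(daos_root, "install", "lib", "libspdk*")
        targets.extend(sorted(glob.glob(pattern)))
    return targets


def remove_paths(paths, dry_run=True):
    """Delete each path, or only print it when dry_run is set."""
    for path in paths:
        if dry_run:
            print("would remove", path)
        elif os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
```

A typical use would be remove_paths(spdk_cleanup_targets("daos")) to review the list first, then repeating the call with dry_run=False, and adding include_installed_libs=True only if problems persist after a rebuild.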

 

Whilst we have been careful to perform a reasonable amount of validation with the updated version of SPDK, there is obviously a chance of unexpected behavior having been introduced. Please don't hesitate to report bugs on the usual channels.

 

Best regards,

Tom Nabarro BEng (hons)

Extreme Storage Architecture & Development

Intel Corporation

E: tom.nabarro@...

M: +44 (0)7786 260986

Skype: tom.nabarro


Re: Does DAOS support infiniband now?

Oganezov, Alexander A
 

Hi Shengyu,

> However, if I specify either ofi+verbs or ofi_rxm, the same error occurs and io_server stops.
> na_ofi.c:1609
> # na_ofi_domain_open(): No provider found for "verbs;ofi_rxm" provider on domain "ib0"

To use the supported verbs provider, you need to have "ofi+verbs;ofi_rxm" in the provider string.

~~Alex.
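
As a small illustration of the advice above, the sketch below checks that a configured provider string is the full "ofi+verbs;ofi_rxm" form rather than one of its halves. Treating "ofi+" as a prefix that is dropped before the name reaches libfabric is an assumption, inferred only from the "verbs;ofi_rxm" form that appears in the error message; the helper names are hypothetical.

```python
def provider_components(provider):
    """Split a provider string such as "ofi+verbs;ofi_rxm" into parts.

    A leading "ofi+" is dropped, which matches the "verbs;ofi_rxm"
    form quoted in the error message (an inferred convention).
    """
    if provider.startswith("ofi+"):
        provider = provider[len("ofi+"):]
    return [part for part in provider.split(";") if part]


def is_verbs_rxm(provider):
    """True only for the full string recommended in the thread."""
    return (provider.startswith("ofi+")
            and provider_components(provider) == ["verbs", "ofi_rxm"])


for candidate in ("ofi+verbs", "ofi_rxm", "ofi+verbs;ofi_rxm"):
    print(candidate, is_verbs_rxm(candidate))
```

Only the last candidate passes, which mirrors the point that neither "ofi+verbs" nor "ofi_rxm" on its own is the supported setting.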
