Re: DKey/AKey/value count/size limitations
Patrick Farrell <paf@...>
Johann,
Thanks.
One question for you about the move to daos.io: currently, that site hosts the user manual, whereas the placement readme is internals documentation.
Is the intention to migrate away from the readmes in the source for internals docs? I ask only because that approach seems to be working well. Although the placement doc is out of date, in general the readmes seem to be updated as the code changes. It would be a shame to lose that.
- Patrick
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Lombardi, Johann <johann.lombardi@...>
Sent: Monday, November 25, 2019 3:40 AM
To: daos@daos.groups.io <daos@daos.groups.io>; Olivier, Jeffrey V <jeffrey.v.olivier@...>
Subject: Re: [daos] DKey/AKey/value count/size limitations
Our developer documentation has some basic text describing redundancy groups: https://github.com/daos-stack/daos/blob/master/src/placement/README.md#ring-placement-map
That being said, this is in the context of ring placement and we have since moved to jump consistent hash. We need to update the documentation and move it to http://daos.io.
Data stored under a dkey is indeed limited by the size of a pool shard. This might also cause the space usage to be unbalanced across targets if you have a non-widely-striped object with a lot of dkeys. That’s why we have a feature called “progressive layout” in the roadmap to reshard an object dynamically over more and more targets as the number of dkeys increases. This will be based on GIGA+ (https://www.pdl.cmu.edu/PDL-FTP/PDSI/CMU-PDL-08-110.pdf).
Cheers, Johann
From: <daos@daos.groups.io> on behalf of Patrick Farrell <paf@...>
Thanks, Jeff - That seems reasonable.
A related question for you: You used the term redundancy group. Is the concept of a redundancy group detailed a bit anywhere in the documentation/readmes? It's used in a number of places, but I have not found a place which describes the exact definition or its role. I feel like I must be missing something - the documentation is in general very thorough - but I've done a little digging and come up empty on this one.
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Olivier, Jeffrey V <jeffrey.v.olivier@...>
Hi Patrick,
There are no built in limits on how much data you can associate with a single dkey. There are practical limitations though since data under a dkey maps to a particular redundancy group and the associated storage targets are limited by the amount of local storage allocated to them.
We don’t do anything to prevent a client from stuffing everything under a single dkey using the DAOS API and I don’t believe we have any plans to mitigate that problem. However, it is mitigated somewhat by the fact that a given client will have a storage allocation that will be isolated from clients in other projects that are using different pools. A client will not be able to exceed the capacity allocated to their pool/project.
-Jeff
From: <daos@daos.groups.io> on behalf of Patrick Farrell <paf@...>
Good afternoon, The concern here is the concept of a client "stuffing" a given DKey, adding more and more data to it, and therefore causing an imbalance in space usage between servers, potentially running one target in a pool out of space.
|
|
Re: [External] Re: [daos] Does DAOS support infiniband now?
Shengyu SY19 Zhang
Hello Joel,
As shown in output.log, there is only one version of libfabric installed on my machine, and I don't have any other installed software that depends on libfabric. Following your guidance to set FI_LOG_LEVEL=debug, I can see the following messages, which may be helpful:
libfabric:123445:verbs:fabric:fi_ibv_set_default_attr():1263<info> Ignoring provider default value for tx rma_iov_limit as it is greater than the value supported by domain: mlx5_0
libfabric:123445:verbs:fabric:fi_ibv_get_matching_info():1365<info> hints->ep_attr->rx_ctx_cnt != FI_SHARED_CONTEXT. Skipping XRC FI_EP_MSG endpoints
ERROR: daos_io_server:0 libfabric:123445:verbs:core:fi_ibv_check_hints():231<info> Unsupported capabilities
libfabric:123445:verbs:core:fi_ibv_check_hints():232<info> Supported: FI_MSG, FI_RECV, FI_SEND, FI_LOCAL_COMM, FI_REMOTE_COMM
libfabric:123445:verbs:core:fi_ibv_check_hints():232<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_RECV, FI_SEND, FI_REMOTE_READ
ERROR: daos_io_server:0 libfabric:123445:verbs:fabric:fi_ibv_get_rai_id():179<info> rdma_bind_addr: No such device(19)
ERROR: daos_io_server:0 libfabric:123445:verbs:fabric:fi_ibv_get_rai_id():179<info> rdma_bind_addr: No such device(19)
ERROR: daos_io_server:0 libfabric:123445:verbs:fabric:fi_ibv_get_rai_id():179<info> rdma_bind_addr: Invalid argument(22)
ERROR: daos_io_server:0 libfabric:123445:verbs:fabric:fi_ibv_get_rai_id():179<info> rdma_bind_addr: Invalid argument(22)
ERROR: daos_io_server:0 libfabric:123445:verbs:fabric:fi_ibv_get_rai_id():179<info> rdma_bind_addr: Invalid argument(22)
ERROR: daos_io_server:0 libfabric:123445:core:core:ofi_layering_ok():795<info> Need core provider, skipping ofi_rxd
libfabric:123445:core:core:ofi_layering_ok():795<info> Need core provider, skipping ofi_mrail
Regards, Shengyu.
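The two fi_ibv_check_hints() lines in the log above list what the verbs provider supports and what was requested. A minimal sketch for diffing them is given here; the function names are hypothetical, and the regular expression only assumes the line shape seen in this message, not a stable libfabric log format.

```python
import re

# Matches lines such as "...<info> Supported: FI_MSG, FI_RECV" as they appear
# in the message above. The format is an assumption taken from this log only.
CAPS_RE = re.compile(r"<info>\s*(Supported|Requested):\s*(.+)$")

LOG_EXCERPT = """\
libfabric:123445:verbs:core:fi_ibv_check_hints():232<info> Supported: FI_MSG, FI_RECV, FI_SEND, FI_LOCAL_COMM, FI_REMOTE_COMM
libfabric:123445:verbs:core:fi_ibv_check_hints():232<info> Requested: FI_MSG, FI_RMA, FI_READ, FI_RECV, FI_SEND, FI_REMOTE_READ
"""


def parse_caps(log_lines):
    """Return {"Supported": {...}, "Requested": {...}} from debug log lines.

    If a label appears more than once, the last occurrence wins.
    """
    caps = {}
    for line in log_lines:
        match = CAPS_RE.search(line)
        if match:
            caps[match.group(1)] = {c.strip() for c in match.group(2).split(",")}
    return caps


def missing_caps(log_lines):
    """Capabilities that were requested but not reported as supported."""
    caps = parse_caps(log_lines)
    return sorted(caps.get("Requested", set()) - caps.get("Supported", set()))


if __name__ == "__main__":
    print(missing_caps(LOG_EXCERPT.splitlines()))
```

For the excerpt above, the requested-but-unsupported set is FI_READ, FI_REMOTE_READ and FI_RMA. This only restates what the log says; it does not by itself identify the root cause.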
-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Rosenzweig, Joel B
Sent: Saturday, November 23, 2019 3:20 AM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] Does DAOS support infiniband now?
Hi Shengyu,
The debug output showed me that when daos_server is started via orterun, libfabric is not finding provider support for ofi_rxm at least. I'm still wondering if you have two different versions of libfabric installed on your machine. Can you run these commands and provide the output?
1) ldd install/bin/daos_server
2) Modify your orterun command to run ldd on daos_server. For example, I run this command locally:
orterun --allow-run-as-root --map-by node --mca btl tcp,self --mca oob tcp -np 1 --hostfile /home/jbrosenz/daos/hostfile --enable-recovery --report-uri /tmp/urifile ldd /home/jbrosenz/daos/install/bin/daos_server
3) which fi_info
4) ldd over each version of fi_info found
From the data you provide, I'll be able to tell whether the libfabric being used by daos_server when executed directly by you in the shell is the same libfabric being used by daos_server when executed via orterun. Your original "daos_server network scan" output showed support for ofi+verbs;ofi_rxm but your debug output showed that when daos_server was started (via orterun), libfabric could not find support for the very same providers. If there are two different versions being used with different configurations, it would explain the failure. If it's a single installation/configuration, then that will lead the debug in another direction.
Depending on what you find through 1-4, you might find it helpful to export the environment variable FI_LOG_LEVEL=debug which will instruct libfabric to output a good deal of debug info.
Regards, Joel
-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Shengyu SY19 Zhang
Sent: Friday, November 22, 2019 12:59 AM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] Does DAOS support infiniband now?
Hello Joel,
Please see the attached files. I have tried two machines: one shows the full set of providers in fi_info (verbs and rxm), the other doesn't show verbs, but on both the io_server fails to start. I found that the project conflicts with the Mellanox drivers, so I removed them and now use the yum packages only; however, it still does not work.
Regards, Shengyu
-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Rosenzweig, Joel B
Sent: Friday, November 22, 2019 6:35 AM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] Does DAOS support infiniband now?
Hi Shengyu,
Can you share your daos_server.yml so we can see how you enabled the provider? And, can you share the log files daos_control.log and server.log so we can see more context?
Thank you, Joel
-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Shengyu SY19 Zhang
Sent: Wednesday, November 20, 2019 9:23 PM
To: daos@daos.groups.io
Subject: Re: [External] Re: [daos] Does DAOS support infiniband now?
Hello,
Thank you for your help, Alex, Joel and Kevin. I have checked the steps that you provided:
ibstat: State: Active, Physical state: LinkUp
ifconfig: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2044
fi_info: verbs: version: 1.0, ofi_rxm: version: 1.0, ofi_rxd: version: 1.0
The network itself is good, since I can run SPDK NVMe-oF over InfiniBand without problems. I also specified "ofi+verbs;ofi_rxm" and the same error occurred: the io_server stops after a while and prints the log I provided previously. I also noticed that whatever I specify (ofi+verbs, ofi_rxm, or ofi+verbs;ofi_rxm), the log keeps showing No provider found for "verbs;ofi_rxm" provider on domain "ib0". Is that the cause? BTW: it works with ofi+sockets.
Regards, Shengyu.
-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Oganezov, Alexander A
Sent: Thursday, November 21, 2019 7:13 AM
To: daos@daos.groups.io
Subject: [External] Re: [daos] Does DAOS support infiniband now?
Hi Shengyu,
> However if I specify either ofi+verbs or ofi_rxm, the same error will happen, and io_server will stop.
To use the supported verbs provider you need to have "ofi+verbs;ofi_rxm" in the provider string.
~~Alex.
-----Original Message-----
From: daos@daos.groups.io [mailto:daos@daos.groups.io] On Behalf Of Rosenzweig, Joel B
Sent: Wednesday, November 20, 2019 7:37 AM
To: daos@daos.groups.io
Subject: Re: [daos] Does DAOS support infiniband now?
Hi Shengyu,
The daos_server network scan uses information provided by libfabric to determine available devices and providers. It then cross references that list of devices with device names obtained from hwloc to convert libfabric device names (as necessary) to those you'd find via ifconfig. Therefore, if "daos_server network scan" displays a device and provider, it means that support for that via libfabric has been provided. However, as Kevin pointed out, it's possible that the device itself was down, and that could certainly generate an error like what you encountered.
There's another possibility, that you might have more than one version of libfabric installed in your environment. I have run into this situation in our lab environment. You might check your target system to see if it has more than one libfabric library with different provider support.
Regards, Joel
-----Original Message-----
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of Harms, Kevin via Groups.Io
Sent: Wednesday, November 20, 2019 10:04 AM
To: daos@daos.groups.io
Subject: Re: [daos] Does DAOS support infiniband now?
Shengyu,
I have tried IB and it works. Verify the libfabric verbs provider is available.
fi_info -l
you should see these:
ofi_rxm: version: 1.0
verbs: version: 1.0
See here for details: https://daos-stack.github.io/admin/deployment/#network-interface-detection-and-selection
You might also want to confirm ib0 is in the UP state:
[root@daos01 ~]# ifconfig ib0
ib0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 4092
inet 172.25.6.101 netmask 255.255.0.0 broadcast 172.25.255.255
kevin
________________________________________
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Shengyu SY19 Zhang <zhangsy19@...>
Sent: Wednesday, November 20, 2019 2:54 AM
To: daos@daos.groups.io
Subject: [daos] Does DAOS support infiniband now?
Hello,
I ran daos_server network scan, and it shows the following:
fabric_iface: ib0
provider: ofi+verbs;ofi_rxm
pinned_numa_node: 1
However, if I specify either ofi+verbs or ofi_rxm, the same error happens and io_server stops:
na_ofi.c:1609 # na_ofi_domain_open(): No provider found for "verbs;ofi_rxm" provider on domain "ib0"
ib0 is a Mellanox NIC on an InfiniBand network.
Regards, Shengyu.
|
|
Re: DKey/AKey/value count/size limitations
Lombardi, Johann
Our developer documentation has some basic text describing redundancy groups: https://github.com/daos-stack/daos/blob/master/src/placement/README.md#ring-placement-map That being said, this is in the context of ring placement and we have since moved to jump consistent hash. We need to update the documentation and move it to http://daos.io.
Data stored under a dkey is indeed limited by the size of a pool shard. This might also cause the space usage to be unbalanced across targets if you have a non-widely-striped object with a lot of dkeys. That’s why we have a feature called “progressive layout” in the roadmap to reshard an object dynamically over more and more targets as the number of dkeys increases. This will be based on GIGA+ (https://www.pdl.cmu.edu/PDL-FTP/PDSI/CMU-PDL-08-110.pdf).
Cheers, Johann
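For readers unfamiliar with the jump consistent hash mentioned above, here is a minimal sketch of the general technique as published by Lamping and Veach. It is not DAOS's placement code, which is not shown in this thread; the function names and the use of BLAKE2 to turn a key into a 64-bit integer are assumptions made purely for illustration.

```python
import hashlib

MASK64 = (1 << 64) - 1


def jump_consistent_hash(key: int, num_buckets: int) -> int:
    """Map a 64-bit integer key to a bucket in [0, num_buckets)."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    key &= MASK64
    bucket, candidate = -1, 0
    while candidate < num_buckets:
        bucket = candidate
        # A 64-bit linear congruential step drives the pseudo-random jumps.
        key = (key * 2862933555777941757 + 1) & MASK64
        candidate = int((bucket + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return bucket


def key_to_u64(name: bytes) -> int:
    """Hypothetical helper: derive a 64-bit integer from an arbitrary key."""
    return int.from_bytes(hashlib.blake2b(name, digest_size=8).digest(), "big")


if __name__ == "__main__":
    k = key_to_u64(b"example-dkey")
    print(jump_consistent_hash(k, 10), jump_consistent_hash(k, 11))
```

The property that makes this family of hashes attractive for placement is that growing from n to n+1 buckets either leaves a key where it was or moves it to the new bucket, so only a small fraction of keys relocate.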
From: <daos@daos.groups.io> on behalf of Patrick Farrell <paf@...>
Thanks, Jeff - That seems reasonable.
A related question for you: You used the term redundancy group. Is the concept of a redundancy group detailed a bit anywhere in the documentation/readmes? It's used in a number of places, but I have not found a place which describes the exact definition or its role. I feel like I must be missing something - the documentation is in general very thorough - but I've done a little digging and come up empty on this one.
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Olivier, Jeffrey V <jeffrey.v.olivier@...>
Sent: Friday, November 22, 2019 3:04 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DKey/AKey/value count/size limitations
Hi Patrick,
There are no built in limits on how much data you can associate with a single dkey. There are practical limitations though since data under a dkey maps to a particular redundancy group and the associated storage targets are limited by the amount of local storage allocated to them.
We don’t do anything to prevent a client from stuffing everything under a single dkey using the DAOS API and I don’t believe we have any plans to mitigate that problem. However, it is mitigated somewhat by the fact that a given client will have a storage allocation that will be isolated from clients in other projects that are using different pools. A client will not be able to exceed the capacity allocated to their pool/project.
-Jeff
From: <daos@daos.groups.io> on behalf of Patrick Farrell <paf@...>
Good afternoon, The concern here is the concept of a client "stuffing" a given DKey, adding more and more data to it, and therefore causing an imbalance in space usage between servers, potentially running one target in a pool out of space.
|
|
Re: DKey/AKey/value count/size limitations
Yes, it would ultimately be restricted by the size of a single shard.
From: <daos@daos.groups.io> on behalf of Colin Ngam <cngam@...>
Hi Jeff,
Does that mean that the data of the Dkey is restricted by the size of a single Shard?
Thanks.
Colin
From: <daos@daos.groups.io> on behalf of "Olivier, Jeffrey V" <jeffrey.v.olivier@...>
Hi Patrick,
There are no built in limits on how much data you can associate with a single dkey. There are practical limitations though since data under a dkey maps to a particular redundancy group and the associated storage targets are limited by the amount of local storage allocated to them.
We don’t do anything to prevent a client from stuffing everything under a single dkey using the DAOS API and I don’t believe we have any plans to mitigate that problem. However, it is mitigated somewhat by the fact that a given client will have a storage allocation that will be isolated from clients in other projects that are using different pools. A client will not be able to exceed the capacity allocated to their pool/project.
-Jeff
From: <daos@daos.groups.io> on behalf of Patrick Farrell <paf@...>
Good afternoon, The concern here is the concept of a client "stuffing" a given DKey, adding more and more data to it, and therefore causing an imbalance in space usage between servers, potentially running one target in a pool out of space.
|
|
Re: DKey/AKey/value count/size limitations
Patrick Farrell <paf@...>
Thanks, Jeff - That seems reasonable.
A related question for you: You used the term redundancy group. Is the concept of a redundancy group detailed a bit anywhere in the documentation/readmes? It's used in a number of places, but I have not found a place which describes the exact definition or its role. I feel like I must be missing something - the documentation is in general very thorough - but I've done a little digging and come up empty on this one.
From: daos@daos.groups.io <daos@daos.groups.io> on behalf of Olivier, Jeffrey V <jeffrey.v.olivier@...>
Sent: Friday, November 22, 2019 3:04 PM
To: daos@daos.groups.io <daos@daos.groups.io>
Subject: Re: [daos] DKey/AKey/value count/size limitations
Hi Patrick,
There are no built in limits on how much data you can associate with a single dkey. There are practical limitations though since data under a dkey maps to a particular redundancy group and the associated storage targets are limited by the amount of local storage allocated to them.
We don’t do anything to prevent a client from stuffing everything under a single dkey using the DAOS API and I don’t believe we have any plans to mitigate that problem. However, it is mitigated somewhat by the fact that a given client will have a storage allocation that will be isolated from clients in other projects that are using different pools. A client will not be able to exceed the capacity allocated to their pool/project.
-Jeff
From: <daos@daos.groups.io> on behalf of Patrick Farrell <paf@...>
Good afternoon, The concern here is the concept of a client "stuffing" a given DKey, adding more and more data to it, and therefore causing an imbalance in space usage between servers, potentially running one target in a pool out of space.
|
|
Re: DKey/AKey/value count/size limitations
Colin Ngam
Hi Jeff,
Does that mean that the data of the Dkey is restricted by the size of a single Shard?
Thanks.
Colin
From: <daos@daos.groups.io> on behalf of "Olivier, Jeffrey V" <jeffrey.v.olivier@...>
Hi Patrick,
There are no built in limits on how much data you can associate with a single dkey. There are practical limitations though since data under a dkey maps to a particular redundancy group and the associated storage targets are limited by the amount of local storage allocated to them.
We don’t do anything to prevent a client from stuffing everything under a single dkey using the DAOS API and I don’t believe we have any plans to mitigate that problem. However, it is mitigated somewhat by the fact that a given client will have a storage allocation that will be isolated from clients in other projects that are using different pools. A client will not be able to exceed the capacity allocated to their pool/project.
-Jeff
From: <daos@daos.groups.io> on behalf of Patrick Farrell <paf@...>
Good afternoon, The concern here is the concept of a client "stuffing" a given DKey, adding more and more data to it, and therefore causing an imbalance in space usage between servers, potentially running one target in a pool out of space.
|
|
Re: DKey/AKey/value count/size limitations
Hi Patrick,
There are no built in limits on how much data you can associate with a single dkey. There are practical limitations though since data under a dkey maps to a particular redundancy group and the associated storage targets are limited by the amount of local storage allocated to them.
We don’t do anything to prevent a client from stuffing everything under a single dkey using the DAOS API and I don’t believe we have any plans to mitigate that problem. However, it is mitigated somewhat by the fact that a given client will have a storage allocation that will be isolated from clients in other projects that are using different pools. A client will not be able to exceed the capacity allocated to their pool/project.
-Jeff
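To make the concern concrete, here is a toy model of the imbalance being discussed. It assumes, as a deliberate simplification, that each dkey lands on exactly one target chosen by hashing the dkey; real DAOS maps a dkey to a redundancy group through its placement map, which is not reproduced here. All names in the sketch are hypothetical.

```python
import hashlib
from collections import Counter


def target_for_dkey(dkey: bytes, num_targets: int) -> int:
    """Toy placement: hash the dkey and pick one target (an assumption)."""
    digest = hashlib.sha256(dkey).digest()
    return int.from_bytes(digest[:8], "big") % num_targets


def usage_by_target(writes, num_targets):
    """Sum bytes written per target for an iterable of (dkey, nbytes)."""
    usage = Counter()
    for dkey, nbytes in writes:
        usage[target_for_dkey(dkey, num_targets)] += nbytes
    return [usage.get(t, 0) for t in range(num_targets)]


if __name__ == "__main__":
    targets = 8
    # Same total volume, written under many dkeys versus one "stuffed" dkey.
    spread = [(f"dkey-{i}".encode(), 4096) for i in range(1000)]
    stuffed = [(b"one-dkey", 4096)] * 1000
    print("spread :", usage_by_target(spread, targets))
    print("stuffed:", usage_by_target(stuffed, targets))
```

In this model the stuffed workload puts every byte on a single target while the others stay empty, which is the situation that could run one target out of space even though the pool as a whole has capacity left.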
From: <daos@daos.groups.io> on behalf of Patrick Farrell <paf@...>
Good afternoon, The concern here is the concept of a client "stuffing" a given DKey, adding more and more data to it, and therefore causing an imbalance in space usage between servers, potentially running one target in a pool out of space.
|
|
Re: [External] Re: [daos] Does DAOS support infiniband now?
Rosenzweig, Joel B <joel.b.rosenzweig@...>
Hi Shengyu,
The debug output showed me that when daos_server is started via orterun, libfabric is not finding provider support for ofi_rxm at least. I'm still wondering if you have two different versions of libfabric installed on your machine. Can you run these commands and provide the output?
1) ldd install/bin/daos_server
2) Modify your orterun command to run ldd on daos_server. For example, I run this command locally:
orterun --allow-run-as-root --map-by node --mca btl tcp,self --mca oob tcp -np 1 --hostfile /home/jbrosenz/daos/hostfile --enable-recovery --report-uri /tmp/urifile ldd /home/jbrosenz/daos/install/bin/daos_server
3) which fi_info
4) ldd over each version of fi_info found
From the data you provide, I'll be able to tell whether the libfabric being used by daos_server when executed directly by you in the shell is the same libfabric being used by daos_server when executed via orterun. Your original "daos_server network scan" output showed support for ofi+verbs;ofi_rxm but your debug output showed that when daos_server was started (via orterun), libfabric could not find support for the very same providers. If there are two different versions being used with different configurations, it would explain the failure. If it's a single installation/configuration, then that will lead the debug in another direction.
Depending on what you find through 1-4, you might find it helpful to export the environment variable FI_LOG_LEVEL=debug which will instruct libfabric to output a good deal of debug info.
Regards, Joel
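The comparison Joel asks for in steps 1 and 2 can be sketched as a small script that extracts the libfabric path from two ldd outputs and checks whether they match. The sample outputs below are hypothetical, not taken from Shengyu's machine, and the helper names are made up for illustration; in practice the two strings would come from running ldd directly and via orterun as described above.

```python
import re
from typing import Set

# Matches ldd lines such as "libfabric.so.1 => /usr/lib64/libfabric.so.1 (0x...)".
LIBFABRIC_RE = re.compile(r"^\s*libfabric\.so\S*\s*=>\s*(\S+)")

# Hypothetical sample outputs, for illustration only.
DIRECT = (
    "\tlibfabric.so.1 => /usr/lib64/libfabric.so.1 (0x00007f0000000000)\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000100000)\n"
)
VIA_ORTERUN = (
    "\tlibfabric.so.1 => /opt/other/lib/libfabric.so.1 (0x00007f0000000000)\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000100000)\n"
)


def libfabric_paths(ldd_output: str) -> Set[str]:
    """Return the resolved libfabric library paths found in ldd output."""
    paths = set()
    for line in ldd_output.splitlines():
        match = LIBFABRIC_RE.match(line)
        # "libfabric.so.1 => not found" yields the token "not"; skip it.
        if match and match.group(1) != "not":
            paths.add(match.group(1))
    return paths


def same_libfabric(direct: str, via_launcher: str) -> bool:
    """True only if both outputs resolve libfabric to the same path(s)."""
    a, b = libfabric_paths(direct), libfabric_paths(via_launcher)
    return bool(a) and a == b


if __name__ == "__main__":
    print(libfabric_paths(DIRECT), libfabric_paths(VIA_ORTERUN))
    print("same library:", same_libfabric(DIRECT, VIA_ORTERUN))
```

A mismatch between the two paths would point to the two-installations explanation; identical paths would, as Joel says, send the debugging in another direction.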
|
|
DKey/AKey/value count/size limitations
Patrick Farrell <paf@...>
Good afternoon, The concern here is the concept of a client "stuffing" a given DKey, adding more and more data to it, and therefore causing an imbalance in space usage between servers, potentially running one target in a pool out of space.
|
|
Re: [External] Re: [daos] Does DAOS support infiniband now?
Shengyu SY19 Zhang
Hello Joel,
Please see the attached files. I have tried two machines: one shows the full set of providers in fi_info (verbs and rxm), the other doesn't show verbs, but on both the io_server fails to start. I found that the project conflicts with the Mellanox drivers, so I removed them and now use the yum packages only; however, it still does not work.
Regards, Shengyu
|
|
DUG'19 slides
Lombardi, Johann
Hi there,
All slide decks presented at the user group have been uploaded to the wiki: https://wiki.hpdd.intel.com/display/DC/DUG19 Many thanks to all the presenters!
Cheers, Johann
|
|
Re: [External] Re: [daos] Does DAOS support infiniband now?
Rosenzweig, Joel B <joel.b.rosenzweig@...>
Hi Shengyu,
Can you share your daos_server.yml so we can see how you enabled the provider? And, can you share the log files daos_control.log and server.log so we can see more context? Thank you, Joel
|
|
Re: Missing packages in Docker image for `dcpm` deployments
Alex Barcelo
So I just pushed the minimal PR #1459 which enables the CentOS and Ubuntu images to be used as the runtime, as in my scenario. The other Dockerfiles are not used (per the documentation) so I assume that those are indeed only for building and CI purposes.
|
|
Re: Missing packages in Docker image for `dcpm` deployments
Yes, I think they probably need to be updated then. I think they were originally created for distro build testing in our CI system, and possibly the packages weren't available on all the target distros. A PR would be appreciated, and we can get feedback from other team members on build/test/docker concerns.
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From: daos@daos.groups.io <daos@daos.groups.io>
On Behalf Of Alex Barcelo via Groups.Io
Sent: Thursday, November 21, 2019 12:35 PM To: daos@daos.groups.io Subject: Re: [daos] Missing packages in Docker image for `dcpm` deployments
Hi Tom,
Well, according to: https://daos-stack.github.io/admin/installation/#running-daos-service-in-docker it is also for runtime. The instructions build DAOS and then run everything inside that same container. Is that right?
|
|
Re: Missing packages in Docker image for `dcpm` deployments
Alex Barcelo
Hi Tom,
Well, according to: https://daos-stack.github.io/admin/installation/#running-daos-service-in-docker it is also for runtime. The instructions build DAOS and then run everything inside that same container. Is that right?
|
|
Re: Missing packages in Docker image for `dcpm` deployments
Hello Alex,
The Dockerfiles provided are for build, not runtime. The package dependencies you are referring to are enforced in the RPM daos.spec IIRC.
Regards, Tom Nabarro – DCG/ESAD M: +44 (0)7786 260986 Skype: tom.nabarro
From: daos@daos.groups.io <daos@daos.groups.io> On Behalf Of
Alex Barcelo via Groups.Io
Sent: Thursday, November 21, 2019 11:01 AM To: daos@daos.groups.io Subject: [daos] Missing packages in Docker image for `dcpm` deployments
I was trying to follow the Docker build instructions on a machine that has Optane DC persistent memory modules, and I found that at least the following packages seem to be missing:
If those packages are not installed, the I believe that the proper way to handle that would be to add the installation of them into the
|
|
Missing packages in Docker image for `dcpm` deployments
Alex Barcelo
I was trying to follow the Docker build instructions on a machine that has Optane DC persistent memory modules, and I found that at least the following packages seem to be missing:
If those packages are not installed, the I believe that the proper way to handle that would be to add the installation of them into the
|
|
Re: [External] Re: [daos] Does DAOS support infiniband now?
Shengyu SY19 Zhang
Hello,
Thank you for your help, Alex, Joel and Kevin. I have checked the steps that you provided:
    ibstat: State: Active, Physical state: LinkUp
    ifconfig: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 2044
    fi_info: verbs: version: 1.0, ofi_rxm: version: 1.0, ofi_rxd: version: 1.0
The network itself is good, since I can run SPDK NVMe-oF over InfiniBand without problems.
I also specified "ofi+verbs;ofi_rxm" and the same error occurred: the io_server stops after a while and prints the log I provided previously. I also noticed that whatever I specify (ofi+verbs, ofi_rxm, or ofi+verbs;ofi_rxm), the log keeps showing: No provider found for "verbs;ofi_rxm" provider on domain "ib0". Is that the cause?
BTW, it works under ofi+sockets.
Regards, Shengyu.
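Since fi_info lists the providers but the server still cannot find them, Joel's suggestion of checking for more than one libfabric installation is worth scripting. This is a rough sketch only: libfabric_dirs and has_multiple_libfabric are hypothetical helper names, and the example paths in the usage comments are illustrative.

```shell
#!/bin/sh
# Hypothetical helpers (not part of DAOS or libfabric).

# libfabric_dirs: read library paths (one per line) on stdin and print the
# distinct directories that contain a libfabric shared object.
libfabric_dirs() {
    # Keep only libfabric.so* entries, strip the file name, de-duplicate.
    awk '/libfabric\.so/ { sub("/[^/]*$", ""); print }' | sort -u
}

# has_multiple_libfabric: succeed if the paths on stdin point at libfabric
# copies in more than one directory.
has_multiple_libfabric() {
    count=$(libfabric_dirs | wc -l | tr -d ' ')
    [ "$count" -gt 1 ]
}

# Usage on a real system (not run here):
#   ldconfig -p | awk '/libfabric/ { print $NF }' | libfabric_dirs
#   find / -name 'libfabric.so*' 2>/dev/null | has_multiple_libfabric \
#       && echo "more than one libfabric found"
```

If more than one directory shows up, comparing it against the libfabric line in the ldd output of the I/O server binary shows which copy the server actually loads; the fi_info on the PATH may belong to a different copy.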
|
|
SPDK upgrade
Commit b17231aa88d84e5b733b2ea2a97be0038776de71 has landed, upgrading the version of SPDK used by DAOS to v19.04.1. When building from source, please remove your SPDK external build directory (daos/_build.externals/spdk, or the equivalent if you specify a build prefix) in order to pull in the updated version. If problems persist, remove any spdk libraries from daos/install/lib/.
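The cleanup can be sketched as a pair of small shell functions. This is an illustration under assumptions, not an official script: clean_spdk_build and list_spdk_libs are hypothetical names, the default layout without a build prefix is assumed, and the file-name pattern used to spot leftover SPDK libraries is a guess.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the DAOS tree).

# clean_spdk_build DAOS_ROOT: remove the SPDK external build directory so the
# next build fetches the updated SPDK.
clean_spdk_build() {
    root=${1:?usage: clean_spdk_build DAOS_ROOT}
    spdk_dir="$root/_build.externals/spdk"
    if [ -d "$spdk_dir" ]; then
        rm -rf -- "$spdk_dir"
        echo "removed $spdk_dir"
    else
        echo "nothing to remove at $spdk_dir"
    fi
}

# list_spdk_libs DAOS_ROOT: list (without deleting) files in install/lib that
# look like SPDK libraries.
list_spdk_libs() {
    root=${1:?usage: list_spdk_libs DAOS_ROOT}
    [ -d "$root/install/lib" ] || return 0
    # Assumption: leftover SPDK libraries have "spdk" in their file name.
    find "$root/install/lib" -maxdepth 1 -name '*spdk*' | sort
}

# Usage from the directory that contains the daos checkout (not run here):
#   clean_spdk_build daos
#   list_spdk_libs daos    # review the list before removing anything
```

Listing rather than deleting the installed libraries is deliberate, since that second step is only needed if problems persist after a rebuild.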
Whilst we have been careful to perform a reasonable amount of validation with the updated version of SPDK, there is obviously a chance of unexpected behavior having been introduced. Please don't hesitate to report bugs on the usual channels.
Best regards, Tom Nabarro BEng (hons) Extreme Storage Architecture & Development Intel Corporation M: +44 (0)7786 260986 Skype: tom.nabarro
|
|
Re: Does DAOS support infiniband now?
Oganezov, Alexander A
Hi Shengyu,
> However if I specify either ofi+verbs or ofi_rxm, the same error will happen, and io_server will stop.
To use the supported verbs provider you need to have "ofi+verbs;ofi_rxm" in the provider string.
~~Alex.
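A quick way to confirm which provider string a configuration actually carries is to read it back from the file. The sketch below assumes the server configuration has a single "provider:" key, as the scan output quoted in this thread suggests; provider_from_config and uses_verbs_rxm are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helpers (not part of DAOS).

# provider_from_config FILE: print the value of the first "provider:" line,
# without surrounding quotes or trailing comments.
provider_from_config() {
    sed -n 's/^[[:space:]]*provider:[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$1" |
        head -n 1 | tr -d "\"'"
}

# uses_verbs_rxm FILE: succeed only if the provider string is exactly the
# combination Alex names, "ofi+verbs;ofi_rxm".
uses_verbs_rxm() {
    [ "$(provider_from_config "$1")" = "ofi+verbs;ofi_rxm" ]
}

# Usage (path is illustrative, not run here):
#   uses_verbs_rxm /path/to/daos_server.yml || echo "provider string differs"
```

The comparison is exact on purpose: per the message above, neither ofi+verbs nor ofi_rxm alone is the supported setting.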
|
|