Hi there,
Please find below the DAOS community newsletter for June 2022.
Past Events
- Salishan Conference On High Speed Computing (April 26th)
Accelerating Data-driven Workflows with DAOS
https://salishan.ahsc-nm.org/program4.html
Johann Lombardi (Intel)
- ECP BoF (May 11th at 11am Eastern time)
DAOS Next Generation Storage
https://www.exascaleproject.org/event/ecp-community-bof-days-2022/ (registration is open)
Kevin Harms (ANL)
Mohamad Chaarawi (Intel)
Johann Lombardi (Intel)
Advanced Storage and Memory Hierarchy in AI and HPC with DAOS Storage
https://reg.oneventseries.intel.com/flow/intel/vision2022/portal-live/page/session-catalog?tab.day=20220511&search=AIML009
Andrey Kudryavtsev (Intel)
- SODACODE 2022 Keynote (May 25th)
Advanced Storage and Memory Hierarchy in AI and HPC with DAOS Storage
Andrey Kudryavtsev (Intel)
- SODACODE 2022 Breakout Session (May 26th)
Accelerating AI with DAOS Storage
Johann Lombardi (Intel)
- ISC’22 BoF (May 30th at 4pm CEST)
Accelerating HPC and AI with DAOS Storage
https://app.swapcard.com/widget/event/isc-high-performance-2022/planning/UGxhbm5pbmdfODYxMTYx
Kevin Harms (ANL)
Adrian Jackson (EPCC)
Michael Hennecke (Intel)
Mohamad Chaarawi (Intel)
Johann Lombardi (Intel)
- ISC’22 IXPUG (June 2nd at 9am CEST)
DAOS Features for Next Generation Platforms
https://www.ixpug.org/events/isc22-ixpug-workshop
Mohamad Chaarawi (Intel)
- ISC’22 Intel Booth (May 30th to June 1st)
DAOS demonstration, Fireside chats, …
Upcoming Events
- ESSA’22: 3rd Workshop on Extreme-Scale Storage and Analysis (June 3rd)
DAOS: Nextgen Storage Stack for HPC and AI
https://sites.google.com/view/essa-2022/
Johann Lombardi (Intel)
- EMOSS’22: Emerging Open Storage Systems and Solutions for Data Intensive Computing (July 1st)
One big happy family: sharing the S3 layer between Ceph, CORTX, and DAOS
https://iosea-project.eu/event/emoss-22-workshop/
Zuhair AlSader (Seagate)
Andriy Tkachuk (Seagate)
Release
- Current stable release is 2.0.2. See https://docs.daos.io/v2.0/ and https://packages.daos.io/v2.0/ for more information.
- Branches:
- release/2.0 is the release branch for the stable 2.0 release. Latest bug fix release is 2.0.2, and 2.0.3 rc1 has been tagged.
- release/2.2 is the development branch for the future 2.2 release. Latest test build is 2.1.102.
- master is the development branch for the future 2.4 release. Latest test build is 2.3.100.
- Major recent changes on release/2.0 (bugfix release):
- Upgrade to libfabric v1.15.1
- Fix a sem leak on the failure path of pool creation
- Several patches to improve error reporting when downgrading from 2.2 to 2.0 after dmg pool upgrade was completed.
- Fix a limitation in the agent code and NIC selection on servers with a lot of NUMA nodes
- Fix reference leak on container destroy path
- Major recent changes on release/2.2 (future 2.2 release):
- All patches listed in the 2.0 section above.
- Evict pool handles on agent shutdown
- Several test fixes to run the CI over UCX
- Add support for virtual network interface
- Disable FUSE metadata caching when using IL
- Split large I/Os into chunk-sized I/Os (8MB by default) to avoid exceeding the SPDK bdev single I/O size limit.
- Fix build on aarch64
- Reduce duration of NVMe storage format in the control plane
- Fix huge pages NUMA locality
- Fix available storage with heterogeneous NVMe
- Disable unified mode in UCX
- Major recent changes on master (future 2.4 release):
- All patches listed in the 2.2 section above.
- Add dmg pool query-targets
- Add chgrp support to dfuse
- Check thread ID in IL destructor
- Use pip to install deps on ubuntu
- Remove some compatibility code with DAOS 1.2
- Allow fanout control RPC to report progress
- Add rebuild generation for each rebuild retry
- Add new DFS APIs to retrieve pool and container handles
- What is coming:
- 2.0.3 validation and decision on whether we need another release candidate
- 2.2.0 code freeze
- 2.4.0 feature freeze
R&D
- Major features under development:
- Checksum scrubber
- Improvement to test coverage in progress.
- Should be merged to master soon.
- Branch: feature/csum-scrubbing
- Target release: 2.4
- Multi-user dfuse
- More aggressive caching in dfuse for AI apps
- I/O streaming function interception
- Catastrophic recovery
- Aka distributed fsck or checker
- First passes implemented.
- Testing and debugging in progress.
- Branch: feature/cat_recovery
- Target release: 2.6
- Multi-homed network support
- Aka multi-provider support
- This feature aims to support multiple network providers in the engine.
- Changes to the engine to support multiple providers have been implemented.
- Branch: feature/multiprovider
- Target release: 2.6
- Client-side metrics
- Performance domain
- Extend placement algorithm to be aware of fabric topology
- Branch: feature/perf_dom
- Target release: 2.8
- LDMS plugin to export DAOS metrics
- Pathfinding:
- DAOS Pipeline API for active storage
- Branch: feature/pipeline_api
- Prototyping of server-side find using the pipeline API completed. Initial results shared at IXPUG conference.
- Changes to MariaDB DAOS engine with predicate pushdown to the DAOS storage
- Leveraging the Intel Data Streaming Accelerator (DSA) to accelerate DAOS
- Prototype leveraging DSA for VOS aggregation delivered
- Initial results shared at IXPUG conference.
- DAOS/DFS integration with NFS Ganesha
- Under exploration
- Discussions about NFS handle and readdir+ support
- OPX provider support in collaboration with Cornelis Networks
- DAOS and Mercury patches to add OPX support under review.
- GPU data path optimizations
- I/O Middleware / Framework Support
- TensorFlow-IO plugin for DAOS
- S3 support via a DAOS backend to Rados Gateway (RGW)
- Support for versioning and multi-part upload added
- Work underway to evaluate performance and move DAOS-specific code to a new library called libds3.
- PR submitted upstream by Zuhair/Seagate: https://github.com/ceph/ceph/pull/45888
- Block interface over DAOS using SPDK DAOS bdev
News
- Jerome Soumagne joins the Intel DAOS team to work on the network stack.