Hi there,
Please find below the DAOS community newsletter for May 2022.
Past Events
- 2022 Energy HPC Conference (March 3rd)
DAOS Use at Argonne
Kevin Harms (Argonne)
Johann Lombardi (Intel)
Mohamad Chaarawi (Intel)
https://energyhpc.rice.edu/program/
- Salishan Conference On High Speed Computing (April 26th)
Accelerating Data-driven Workflows with DAOS
https://salishan.ahsc-nm.org/program4.html
Johann Lombardi (Intel)
Upcoming Events
- ECP BoF (May 11th at 11am Eastern time)
DAOS Next Generation Storage
https://www.exascaleproject.org/event/ecp-community-bof-days-2022/
(registration is open)
Kevin Harms (ANL)
Mohamad Chaarawi (Intel)
Johann Lombardi (Intel)
- Intel Vision 2022 (May 11th)
Advanced Storage and Memory Hierarchy in AI and HPC with DAOS Storage
https://reg.oneventseries.intel.com/flow/intel/vision2022/portal-live/page/session-catalog?tab.day=20220511&search=AIML009
Andrey Kudryavtsev (Intel)
- SODACODE 2022 Keynote (May 25th)
Advanced Storage and Memory Hierarchy in AI and HPC with DAOS Storage
Andrey Kudryavtsev (Intel)
- SODACODE 2022 Breakout Session (May 26th)
Accelerating AI with DAOS Storage
Johann Lombardi (Intel)
- ISC’22 BoF (May 30th at 4pm CEST)
Accelerating HPC and AI with DAOS Storage
https://app.swapcard.com/widget/event/isc-high-performance-2022/planning/UGxhbm5pbmdfODYxMTYx
Kevin Harms (ANL)
Adrian Jackson (EPCC)
Michael Hennecke (Intel)
Mohamad Chaarawi (Intel)
Johann Lombardi (Intel)
- ISC’22 IXPUG (June 2nd at 1pm CEST)
DAOS Features for Next Generation Platforms
https://app.swapcard.com/widget/event/isc-high-performance-2022/planning/UGxhbm5pbmdfODYxMjIy
Mohamad Chaarawi (Intel)
- ISC’22 Intel Booth (May 30th to June 1st)
DAOS demonstration, Fireside chats, …
- ESSA’22: 3rd Workshop on Extreme-Scale Storage and Analysis (June 3rd)
DAOS: Nextgen Storage Stack for HPC and AI
https://sites.google.com/view/essa-2022/
Johann Lombardi (Intel)
Release
- The current stable release is 2.0.2. See https://docs.daos.io/v2.0/ and https://packages.daos.io/v2.0/ for more information.
- Branches:
- release/2.0 is the release branch for the stable 2.0 release. Latest bug fix release is 2.0.2.
- release/2.2 is the development branch for the future 2.2 release. Latest test build is 2.1.101.
- master is the development branch for the future 2.4 release. Latest test build is 2.3.100.
- Major recent changes on release/2.0 (bugfix release):
- Several documentation and test fixes
- Fix forced pool destroy flow
- Fix a bug in the rare event of leadership loss in the management service
- Upgrade to libfabric 1.15-rc3 to fix a tcp provider regression and pick some other upstream fixes
- Fix a few EC bugs in degraded update and aggregation
- Fix a libdaos bug with blocking calls in multi-threaded environments. An extra lock on the DAOS event is required.
- Migrate CI testing to Rocky Linux
- Major recent changes on release/2.2 (future 2.2 release):
- All the changes listed above in the release/2.0 section.
- Load-balance NIC allocations in the agent when the process does not have a local interface usable with DAOS
- Fix a bug with checksum, hole and EC.
- Fix a bug in SPDK with VMD causing spdk_vmd_init() to fail in certain conditions
- Several EC aggregation optimisations
- Allow ucx+dc_x provider in the config
- Major recent changes on master (future 2.4 release):
- All the changes listed above in the release/2.2 section.
- Introduce local TX to rdb
- Wrap MPI code in unified interface
- Allow VOS pool open to ignore UUID
- Add documentation for pmempool check and repair
- Improve test coverage
- What is coming:
- 2.2.0 testing and code freeze
- 2.4.0 feature freeze
R&D
- Major features under development:
- Checksum scrubber
- Initial development completed and demonstrated.
- Testing/bugfixing/polishing in progress.
- Branch: feature/csum-scrubbing
- Target release: 2.4
- Catastrophic recovery
- Aka distributed fsck or checker
- Control plane interface as well as passes 0 and 1 developed
- Branch: feature/cat_recovery
- Target release: 2.6
- Performance domain
- Extend placement algorithm to be aware of fabric topology
- New container properties added
- Changes to the algorithmic placement implemented
- Branch: feature/perf_dom
- Target release: 2.8
- Multi-homed network support
- Aka multi-provider support
- This feature aims to support multiple network providers in the engine
- CART changes completed. Changes to the engine are in progress.
- Branch: feature/multiprovider
- Target release: 2.6
- Client-side metrics
- Multi-user dfuse
- More aggressive caching in dfuse for AI apps
- I/O streaming function interception
- LDMS plugin to export DAOS metrics
- Pathfinding:
- DAOS Pipeline API for active storage
- Branch: feature/pipeline_api
- Prototyping of server-side find using the pipeline API is in progress
- Changes to MariaDB DAOS engine with predicate pushdown to the DAOS storage
- Leveraging the Intel Data Streaming Accelerator (DSA) to accelerate DAOS
- Prototype leveraging DSA for VOS aggregation delivered
- GPU data path optimizations
- OPX provider support in collaboration with Cornelis Networks
- I/O Middleware / Framework Support
- TensorFlow-IO plugin for DAOS
- S3 support via a DAOS backend to Rados Gateway (RGW)
- Block interface over DAOS using SPDK DAOS bdev
News
- James Nunez joins the Intel DAOS team as a performance/application engineer.
- The DAOS roadmap has been updated on the wiki.
- The SODACODE 2022 hackathon is still underway. The list of DAOS activities offered in this hackathon is available in Jira:
https://daosio.atlassian.net/browse/DAOS-10010?jql=labels%20%3D%20%22SODACODE2022%22%20and%20statusCategory%20!%3D%20Done
See https://events.linuxfoundation.org/sodacode/ for more information.