DAOS Community Update / July'22


Lombardi, Johann
 

Hi there,

 

Please find below the DAOS community newsletter for July 2022.

A copy of this newsletter is available on the wiki.

 

Past Events

  • ISC’22 IXPUG (June 2nd at 9am CEST)

DAOS Features for Next Generation Platforms

https://www.ixpug.org/events/isc22-ixpug-workshop

Mohamad Chaarawi (Intel)

  • ESSA’22: 3rd Workshop on Extreme-Scale Storage and Analysis (June 3rd)

DAOS: Nextgen Storage Stack for HPC and AI

https://sites.google.com/view/essa-2022/

Johann Lombardi (Intel)

  • EMOSS’22: Emerging Open Storage Systems and Solutions for Data Intensive Computing (July 1st)

One big happy family: sharing the S3 layer between Ceph, CORTX, and DAOS

https://iosea-project.eu/event/emoss-22-workshop/
Zuhair AlSader (Seagate)
Andriy Tkachuk (Seagate)

 

Upcoming Events

  • Flash Memory Summit’22 (August 2nd-4th)

Requirements and Challenges Associated with the World's Fastest Storage Platform

https://sites.google.com/view/essa-2022/

Jeff Olivier (Intel)

 

Release

  • Current stable release is 2.0.2. See https://docs.daos.io/v2.0/ and https://packages.daos.io/v2.0/ for more information.
  • Branches:
    • release/2.0 is the release branch for the stable 2.0 release. Latest bug fix release is 2.0.2. 2.0.3 rc3 was tagged and rc4 is imminent.
    • release/2.2 is the development branch for the future 2.2 release. Latest test build is 2.1.103.
    • Master is the development branch for the future 2.4 release. Latest test build is 2.3.100.
  • Major recent changes on release/2.0 (bugfix release):
    • Fix a few client-side compilation issues on ARM64
    • Fix a segfault in bio with large single-value update
    • Fix several event-related issues causing client-side assertions during soak testing
    • Several documentation improvements/clarifications
  • Major recent changes on release/2.2 (future 2.2 release):
    • All patches listed in the 2.0 section above.
    • Fix semaphore leak on pool addition failure
    • Add support for ucx+all alias
    • Limit UCX scanning to ib and tcp components.
    • Fix a few issues in incast variables
    • Build UCX in scons when building from source
    • Remove NVMe space reservation
    • Move dmg to new daos-admin RPM package
    • Fix bug in available NVMe space calculation used to support pool create with percentage-based space
    • Add support for EC conditional fetch (only used for metadata on EC)
    • Remove 1.2 compatibility code in daos_cont_create(). Container UUID cannot be provided by caller any longer.
    • Fix daos_event_abort_corruption issue
    • Check thread ID in interception descriptor
    • Remove dfuse user-space readahead entirely (readahead still done by the kernel)
  • Major recent changes on master (future 2.4 release):
    • All patches listed in the 2.2 section above.
    • Several clean-ups to prepare for EC parity rotation feature
    • Add SVM (shared virtual memory) support for QAT compression
    • Remove CentOS7 stage from CI pipeline
    • Update Ubuntu to 22.04 in GitHub Actions
    • Use correct uid/gid bits when creating files
    • Several build infrastructure (scons) improvements
    • Add EL9 Dockerfile
    • Several server-side build fixes for ARM64
    • Limit DTX batched commit count.
  • What is coming:
    • 2.0.3 rc4 validation
    • 2.2.0 code freeze
    • 2.4.0 feature freeze
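
The branch list above suggests that test builds carry an odd minor version one below the release they lead to (2.1.103 for 2.2, 2.3.100 for 2.4). A minimal sketch of that mapping follows, assuming the pattern holds in general. The helper name target_release is hypothetical and is not part of any DAOS tooling.

```python
def target_release(test_build: str) -> str:
    """Map a test build tag such as '2.1.103' to the release it precedes.

    Assumption: test builds use an odd minor version and the target
    release is the next even minor, as in the two examples quoted in
    this newsletter.
    """
    major, minor, _patch = (int(part) for part in test_build.split("."))
    if minor % 2 == 0:
        # Even minors (2.0.x, 2.2.x) look like release versions, not test builds.
        raise ValueError(f"{test_build} has an even minor version; expected a test build")
    return f"{major}.{minor + 1}"


print(target_release("2.1.103"))  # test build from release/2.2
print(target_release("2.3.100"))  # test build from master
```

With the two tags quoted above, the sketch yields 2.2 for the release/2.2 build and 2.4 for the master build.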

 

R&D

  • Major features under development:
    • Checksum scrubber
      • PR #9345 created out of the feature branch to land to master.
      • Soak and performance tests were run against the PR.
      • Exit criteria under review, merge to master is imminent.
      • Branch: feature/csum-scrubbing
      • Target release: 2.4
    • Multi-user dfuse
    • More aggressive caching in dfuse for AI apps
    • I/O streaming function interception
      • Add the ability to intercept fopen/fclose/fread/fwrite and other streaming functions in the interception library.
      • Intercept missing ftello64 function.
      • Fix several bugs discovered by the daos_build test (i.e. building DAOS source code over FUSE/IL).
      • Test plan created.
      • PR: https://github.com/daos-stack/daos/pull/6939
      • Target release: 2.4
    • Catastrophic recovery
      • Aka distributed fsck or checker
      • ddb (low-level debugger utility similar to debugfs for ext4) landed
      • Most of passes 2 and 3 implemented.
      • Testing for ddb and pass 1 in progress.
      • Branch: feature/cat_recovery
      • Target release: 2.6
    • Multi-homed network support
      • Aka multi-provider support
      • This feature aims to support multiple network providers in the engine
      • Incorporate new CART API to support bulk transfer from a second provider.
      • RPC dispatch to a second provider not supported yet.
      • Branch: feature/multiprovider
      • Target release: 2.6
    • Client-side metrics
    • Performance domain
      • Extend placement algorithm to be aware of fabric topology
      • Fix to avoid putting shards on the same domain landed
      • Branch: feature/perf_dom
      • Target release: 2.8 
    • LDMS plugin to export DAOS metrics
  • Pathfinding:
    • DAOS Pipeline API for active storage
    • Leveraging the Intel Data Streaming Accelerator (DSA) to accelerate DAOS
      • Prototype leveraging DSA for VOS aggregation delivered
      • Initial results shared at IXPUG conference.
    • OPX provider support in collaboration with Cornelis Networks
      • OPX provider merged upstream in libfabric
      • PR to add OPX support to mercury merged
      • Changes to DAOS to enable OPX as part of the build in progress
    • GPU data path optimizations
  • I/O Middleware / Framework Support

 

News

  • Thanks to Kevin Zhao for providing access to an ARM64 node that will soon be added as a self-hosted runner in GitHub Actions. This will allow the community to build & test DAOS on ARM64 on a regular basis.
  • A proposal for a DAOS community BoF at SC’22 has been submitted.

 
