DAOS Community Update / June'22


Johann
 

Hi there,

 

Please find below the DAOS community newsletter for June 2022.

 

Past Events

  • Salishan Conference On High Speed Computing (April 26th)

Accelerating Data-driven Workflows with DAOS
https://salishan.ahsc-nm.org/program4.html

Johann Lombardi (Intel)

  • ECP BoF (May 11th at 11am Eastern time)

DAOS Next Generation Storage

https://www.exascaleproject.org/event/ecp-community-bof-days-2022/

Kevin Harms (ANL)

Mohamad Chaarawi (Intel)

Johann Lombardi (Intel)

  • Intel Vision (May 11th)

Advanced Storage and Memory Hierarchy in AI and HPC with DAOS Storage

https://reg.oneventseries.intel.com/flow/intel/vision2022/portal-live/page/session-catalog?tab.day=20220511&search=AIML009

Andrey Kudryavtsev (Intel)

  • SODACODE 2022 Keynote (May 25th)

Advanced Storage and Memory Hierarchy in AI and HPC with DAOS Storage

Andrey Kudryavtsev (Intel)

  • SODACODE 2022 Breakout Session (May 26th)

Accelerating AI with DAOS Storage

Johann Lombardi (Intel)

  • ISC’22 BoF (May 30th at 4pm CEST)

Accelerating HPC and AI with DAOS Storage

https://app.swapcard.com/widget/event/isc-high-performance-2022/planning/UGxhbm5pbmdfODYxMTYx

Kevin Harms (ANL)

Adrian Jackson (EPCC)

Michael Hennecke (Intel)

Mohamad Chaarawi (Intel)

Johann Lombardi (Intel)

  • ISC’22 IXPUG (June 2nd at 9am CEST)

DAOS Features for Next Generation Platforms

https://www.ixpug.org/events/isc22-ixpug-workshop

Mohamad Chaarawi (Intel)

  • ISC’22 Intel Booth (May 30th to June 1st)

DAOS demonstration, Fireside chats, …

 

Upcoming Events

  • ESSA’22: 3rd Workshop on Extreme-Scale Storage and Analysis (June 3rd)

DAOS: Nextgen Storage Stack for HPC and AI

https://sites.google.com/view/essa-2022/

Johann Lombardi (Intel)

  • EMOSS’22: Emerging Open Storage Systems and Solutions for Data Intensive Computing (July 1st)

One big happy family: sharing the S3 layer between Ceph, CORTX, and DAOS

https://iosea-project.eu/event/emoss-22-workshop/
Zuhair AlSader (Seagate)
Andriy Tkachuk (Seagate)

 

Release

  • Current stable release is 2.0.2. See https://docs.daos.io/v2.0/ and https://packages.daos.io/v2.0/ for more information.
  • Branches:
    • release/2.0 is the release branch for the stable 2.0 release. The latest bug fix release is 2.0.2, and 2.0.3 rc1 has been tagged.
    • release/2.2 is the development branch for the future 2.2 release. Latest test build is 2.1.102.
    • master is the development branch for the future 2.4 release. Latest test build is 2.3.100.
  • Major recent changes on release/2.0 (bugfix release):
    • Upgrade to libfabric v1.15.1
    • Fix a sem leak on failure path of pool creation
    • Several patches to improve error reporting when downgrading from 2.2 to 2.0 after dmg pool upgrade was completed.
    • Fix a limitation in the agent code and NIC selection on servers with many NUMA nodes
    • Fix reference leak on container destroy path
  • Major recent changes on release/2.2 (future 2.2 release):
    • All patches listed in the 2.0 section above.
    • Evict pool handles on agent shutdown
    • Several test fixes to run the CI over UCX
    • Add support for virtual network interfaces
    • Disable FUSE metadata caching when using IL
    • Split large I/Os into chunk-sized I/Os (8MB by default) to avoid exceeding the SPDK bdev single I/O size limit.
    • Fix build on aarch64
    • Reduce duration of NVMe storage format in the control plane
    • Fix huge pages NUMA locality
    • Fix available storage with heterogeneous NVMe
    • Disable unified mode in UCX
  • Major recent changes on master (future 2.4 release):
    • All patches listed in the 2.2 section above.
    • Add dmg pool query-targets
    • Add chgrp support to dfuse
    • Check thread ID in IL destructor
    • Use pip to install deps on Ubuntu
    • Remove some compatibility code with DAOS 1.2
    • Allow fanout control RPC to report progress
    • Add rebuild generation for each rebuild retry
    • Add new DFS APIs to retrieve pool and container handles
  • What is coming:
    • 2.0.3 validation and decision on whether we need another release candidate
    • 2.2.0 code freeze
    • 2.4.0 feature freeze
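
To illustrate the I/O splitting change listed under release/2.2, here is a minimal sketch of the general idea. It is not the DAOS implementation: the function name split_io and the constant CHUNK_SIZE are hypothetical, and only the 8MB default comes from the note above (assumed here to mean 8 MiB).

```python
# Hypothetical illustration only; this is not DAOS code.
# 8MB default taken from the release note, assumed to mean 8 MiB.
CHUNK_SIZE = 8 * 1024 * 1024


def split_io(offset, length, chunk_size=CHUNK_SIZE):
    """Split one I/O request into (offset, length) pieces of at most chunk_size bytes."""
    if length < 0 or chunk_size <= 0:
        raise ValueError("length must be >= 0 and chunk_size must be > 0")
    pieces = []
    end = offset + length
    while offset < end:
        # The last piece may be shorter than chunk_size.
        piece_len = min(chunk_size, end - offset)
        pieces.append((offset, piece_len))
        offset += piece_len
    return pieces


if __name__ == "__main__":
    # A 20 MiB request is split into pieces of 8, 8 and 4 MiB.
    for piece_offset, piece_len in split_io(0, 20 * 1024 * 1024):
        print(piece_offset, piece_len)
```

Each piece stays at or below the chunk size, so no single request handed to the lower layer exceeds the configured limit.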

 

R&D

  • Major features under development:
    • Checksum scrubber
      • Improvement to test coverage in progress.
      • Should be merged to master soon.
      • Branch: feature/csum-scrubbing
      • Target release: 2.4
    • Multi-user dfuse
    • More aggressive caching in dfuse for AI apps
    • I/O streaming function interception
    • Catastrophic recovery
      • Aka distributed fsck or checker
      • First passes implemented.
      • Testing and debugging in progress.
      • Branch: feature/cat_recovery
      • Target release: 2.6
    • Multi-homed network support
      • Aka multi-provider support
      • This feature aims at supporting multiple network providers in the engine
      • Engine changes to support multiple providers are implemented.
      • Branch: feature/multiprovider
      • Target release: 2.6
    • Client-side metrics
    • Performance domain
      • Extend placement algorithm to be aware of fabric topology
      • Branch: feature/perf_dom
      • Target release: 2.8 
    • LDMS plugin to export DAOS metrics
  • Pathfinding:
    • DAOS Pipeline API for active storage
      • Branch: feature/pipeline_api
      • Prototyping of server-side find using the pipeline API completed. Initial results shared at IXPUG conference.
      • Changes to MariaDB DAOS engine with predicate pushdown to the DAOS storage
    • Leveraging the Intel Data Streaming Accelerator (DSA) to accelerate DAOS
      • Prototype leveraging DSA for VOS aggregation delivered
      • Initial results shared at IXPUG conference.
    • DAOS/DFS integration with NFS Ganesha
      • Under exploration
      • Discussions about NFS handle and readdir+ support
    • OPX provider support in collaboration with Cornelis Networks
      • DAOS and Mercury patches to add OPX support under review.
    • GPU data path optimizations
  • I/O Middleware / Framework Support

 

News

  • Jerome Soumagne joins the Intel DAOS team to work on the network stack.
