DAOS Community Update / August'22


Lombardi, Johann
 

Hi there,

Please find below the DAOS community newsletter for August 2022. A copy of this newsletter is also available on the wiki.

Past Events

Upcoming Events

  • 6th annual DAOS User Group (Nov'22)

Release

  • Current stable release is 2.0.3. See https://docs.daos.io/v2.0/ and https://packages.daos.io/v2.0/ for more information.
    2.0.3 includes several fixes for ARM64 support, erasure code and pool operations. Please see the release notes for more details.
  • Branches:
    • release/2.0 is the release branch for the stable 2.0 release. Latest bug fix release is 2.0.3 (v2.0.3 tag).
    • release/2.2 is the development branch for the future 2.2 release. Latest test build is 2.1.104 (v2.1.104-tb tag).
    • Master is the development branch for the future 2.4 release. Latest test build is 2.3.100 (v2.3.100-tb tag).
  • Major recent changes on release/2.0 (bugfix release) since v2.0.3:
    • Several Coverity fixes
    • Several packaging changes to prevent 2.0 from picking up UCX packages and new SPDK version.
    • Several test fixes
    • Remove unused test code
  • Major recent changes on release/2.2 (future 2.2 release):
    • All patches listed in the 2.0 section above.
    • Remove check for dpdk/rte_eal.
    • Handle the rare case in the control plane where a node processing a SWIM dead event loses leadership before the membership can be updated
    • Expose the block device information provided by the hwloc library through the control plane, and add support for non-PCI devices
    • Add backward compatibility code for the enable_vmd config parameter
    • Several dtx internal fixes.
    • Fix size compatibility issue with ds_cont_prop_cont_global_version in rdb
    • Refine code cleaning up huge pages left behind by previous instances
    • Update mercury to 2.2.0-rc6 to grab several fixes in ucx, tcp and cxi.
    • Improve QoS on request processing in the engine when running out of DMA buffers (FIFO order is now guaranteed to avoid starvation)
    • Improve support of compound RPCs with co-located shards
    • Update SPDK to v22.01.1
    • Add support for ucx/tcp transport to cart.
    • Fix assertion failure in pool_map_get_version()
    • Fix mtime accounting for user-set mtime
    • Add VPIC and LAMMPS applications to soak testing framework
    • Report pool global version on dmg pool list/query
    • Reject pool connection from old clients after pool upgrade
  • Major recent changes on master (future 2.4 release):
    • All patches listed in the 2.2 section above.
    • Add code to report Jira status into GitHub 
    • Use shared event queue in pydaos
    • Fix an issue in the I/O scheduler related to I/O throttling
    • Add partial support for readdir caching to dfuse
    • Land checksum scrubbing feature
    • Remove openpa package dependency
    • Add support for streaming I/O functions to the interception library
    • Update vendor dependency in the control plane
    • Fix daos cont list-obj JSON output
    • Fix possible race condition in map_refresh
    • Fix a race in dc_tx_get_epoch
    • Add metrics to track EC full stripes vs partial updates
    • Fix name match in daos_oclass_name2id()
    • Add new STACK_MMAP build option to enable DAOS-managed ABT stacks in the engine
    • Fix OID leak in the OIT
    • Fix a bug in size query introduced by the EC shard rotation feature
    • Move fault injection testing from CentOS7 to Rocky Linux 8 to prepare for the CentOS7 removal for 2.4.
    • Add ARM64 self-hosted runners to GitHub Actions.
  • What is coming:
    • 2.2.0 code freeze
    • 2.4.0 feature freeze

R&D

  • Major features under development:
    • VOS on SPDK blob
      • New umem backend and WAL to maintain an up-to-date copy of a VOS (i.e. DAOS metadata) file on an SPDK blob (i.e. SSD).
      • Patch to use umem DAOS interface in BIO and VOS landed.
      • Branch: feature/vos-on-blob
      • Target release: TBD
    • Checksum scrubber
      • Feature landed to master for 2.4
      • Branch: feature/csum-scrubbing
      • Target release: 2.4
      • This entry will be removed from this report next time
    • Multi-user dfuse
    • More aggressive caching in dfuse for AI apps
    • I/O streaming function interception
      • Add the ability to intercept fopen/fclose/fread/fwrite and other streaming functions in the interception library
      • Feature landed to master for 2.4
      • PR: https://github.com/daos-stack/daos/pull/6939
      • Target release: 2.4
      • This entry will be removed from this report next time
    • Catastrophic recovery
      • Aka distributed fsck or checker
      • Tests for ddb (low-level debugger utility similar to debugfs for ext4) under review
      • Testing for the dmg checker under development
      • Improvements to the checker start/stop flow and dmg in progress
      • Pass 4 for container recovery is in progress
      • Branch: feature/cat_recovery
      • Target release: 2.6
    • Multi-homed network support
      • Aka multi-provider support
      • This feature aims to support multiple network providers in the engine
      • Branch is feature complete now and testing is underway
      • Branch: feature/multiprovider
      • Target release: 2.6
    • Client-side metrics
    • Performance domain
      • Extend placement algorithm to be aware of fabric topology
      • Fix to avoid putting shards on the same domain landed
      • Branch: feature/perf_dom
      • Target release: 2.8 
    • LDMS plugin to export DAOS metrics
  • Pathfinding:
    • DAOS Pipeline API for active storage
    • Leveraging the Intel Data Streaming Accelerator (DSA) to accelerate DAOS
      • Prototype leveraging DSA for VOS aggregation delivered
      • Initial results shared at IXPUG conference.
    • OPX provider support in collaboration with Cornelis Networks
      • OPX provider merged upstream in libfabric
      • Provider supported in latest mercury version
      • Changes to DAOS to enable OPX as part of the build in progress
    • GPU data path optimizations
  • I/O Middleware / Framework Support:
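
To illustrate the streaming I/O interception item above: the sketch below shows the kind of buffered stdio calls (fopen/fwrite/fread/fclose) that the feature targets. It is plain ISO C and makes no DAOS calls. The function name stream_roundtrip and the file path are hypothetical, chosen for illustration only; on a DAOS system the path would sit under a dfuse mount point, and how the interception library is enabled for the process is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write `msg` to `path`, then read it back, using only buffered stdio calls.
 * Returns the number of bytes read back, or -1 on error.
 * Hypothetical helper for illustration; `path` would normally point into a
 * dfuse mount point.
 */
long stream_roundtrip(const char *path, const char *msg,
                      char *out, size_t out_size)
{
    assert(path != NULL && msg != NULL && out != NULL && out_size > 0);

    size_t len = strlen(msg);

    /* Write phase: fopen + fwrite + fclose */
    FILE *fp = fopen(path, "w");
    if (fp == NULL)
        return -1;
    if (fwrite(msg, 1, len, fp) != len) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    /* Read phase: fopen + fread + fclose */
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    size_t got = fread(out, 1, out_size - 1, fp);
    int err = ferror(fp);
    fclose(fp);
    if (err)
        return -1;

    out[got] = '\0';   /* NUL-terminate so the caller can treat it as a string */
    return (long)got;
}
```

An application would call stream_roundtrip() with a file under its dfuse mount and a buffer large enough for the expected data; the application code itself does not change when interception is in use.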

News

  • Thanks a lot to Linaro and Croit for providing the DAOS community with access to ARM64 nodes with different configurations (#cores, OS, ...). GitHub Actions is now enabled to build the DAOS master branch regularly on ARM64. The next step is to run unit tests.
  • A proposal for a DAOS community BoF at SC’22 has been submitted. We should know the outcome on Aug 12.
  • IO500 instructions on the wiki have been updated to use the new DAOS-aware pfind.

 
