February 26, 2026
Daft v0.7.4: Arrow-rs, OpenDAL, Flight Shuffle, and Better Metrics

Daft v0.7.4 completes its arrow-rs migration, adds Apache OpenDAL storage support, Flight shuffle for Flotilla, and a full observability stack.

by Daft Team

Over the past several months, the team has been migrating core Rust kernels from arrow2 to arrow-rs. With release 0.7.4, over two thousand arrow2 call sites have been converted, bringing a colossal effort to a close. This release also brings a full observability stack, Apache OpenDAL support, and quality-of-life improvements across metrics, storage backends, and SQL compatibility.

The Great Arrow-rs Migration

The most significant change in v0.7.4 is Daft's migration from arrow2 to arrow-rs, the official Apache Arrow Rust implementation.

Why this matters: arrow2 was a high-quality Rust implementation of Arrow, but it is no longer actively maintained. arrow-rs, on the other hand, is the Apache Software Foundation's official project, backed by a large contributor community, regular releases, and deep integration with the broader Rust data ecosystem (DataFusion, Ballista, and more).

This release alone includes 25 of the 122 total migration PRs, spanning everything from core data types to kernel implementations:

  • Arithmetic kernels, cast operations, and concat logic moved to arrow-rs
  • Boolean bitmap access, array serialization, and filtering migrated
  • Hash kernels, growable internals, and comparison operations ported
  • The daft-arrow compatibility layer is being systematically removed

Breaking change: Interval arithmetic now uses arrow-rs (#6186). If you're using interval types directly, check the migration notes.

The arrow2-to-arrow-rs migration has been underway since December and is now effectively complete. Cory Grinstead (@universalmind303) led the charge, authoring over half of the 122 total migration PRs.

Here are the full numbers:

| Metric | Value |
| --- | --- |
| Total migration PRs | 122 |
| Total lines added | +20,934 |
| Total lines deleted | -17,916 |
| Total lines changed | 38,850 |
| Net change | +3,018 |

Per-contributor breakdown:

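| Contributor | PRs | Lines Added | Lines Deleted | % of PRs |
| --- | --- | --- | --- | --- |
| @universalmind303 | 62 | 13,071 | 8,624 | 50.8% |
| @desmondcheongzx | 16 | 1,784 | 1,771 | 13.1% |
| @srilman | 11 | 3,234 | 2,920 | 9.0% |
| @rohitkulshreshtha | 10 | 344 | 569 | 8.2% |
| @kevinzwang | 8 | 896 | 3,012 | 6.6% |
| @cckellogg | 8 | 291 | 133 | 6.6% |
| @huleilei | 3 | 107 | 60 | 2.5% |
| @colin-ho | 3 | 54 | 52 | 2.5% |
| @rchowell | 1 | 1,153 | 775 | 0.8% |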
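
As a quick sanity check, the summary figures and the "% of PRs" column can be recomputed from the per-contributor rows. The snippet below is only a sketch: the numbers are copied from the table above, and the variable names are illustrative.

```python
# Per-contributor figures copied from the table above:
# (contributor, PRs, lines added, lines deleted)
rows = [
    ("@universalmind303", 62, 13_071, 8_624),
    ("@desmondcheongzx", 16, 1_784, 1_771),
    ("@srilman", 11, 3_234, 2_920),
    ("@rohitkulshreshtha", 10, 344, 569),
    ("@kevinzwang", 8, 896, 3_012),
    ("@cckellogg", 8, 291, 133),
    ("@huleilei", 3, 107, 60),
    ("@colin-ho", 3, 54, 52),
    ("@rchowell", 1, 1_153, 775),
]

total_prs = sum(prs for _, prs, _, _ in rows)
total_added = sum(added for _, _, added, _ in rows)
total_deleted = sum(deleted for _, _, _, deleted in rows)

summary = {
    "total_prs": total_prs,
    "lines_added": total_added,
    "lines_deleted": total_deleted,
    "lines_changed": total_added + total_deleted,
    "net_change": total_added - total_deleted,
}

# Share of PRs per contributor, rounded to one decimal place as in the table.
pr_share = {name: round(100 * prs / total_prs, 1) for name, prs, _, _ in rows}
```

The totals come out to 122 PRs, 20,934 lines added, and 17,916 lines deleted, which matches the summary figures.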

Better Metrics for Observability

v0.7.4 continues the observability push that started in v0.7.3. With standardized metric naming and split duration columns, you can now correlate per-node execution times across pipeline stages without writing custom parsing: the metrics DataFrame gives you what you need directly.

New in v0.7.4:

  • New Metrics documentation (#6253) — comprehensive docs for Daft's metrics system
  • Consolidated metric naming (#6236) — standardized metric names with node.type attributes for cleaner dashboards
  • Split duration metrics (#6235) — separate columns in the metrics DataFrame for easier analysis
  • Dashboard CLI improvements (#6234) — split into start/stop subcommands for cleaner workflow

Combined with v0.7.3's OTEL export support, Flotilla metrics, and dashboard daemon mode, Daft now has a complete observability story: collect metrics, export to OpenTelemetry, and visualize in a built-in dashboard.

Apache OpenDAL: One API for Every Storage Backend

Daft now supports Apache OpenDAL-compatible backends (#6177). OpenDAL provides a unified data access layer for dozens of storage services (S3, GCS, Azure Blob, HDFS, and many more) through a single API.

This means Daft can now read from and write to any storage backend that OpenDAL supports, without needing a dedicated connector for each one.
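
To make the "one API, many backends" idea concrete, here is a minimal conceptual sketch in plain Python. It does not use OpenDAL or Daft: the `Storage` interface, the two backends, and `copy_object` are all hypothetical names invented for illustration. The point is only that calling code written against a shared interface never has to branch on the storage type.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class Storage(Protocol):
    """Hypothetical minimal interface; not OpenDAL's or Daft's actual API."""

    def write(self, path: str, data: bytes) -> None: ...
    def read(self, path: str) -> bytes: ...


class MemoryStorage:
    """Keeps objects in a dict, standing in for any remote service."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._objects[path] = data

    def read(self, path: str) -> bytes:
        return self._objects[path]


class LocalStorage:
    """Stores objects as files under a root directory."""

    def __init__(self, root: str) -> None:
        self._root = Path(root)

    def write(self, path: str, data: bytes) -> None:
        target = self._root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def read(self, path: str) -> bytes:
        return (self._root / path).read_bytes()


def copy_object(src: Storage, dst: Storage, path: str) -> None:
    """Backend-agnostic copy: the caller never branches on storage type."""
    dst.write(path, src.read(path))


memory = MemoryStorage()
local = LocalStorage(tempfile.mkdtemp())
memory.write("data/part-0.bin", b"hello")
copy_object(memory, local, "data/part-0.bin")
roundtrip = local.read("data/part-0.bin")
```

Adding another backend in this sketch means writing one more class with `read` and `write`; nothing in `copy_object` changes. That is the property a unified access layer provides.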

Flight Shuffle for Flotilla

Daft's distributed execution engine, Flotilla, gets a major upgrade with Flight shuffle support (#6123). Arrow Flight is a high-performance data transport protocol built on gRPC and Arrow IPC. It enables efficient, zero-copy data movement between nodes during shuffle operations.

This is a foundational piece for Flotilla's performance at scale, reducing serialization overhead during distributed data exchange.
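
For readers less familiar with shuffles, the sketch below shows the data exchange that a shuffle performs, which is the step where a transport such as Arrow Flight would carry the data. It is a conceptual illustration in plain Python, not Flotilla's implementation and not the Flight API; `partition_for`, `shuffle`, and the sample rows are assumptions made up for this example.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a destination partition with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(node_outputs, num_partitions):
    """Regroup rows so that every row with the same key lands in one partition.

    node_outputs: list (one entry per upstream node) of (key, value) rows.
    Returns {partition_id: [rows]}. In a real engine each bucket would be
    serialized and sent over the network to the node owning that partition.
    """
    buckets = defaultdict(list)
    for rows in node_outputs:
        for key, value in rows:
            buckets[partition_for(key, num_partitions)].append((key, value))
    return dict(buckets)


# Two upstream nodes, each holding rows for overlapping keys.
node_outputs = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
]
partitions = shuffle(node_outputs, num_partitions=2)
```

Every bucket here crosses a node boundary in a distributed run, so the cost of serializing and moving those buckets is what the choice of transport affects.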

More Highlights

  • Tencent Cloud COS support (#6140) — native support for Tencent Cloud Object Storage, contributed by @XuQianJin-Stars
  • pyiceberg 0.11.0 (#6200) — updated Iceberg integration to the latest pyiceberg release
  • .as_T cast methods (#6100) — convenient type casting methods like .as_int(), .as_str() on expressions
  • SQL ORDER BY position (#6211) — ORDER BY 1, 2 now works as expected in Daft SQL
  • Time-interval sampling (#6088) — enhanced sampling with comprehensive time-interval support for audio/video workflows
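
To illustrate what positional ORDER BY means, the sketch below uses Python's built-in `sqlite3` module rather than Daft SQL, so it runs without any extra installation. The `events` table and its rows are made up for the example. The assumption is that Daft SQL follows the same standard semantics: `ORDER BY 1, 2` sorts by the first and then the second column of the SELECT list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (city TEXT, year INTEGER, visits INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("Oslo", 2025, 7), ("Bergen", 2026, 3), ("Oslo", 2024, 9), ("Bergen", 2024, 5)],
)

# ORDER BY 1, 2 sorts by the first, then the second column of the SELECT list.
by_position = conn.execute(
    "SELECT city, year, visits FROM events ORDER BY 1, 2"
).fetchall()

# The equivalent query with explicit column names.
by_name = conn.execute(
    "SELECT city, year, visits FROM events ORDER BY city, year"
).fetchall()
conn.close()
```

Both queries return the same rows in the same order. The positional form is mainly a convenience when the SELECT list contains long expressions.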

Community Contributions

This release wouldn't be possible without contributions from the community:

  • @huleilei: time-interval sampling enhancements
  • @XuQianJin-Stars: Tencent Cloud COS support
  • @gweaverbiodev: pyiceberg 0.11.0 support
  • @Lucas61000: SQL ORDER BY column position
  • @plotor: dashboard daemon mode
  • @gpathak128: JSON timestamp write support
  • @aaron-ang: .as_T cast methods

Upgrade today to 0.7.4

uv add "daft>=0.7.4"

Or try the latest nightly:

uv pip install daft --pre --extra-index-url https://nightly.daft.ai

Check the full changelog for the complete list of changes.
