Eventual

The data warehouse for Physical AI.

Ingest your fleet's multimodal data. Query your video data in natural language. 7-22x faster dataloading to your GPUs.

  • Video frames: multi-cam, 10-30 Hz
  • LiDAR frames: 200k points / sample
  • MCAP logs: IMU, GPS, telemetry

MultiBase: Index · Understand · Query. Powered by Daft.

Dataload at line rate to GPUs during training.

What slows down physical AI teams
01

A needle in a petabyte

The right training data is buried in your video. But there’s no way to search by content. Every new research question means days of heuristic scripts, annotation vendors, or manual review.

02

Starving GPUs

20–40% of training time is lost to data loading. The path from storage to GPU is its own engineering project.

Compute scales. Storage bandwidth doesn’t.

~3×
Compute per GPU generation (H100 → B200 → Vera Rubin)
~1.3×
Storage bandwidth per generation (same workload)

The gap widens every year. The teams that solve the data loop now will be able to use the next-gen hardware. The ones that don’t will watch $100M clusters idle.
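
A back-of-the-envelope sketch shows how quickly that gap compounds. It assumes the ~3× and ~1.3× figures above hold steadily from one generation to the next, which is a simplification, and the function name is illustrative only.

```python
# Rough sketch: how the compute-to-storage gap compounds over generations.
# Assumption: the ~3x and ~1.3x per-generation growth rates stay constant.

COMPUTE_GROWTH = 3.0   # approximate compute growth per GPU generation
STORAGE_GROWTH = 1.3   # approximate storage bandwidth growth per generation


def gap_after(generations: int) -> float:
    """Return how much further compute has grown than storage bandwidth."""
    return (COMPUTE_GROWTH / STORAGE_GROWTH) ** generations


for n in range(1, 4):
    print(f"after {n} generation(s): compute outpaces storage by ~{gap_after(n):.1f}x")
```

Under these assumptions the gap is roughly 2.3x after one generation, 5.3x after two, and 12.3x after three.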

How we solve it

MultiBase is the video-native index for physical AI that turns weeks of data wrangling into hours.

Warehouse

One data warehouse, not five tools wired together.

  • Video and sensors on the same row, aligned on timecode
  • Dataset versioning and row-level provenance
  • Add columns like embeddings or labels without rewriting existing data
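
To make "same row, aligned on timecode" concrete, here is a minimal sketch of one possible alignment policy: attach the nearest sensor sample to each video frame. This is not MultiBase's implementation. The function names, row layout, and sample values are hypothetical, and nearest-neighbor matching is only one of several reasonable choices.

```python
from bisect import bisect_left


def nearest_index(timestamps: list[float], t: float) -> int:
    """Index of the timestamp closest to t (timestamps must be sorted ascending)."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbor is closer to t.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align_rows(frame_times, imu_times, imu_values):
    """Build one row per video frame with the nearest IMU sample attached."""
    rows = []
    for ft in frame_times:
        j = nearest_index(imu_times, ft)
        rows.append({"timecode": ft, "imu": imu_values[j], "imu_timecode": imu_times[j]})
    return rows


# Hypothetical data: 10 Hz video frames and irregularly timed IMU samples.
frame_times = [0.0, 0.1, 0.2]
imu_times = [0.00, 0.04, 0.09, 0.16, 0.21]
imu_values = ["a", "b", "c", "d", "e"]

rows = align_rows(frame_times, imu_times, imu_values)
for row in rows:
    print(row)
```

Each output row carries the frame timecode together with the sensor reading closest to it, so video and telemetry can be filtered and queried as one record.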
Understand

Describe what you need. Get results in minutes.

  • Every clip annotated at corpus scale, for a fraction of annotation vendor cost
  • Search by what's inside the video, not just metadata
  • Temporal and causal understanding: “find an unprotected left turn where the vehicle squeezed through a tight gap in oncoming traffic”
Serve

Curated data to your GPUs at line rate.

  • Video-native PyTorch dataloader: dict[str, Tensor] on GPU
  • 7-22x faster dataloading to your GPUs
  • Turn 20-40% GPU idle time into training time
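
As a rough way to read these numbers together, the sketch below applies an Amdahl's-law style estimate. It assumes the 20-40% idle time is pure data-loading stall that does not overlap with compute, and that the 7-22x speedup applies to exactly that portion. Both are simplifying assumptions, and the function name is illustrative.

```python
def training_speedup(stall_fraction: float, loader_speedup: float) -> float:
    """Overall speedup when only the data-loading share of time gets faster.

    stall_fraction: share of training time spent waiting on data (0.0 to 1.0).
    loader_speedup: how many times faster data loading becomes.
    """
    remaining = (1.0 - stall_fraction) + stall_fraction / loader_speedup
    return 1.0 / remaining


for stall in (0.20, 0.40):
    for s in (7.0, 22.0):
        print(f"stall={stall:.0%}, loader {s:.0f}x faster -> "
              f"training ~{training_speedup(stall, s):.2f}x faster")
```

Under these assumptions, end-to-end training would run roughly 1.2x to 1.6x faster, with most of the benefit coming from removing the stall rather than from the exact loader speedup.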
Built for physical AI teams
Robotics labs
Autonomous driving
Video generation teams
Why us
  • Built on Daft, an open-source data engine processing petabytes daily at companies like Amazon and Mobileye.
  • Founding team spent years building PB-scale sensor pipelines for autonomous vehicles. Same class of problem, at the same scale.
  • We build and operate the VLMs that power curation. Customers don’t stand up their own model-serving stack.

Get in touch

We're shaping MultiBase with leading robotics labs, autonomous vehicle companies, and GPU infrastructure providers.

Our engineering team will reach out to understand how best to serve you.

Backed by

Y Combinator, Felicis, CRV, array.vc