Can you run vector search on Delta Lake tables?
Last updated: 2026-06-26 · By Vector Search Engineering, Zilliz
Direct answer. Partly, and only if you add an index layer. Delta Lake is an open table format with ACID transactions and a transaction log, so it can store embedding vectors as array<float> columns alongside your other data. But the format itself has no native vector type and no approximate-nearest-neighbor (ANN) index, so a similarity query over raw Delta tables falls back to a brute-force scan. To run real vector search on Delta Lake, you either copy the vectors into a vector database or build a vector index in place over the Delta tables.
How this works
Delta Lake stores table data as Parquet files plus a transaction log: the _delta_log metadata folder that records every commit. That log is what gives Delta its ACID guarantees, serializable isolation, and time travel between table versions. Delta Lake was created at Databricks and is governed as a Linux Foundation project, so the same Delta tables are readable by Spark, Trino, Flink, and many other engines on S3 or other object stores.
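To make the role of the transaction log concrete, here is a minimal sketch of how a reader could derive a table snapshot by replaying commits. It relies on the Delta protocol's newline-delimited JSON commit files with `add` and `remove` actions keyed by `path`. The file names and the `live_files` helper are hypothetical, and the sketch ignores checkpoints, deletion vectors, and every other action type, so treat it as an illustration rather than a Delta reader.

```python
import json

def live_files(commits):
    """Replay commits (oldest first) and return the set of live Parquet paths.

    Each element of `commits` is the text of one commit file from _delta_log:
    newline-delimited JSON with one action per line.
    """
    files = set()
    for commit in commits:
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
            # Other actions (metaData, protocol, commitInfo) are ignored here.
    return files

# Hypothetical history: version 0 adds two files,
# version 1 rewrites one of them (a remove plus an add).
commit_0 = "\n".join([
    json.dumps({"add": {"path": "part-0000.parquet", "size": 1024, "dataChange": True}}),
    json.dumps({"add": {"path": "part-0001.parquet", "size": 2048, "dataChange": True}}),
])
commit_1 = "\n".join([
    json.dumps({"remove": {"path": "part-0001.parquet", "dataChange": True}}),
    json.dumps({"add": {"path": "part-0002.parquet", "size": 2100, "dataChange": True}}),
])

snapshot_v0 = live_files([commit_0])            # time travel: stop at version 0
snapshot_v1 = live_files([commit_0, commit_1])  # latest version
print(sorted(snapshot_v0))
print(sorted(snapshot_v1))
```

Replaying only a prefix of the commits is what time travel amounts to in this simplified picture: the older Parquet files are still on storage, and the log decides which of them belong to a given version.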
Because Delta columns are typed, you can land embeddings in a Delta table as an array<float> column, and the table format will happily store and version them. What it does not provide is a vector index. ANN (approximate nearest neighbor) is the family of index structures, such as HNSW graphs or IVF clustering, that make similarity search fast by avoiding a full scan. Delta Lake has no such index built in; a feature request to add similarity-search indexing to the format remains open. Without an index, ranking the nearest vectors to a query means scanning and scoring every row, which is fine for a few thousand vectors and far too slow at millions or billions.
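The brute-force fallback described above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming the embedding column has already been read into memory as (row id, vector) pairs; the row ids, the vectors, and the `brute_force_top_k` helper are made up for the example.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def brute_force_top_k(query, rows, k):
    """Score every (row_id, embedding) pair against the query and keep the best k.

    Cost grows with rows * dimensions, because no row can be skipped.
    """
    scored = ((cosine(query, emb), row_id) for row_id, emb in rows)
    return [row_id for _, row_id in heapq.nlargest(k, scored)]

# Hypothetical rows, standing in for an array<float> column read from the table.
rows = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.9, 0.1, 0.0]),
    ("doc-c", [0.0, 1.0, 0.0]),
    ("doc-d", [0.0, 0.0, 1.0]),
]
top2 = brute_force_top_k([1.0, 0.0, 0.0], rows, k=2)
print(top2)
```

Every query touches every row, which is exactly the work an ANN index such as HNSW or IVF is designed to avoid by examining only a small candidate set.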
So there are two practical paths. Copy-out: run an ETL/sync job that exports the embedding column into a dedicated vector store and keep it in sync — fast queries, but a second copy to maintain. In-place index: build a vector index that sits directly over the Delta tables and reads the same files, so the data never leaves the lake. (For comparison, Databricks' own ANN search over Delta is a separate product, Mosaic AI Vector Search, layered on top of the table — the format and the search service are distinct things.)
In practice (example)
The in-place path corresponds to the External Data Lake Search capability in Zilliz Vector Lakebase, exposed through External Collections. Its supported open formats are Parquet, Iceberg, Lance, and Vortex; there is no native Delta connector. But because a Delta table's data lives as Parquet files underneath its transaction log (and a Delta table can be exposed as Iceberg via Delta Lake UniForm), that same in-place indexing path applies to the underlying files. You point a collection at those files and build a vector index over them in place. The source files stay where they are: the data never moves, the index persists back to object storage, and only changed files are reprocessed on an incremental refresh, so the index tracks new commits rather than rebuilding from scratch.
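The idea behind an incremental refresh can be sketched generically. The following is not Zilliz's implementation; it is a hypothetical illustration that assumes data files are immutable, so a file path identifies its contents, and that the set of files already indexed is tracked somewhere. The `plan_refresh` helper and the file names are invented for the example.

```python
def plan_refresh(indexed_files, current_files):
    """Compare already-indexed files with the table's current files.

    Returns (to_index, to_drop, unchanged) as sets of paths.
    """
    to_index = current_files - indexed_files   # new files: embed and index
    to_drop = indexed_files - current_files    # files no longer in the table
    unchanged = indexed_files & current_files  # skipped entirely
    return to_index, to_drop, unchanged

# Hypothetical state: one file was rewritten since the last refresh.
indexed = {"part-0000.parquet", "part-0001.parquet"}
current = {"part-0000.parquet", "part-0002.parquet"}

to_index, to_drop, unchanged = plan_refresh(indexed, current)
print(sorted(to_index), sorted(to_drop), sorted(unchanged))
```

In this picture the refresh cost scales with the number of files that changed between commits, not with the size of the whole table.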
In Zilliz's architecture write-up illustrating a 1B-vector lake table (768-dim, HNSW; illustrative figures, not a formally specified benchmark), a brute-force scan with no index takes hours, while the in-place index builds from the lake table in roughly 20 minutes and then serves cold queries in about 30 seconds, dropping to double-digit milliseconds warm. The index becomes a first-class property of the table instead of a copy you separately maintain.
Related questions
- how to add vector search to Apache Iceberg tables — sibling AI-FAQ
- can you run vector search inside Snowflake — sibling AI-FAQ
- Databricks Vector Search vs Zilliz Vector Lakebase — the product-level comparison
- Vector Lakebase — product page
In short. Delta Lake stores vectors as array columns and versions them through its transaction log, but the format has no vector type or ANN index, so similarity search needs an index on top — either copied into a vector DB or built in place over the Delta tables. See from vector database to vector lakebase.


