Dremio Blog

17 minute read · May 20, 2026

What’s New in Apache Iceberg 1.11.0

Alex Merced Head of DevRel, Dremio

The File Format API: The Architectural Shift That Changes Iceberg's Long-Term Trajectory

Deletion Vectors: Solving the Delete File Accumulation Problem

The Variant Type: Semi-Structured Data Without the Performance Penalty

Geospatial Types: Spatial Data Without Custom Extensions

Nanosecond Timestamps: Precision for High-Frequency Workloads

Engine-Specific Updates

Upgrading Your Tables to V3

Conclusion

Apache Iceberg 1.11.0 shipped in May 2026, and it is not a routine maintenance release. Two parallel threads of work converge here: a significant architectural shift in how Iceberg handles file formats, and the arrival of production maturity for the V3 specification features the community has been building toward for the past two years.

If you have been waiting for deletion vectors, the Variant type, native geospatial support, or nanosecond-precision timestamps to stabilize before adopting them, 1.11.0 is the release that signals they are ready.

This post covers each major change, what it means technically, which workloads it affects, and how to enable it.

The File Format API: The Architectural Shift That Changes Iceberg's Long-Term Trajectory

The most consequential change in 1.11.0 is one that most users won't directly interact with for months. The new File Format API restructures how Iceberg's core relates to the physical storage formats it reads and writes.

The Problem with Hardcoded Format Support

Before 1.11.0, adding support for a new file format in Iceberg required touching the engine's internals. Parquet support lived in one code path. ORC support lived in another. Developers added new format integrations by extending and branching this logic, a pattern that worked when Parquet was the only format anyone cared about but does not scale now that the ecosystem is producing new formats built for AI workloads, GPU pipelines, and ML training.

The practical result was that formats like Vortex (a high-performance analytics format designed as a modular Parquet successor), Lance (built for vector search and high-dimensional embedding data), and Nimble (optimized for wide-table ML training) were impractical to integrate even if the community wanted to. The cost of integration was too high relative to the uncertainty of adoption.

What the File Format API Provides

The new API introduces a unified, extensible layer for reading and writing data files. A new format implements the interface defined by the API. Iceberg's core doesn't need to know what format is underneath. The engine negotiates with the format through the interface.

This is a plugin model, not a branch model. Adding a new format goes from "edit the engine internals" to "ship a new format implementation that conforms to the interface."
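
To make the plugin model concrete, here is a minimal Python sketch of the idea. It is not Iceberg's actual Java API: the `FileFormat` protocol, the registry functions, and the toy `JsonLinesFormat` are all hypothetical names invented for illustration. The point is only that the "core" functions never branch on the format; they look up an implementation and call the shared interface.

```python
import json
from typing import Dict, Iterable, List, Protocol


class FileFormat(Protocol):
    """Hypothetical interface that every format plugin implements."""

    name: str

    def write(self, rows: Iterable[dict]) -> bytes: ...

    def read(self, data: bytes) -> List[dict]: ...


class JsonLinesFormat:
    """Toy format used only to exercise the interface."""

    name = "jsonl"

    def write(self, rows: Iterable[dict]) -> bytes:
        return "\n".join(json.dumps(r, sort_keys=True) for r in rows).encode("utf-8")

    def read(self, data: bytes) -> List[dict]:
        return [json.loads(line) for line in data.decode("utf-8").splitlines() if line]


# The "core" only knows about the registry and the interface.
_REGISTRY: Dict[str, FileFormat] = {}


def register_format(fmt: FileFormat) -> None:
    _REGISTRY[fmt.name] = fmt


def write_data_file(format_name: str, rows: Iterable[dict]) -> bytes:
    return _REGISTRY[format_name].write(rows)


def read_data_file(format_name: str, data: bytes) -> List[dict]:
    return _REGISTRY[format_name].read(data)


# Shipping a new format means registering one more implementation.
register_format(JsonLinesFormat())
```

In this sketch, supporting another format would mean writing one more class with `name`, `write`, and `read`, then calling `register_format`. Nothing in `write_data_file` or `read_data_file` changes.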

For users, the immediate benefit is stability. The legacy branching code paths are consolidated, which reduces the surface area for bugs and makes the engine easier to maintain. For the ecosystem, the benefit is a clear integration point for the next generation of file formats.

What This Makes Possible

The community is actively working on integrating Vortex as the first new pluggable format to ship through the File Format API. Vortex is designed for high-performance analytics: it supports direct GPU decompression, efficient filter expressions evaluated on compressed data, and a modular column encoding system that can match or exceed Parquet's performance on analytical workloads.

Lance, built specifically for AI-native workloads, offers high-performance random access to high-dimensional vector data. Its current home is the LanceDB ecosystem, but the File Format API makes future Iceberg integration experiments viable in a way they weren't before.

Nimble targets ML training pipelines that consume very wide tables with thousands of feature columns. It prioritizes fast decoding speed over compression ratio, which is the right tradeoff for training jobs that read the same features repeatedly.

None of these formats are a drop-in replacement for Parquet in all workloads. Each makes different tradeoffs. The point of the File Format API is that Iceberg no longer has to pick one winner. Different tables can use different formats, and the catalog, metadata layer, and transaction semantics remain consistent across all of them.

Deletion Vectors: Solving the Delete File Accumulation Problem

The V2 Iceberg specification introduced row-level delete support through positional delete files. The concept was sound but the implementation had a scaling problem that anyone running frequent UPDATE or DELETE operations has encountered.

The V2 Delete File Accumulation Problem

In V2, every update or delete operation that doesn't trigger an immediate file rewrite creates a positional delete file. These files accumulate over time. A table with active update patterns (GDPR deletion requests, CDC feeds, price corrections) ends up with dozens or hundreds of small delete files sitting alongside each data file. At query time, the engine must open and apply all of them to reconstruct the current state. This gets progressively slower as delete files multiply.

How Deletion Vectors Work

Deletion vectors, introduced in the V3 spec and production-ready in 1.11.0, replace positional delete files with a fundamentally different mechanism. Instead of creating a new file for each delete operation, the engine maintains a Roaring bitmap stored in the Puffin file format that tracks exactly which rows in a data file are deleted.

The critical structural difference: each data file has at most one deletion vector. No accumulation. No growing list of delete files to open and apply at read time. When the engine reads a data file, it checks for a corresponding deletion vector, applies the bitmap mask if one exists, and returns only the live rows. The read-time overhead is a single bitmap lookup per data file, regardless of how many rows have been deleted from that data file.
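
The contrast between the two read paths can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not how any engine implements it: a plain Python integer stands in for the Roaring bitmap, in-memory sets stand in for positional delete files, and the function and class names are hypothetical.

```python
from typing import List, Set


def read_with_delete_files(rows: List[str], delete_files: List[Set[int]]) -> List[str]:
    """V2-style read: every positional delete file is opened and merged."""
    deleted: Set[int] = set()
    for positions in delete_files:  # one "file open" per delete file
        deleted |= positions
    return [row for pos, row in enumerate(rows) if pos not in deleted]


class DeletionVector:
    """Stand-in for a V3 deletion vector: one bitmap for one data file."""

    def __init__(self) -> None:
        self._bits = 0  # Python int used as a bitmap instead of a Roaring bitmap

    def delete(self, position: int) -> None:
        self._bits |= 1 << position

    def is_deleted(self, position: int) -> bool:
        return bool((self._bits >> position) & 1)


def read_with_deletion_vector(rows: List[str], dv: DeletionVector) -> List[str]:
    """V3-style read: a single bitmap masks out deleted row positions."""
    return [row for pos, row in enumerate(rows) if not dv.is_deleted(pos)]


rows = ["a", "b", "c", "d", "e"]

# Three separate delete operations leave three delete files in V2.
delete_files = [{1}, {3}, {1, 4}]

# The same three operations update one deletion vector in V3.
dv = DeletionVector()
for position in (1, 3, 4):
    dv.delete(position)

v2_result = read_with_delete_files(rows, delete_files)
v3_result = read_with_deletion_vector(rows, dv)
```

Both paths return the same live rows. The difference is that the V2 loop grows with the number of delete operations, while the V3 path consults one bitmap no matter how many deletes have been applied.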

For GDPR compliance use cases, this means processing right-to-erasure requests without rewriting large partitions. Delete the rows, update the deletion vector, done. The data files remain in place and can be physically cleaned up during the next scheduled compaction at whatever cadence suits your retention policy.

The Variant Type: Semi-Structured Data Without the Performance Penalty

The Variant type is the answer to a problem that has been annoying Iceberg users since the beginning: how do you store schema-flexible, semi-structured data without destroying query performance?

The JSON-as-String Failure Mode

The standard workaround for semi-structured data in Iceberg V2 is to store it as a string column containing JSON. This works for storage but fails for analytics. The engine cannot push down predicates into a string column that contains JSON. Filtering on payload['event_type'] = 'click' when payload is a string requires the engine to fetch and parse every row. No statistics. No skipping. Full scan.

For tables with billions of event records where event payloads are the primary access dimension, this is a significant performance problem. Teams work around it by pre-parsing important fields into separate typed columns during ingestion, which is exactly the kind of ETL work that Iceberg's architecture is supposed to eliminate.

What the Variant Type Provides

The Variant type stores semi-structured data in a binary encoding that is more compact than JSON strings and supports predicate pushdown directly into the structure. The engine can evaluate a filter like variant_column['region'] = 'US-West' without parsing the full document for every row.

Schema flexibility is preserved. You don't need to define the shape of the data at table creation time. Variant accommodates evolving structures, optional fields, and nested documents. The difference from a string column is that the binary encoding gives the engine enough metadata to skip irrelevant data at the file and block level.

Shredded Variants: Targeted Optimization for Hot Fields

Shredded variants take this further. If certain fields within a Variant column are accessed frequently (say, event_type and user_id appear in most queries against the events table), shredding extracts those fields into separate, typed Parquet columns alongside the Variant column. Queries that read those fields get typed column performance. The full Variant column remains available for queries that need the complete document.
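
A small Python sketch shows the idea behind shredding, compared with the JSON-as-string path. It is a conceptual model only: the real physical layout in Parquet differs, and `HOT_FIELDS`, `shred`, and the two counting functions are hypothetical names chosen for this example.

```python
import json
from typing import Any, Dict, List

HOT_FIELDS = ("event_type", "user_id")  # hypothetical choice of fields to shred


def shred(documents: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pull hot fields into their own columns; keep the document available."""
    columns: Dict[str, List[Any]] = {field: [] for field in HOT_FIELDS}
    columns["payload"] = []
    for doc in documents:
        for field in HOT_FIELDS:
            columns[field].append(doc.get(field))  # None when the field is absent
        columns["payload"].append(doc)
    return columns


def count_clicks_from_strings(json_rows: List[str]) -> int:
    """JSON-as-string path: every row is parsed to evaluate the filter."""
    return sum(1 for raw in json_rows if json.loads(raw).get("event_type") == "click")


def count_clicks_from_shredded(columns: Dict[str, List[Any]]) -> int:
    """Shredded path: only the event_type column is read, nothing is parsed."""
    return sum(1 for value in columns["event_type"] if value == "click")


documents = [
    {"event_type": "click", "user_id": 1, "page": "/home"},
    {"event_type": "view", "user_id": 2},
    {"event_type": "click", "user_id": 3, "extra": {"a": 1}},
]
json_rows = [json.dumps(doc) for doc in documents]
shredded = shred(documents)

clicks_from_strings = count_clicks_from_strings(json_rows)
clicks_from_shredded = count_clicks_from_shredded(shredded)
```

Both functions return the same count, but the shredded version never touches the documents themselves. Queries that need fields outside `HOT_FIELDS` still read the `payload` column.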

Shredded variant writing is available in Spark 4.1+ and Flink 2.1+. If your engine versions are behind those minimums, you can still use the base Variant type without shredding.

Geospatial Types: Spatial Data Without Custom Extensions

Every team that has stored spatial data in Iceberg V2 has had to choose between bad options: store WKB (Well-Known Binary) blobs or WKT (Well-Known Text) strings and lose all spatial query optimization, or maintain a custom extension that breaks compatibility with other engines.

Native GEOMETRY and GEOGRAPHY in V3

Iceberg 1.11.0's V3 specification adds native GEOMETRY and GEOGRAPHY data types. These are not string columns with a convention. They are first-class types defined in the Iceberg specification, which means engines can implement spatial operations against them with full awareness of the type's semantics.

GEOMETRY handles planar (flat-earth) spatial data. GEOGRAPHY handles spherical (curved-earth) spatial data. The distinction matters for distance calculations: the same coordinate pair produces different results under planar vs. spherical geometry, and using the wrong model produces incorrect results for large-scale geographic analysis.
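
The gap between the two models is easy to see with a short calculation. The sketch below is plain Python, not an Iceberg or engine API: it compares a Euclidean distance on raw coordinates with a great-circle distance from the haversine formula, assuming a spherical Earth with the mean radius given in the code.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def planar_distance(x1: float, y1: float, x2: float, y2: float) -> float:
    """Euclidean distance in coordinate units (here, degrees)."""
    return math.hypot(x2 - x1, y2 - y1)


def haversine_km(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in kilometers between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Two point pairs, each 10 degrees of longitude apart.
planar_equator = planar_distance(0.0, 0.0, 10.0, 0.0)
planar_lat60 = planar_distance(0.0, 60.0, 10.0, 60.0)

spherical_equator = haversine_km(0.0, 0.0, 10.0, 0.0)
spherical_lat60 = haversine_km(0.0, 60.0, 10.0, 60.0)
```

Under the planar model both pairs are exactly 10 units apart. Under the spherical model the pair on the equator is roughly 1,112 km apart, while the pair at 60 degrees latitude is roughly 555 km apart, about half the distance.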

What Changes for Spatial Workloads

Teams running GIS workloads, location analytics, ride-sharing platform analysis, or geofencing operations can now store spatial data in Iceberg tables without custom format extensions or string workarounds. The same Iceberg table, governed by the same Polaris catalog, accessed by the same compute engines, can hold both structured analytics data and spatial geometry data with type-aware query support.

Engine support for the new spatial types is rolling out across Spark, Flink, and Trino connectors. Check your specific engine connector version for spatial type write and query support before migrating production spatial workloads.

Nanosecond Timestamps: Precision for High-Frequency Workloads

Iceberg V2 timestamps top out at microsecond precision. For most analytical workloads, microseconds are more than sufficient. For high-frequency trading systems, scientific data collection, and industrial IoT sensors, they are not.

The Precision Gap

A trading system processing market data events may need to distinguish events separated by 100 nanoseconds. Sensors on manufacturing equipment can produce readings at rates that exceed microsecond resolution. Particle physics detectors generate event streams where nanosecond timestamps are not a precision luxury but a data integrity requirement.

Storing sub-microsecond data in a microsecond-precision timestamp column requires rounding, which is data loss. The workaround (storing nanoseconds as a raw integer column and converting at query time) loses all timestamp semantics, including time zone handling, timestamp arithmetic, and engine-level time-based pruning.
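
A few lines of Python arithmetic illustrate the data loss. The event values are made up for the example, and `to_micros` is a hypothetical helper that truncates an epoch value from nanoseconds to microseconds.

```python
def to_micros(epoch_nanos: int) -> int:
    """Truncate nanoseconds since the epoch to whole microseconds."""
    return epoch_nanos // 1_000


# Two hypothetical events 100 nanoseconds apart.
event_a = 1_700_000_000_000_000_100
event_b = 1_700_000_000_000_000_200

gap_nanos = event_b - event_a
micros_a = to_micros(event_a)
micros_b = to_micros(event_b)

# At microsecond precision the two events become indistinguishable.
collapsed = micros_a == micros_b
```

Once both values land on the same microsecond, their relative order can no longer be recovered from the timestamp column alone.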

timestamp_ns and timestamptz_ns

Iceberg 1.11.0 ships timestamp_ns (nanosecond timestamp without time zone) and timestamptz_ns (nanosecond timestamp with UTC time zone) as V3 native types. Tables must be on format version 3 to use them.

Engine support: Flink 2.1+ adds nanosecond precision to the Flink-Iceberg integration. Spark support is evolving in newer versions. For production use with nanosecond timestamps, verify your engine's connector version against the minimum requirements.

Upgrading an existing table to V3 does not rewrite existing data files. Existing microsecond timestamp columns remain as-is. New columns can use the nanosecond types on the upgraded table.

Engine-Specific Updates

Apache Spark

Spark 4.1+ gains shredded variant writing support, enabling the performance optimization for frequently accessed Variant fields. Not all V3 features are backported to Spark 4.0. If you are running Spark 4.0, basic Variant type support is available but shredding is not. Check the iceberg-spark connector release notes for the specific feature matrix against your Spark version.

Apache Flink

Flink 2.1+ adds nanosecond-precision timestamp support to the Flink-Iceberg integration, covering both timestamp_ns and timestamptz_ns. Streaming write performance with deletion vectors also continues to improve. Teams using Flink for CDC ingestion pipelines will see reduced delete file management overhead when running V3 tables.

Trino

Trino maps its JSON type to Iceberg's Variant type in the V3 connector. Full shredded variant support in the Trino-Iceberg connector is evolving; check the connector changelog for your Trino version. Basic Variant read and write are available in recent releases. Geospatial type support is also being added to the Trino connector in parallel.

Upgrading Your Tables to V3

All of the V3 features (deletion vectors, the Variant type, geospatial types, and nanosecond timestamps) require your Iceberg tables to be on format version 3. The upgrade command is:

ALTER TABLE your_catalog.your_schema.your_table
SET TBLPROPERTIES ('format-version' = '3');

This command is a metadata-only operation. It does not rewrite your existing data files. Existing files remain as-is and continue to be readable. New writes to the table use V3 capabilities. Deletion vectors kick in for subsequent delete operations. Variant columns can be added via ALTER TABLE ... ADD COLUMN. Existing typed columns are unaffected.

The tradeoff to acknowledge: once a table is on format version 3, engines that only support V2 cannot read it. Before upgrading, confirm that all query engines and tools in your stack have V3-compatible Iceberg connector versions.
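
One way to make that confirmation explicit is a small pre-flight check. The Python sketch below is an assumption-heavy illustration: the engine inventory is something you would have to assemble yourself from each connector's documentation, and `engines_blocking_upgrade` and the engine names are hypothetical.

```python
from typing import Dict, List


def engines_blocking_upgrade(max_supported: Dict[str, int], target_version: int = 3) -> List[str]:
    """Return engines whose highest supported format version is below the target."""
    return sorted(name for name, version in max_supported.items() if version < target_version)


# Hypothetical inventory: engine name -> highest Iceberg format version it can read.
inventory = {"engine_a": 3, "engine_b": 2, "engine_c": 3}

blockers = engines_blocking_upgrade(inventory)
safe_to_upgrade = not blockers
```

If `blockers` is non-empty, upgrade those connectors first or keep the affected tables on format version 2.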

The V4 specification is under active design in the Iceberg community, but V3 is the stable production target today. Getting your tables onto V3 now puts you on the foundation from which V4 features can eventually be adopted incrementally.

Conclusion

Apache Iceberg 1.11.0 delivers on two fronts. The File Format API is an architectural investment whose full payoff comes over the next year or two as new format plugins ship, but it also consolidates and cleans up the engine's internal format handling today. The V3 feature set (deletion vectors, the Variant type with shredding, native geospatial types, and nanosecond timestamps) is now production-ready. These are not experimental features gated behind configuration flags. They are the stable, supported path for workloads that outgrow what V2 provided.

If you are running Iceberg in production on V2 tables, start evaluating which tables benefit most from a V3 upgrade. Tables with active deletes are the clearest win for deletion vectors. Tables carrying JSON payloads are the clearest win for the Variant type.

To run Apache Iceberg V3 tables with Autonomous Reflections, built-in AI analytics, and zero infrastructure management, start a free trial of Dremio Cloud at dremio.com/get-started.
