The Dremio Blog

Dremio Blog: Open Data Insights

Hidden Partitioning: How Iceberg Eliminates Accidental Full Table Scans

The most expensive mistake in data lake querying is the accidental full table scan: a query that reads every file because the user did not correctly reference the partition columns. In Hive, this happens constantly. In Iceberg, this class of mistake goes away because users filter on ordinary source columns and never reference partition columns at all.

Alex Merced

Semantic Layer for AI Agents: Stop Getting the Numbers Wrong

The reason so many agentic analytics projects stall at proof-of-concept is not the AI model. It is the absence of the infrastructure that would make the AI trustworthy on real data. A semantic layer is that infrastructure.

Alex Merced

MCP Server Data Lakehouse: Connect AI Agents to Your Data

The Model Context Protocol (MCP) changes this equation. An MCP server in front of your data lakehouse gives any compliant AI client a single, governed, structured gateway to your data. You configure it once. Every agent that follows the spec can then connect through it.

Alex Merced

Apache Iceberg Small Files Problem: Causes, Fixes, and Prevention

Solving the Apache Iceberg small files problem requires addressing it at multiple layers. Detection comes first: use table_files() to establish a baseline and set thresholds that trigger action. Prevention comes next: configure write.target-file-size-bytes at the source and increase checkpoint intervals for streaming jobs.

Alex Merced

Partition Evolution: Change Your Partitioning Without Rewriting Data

Partition evolution is one of the features that makes Iceberg a safe long-term choice. It means the partitioning decision you make today is not permanent.

Alex Merced

Apache Iceberg REST Catalog: What It Is and How to Use It

From that point, all engines share a consistent view of your Iceberg tables. New tables created by Spark appear in Dremio immediately. Schema changes committed by Flink are visible to PyIceberg clients without any manual sync. The catalog handles the coordination.

Alex Merced

Apache Iceberg Partition Evolution: Change Your Partitioning Strategy Without Rewriting Data

Partition evolution is one of those features that seems minor until you need it. Then it's the difference between a two-minute metadata update and a two-day rewrite project. If you're building on Iceberg and haven't thought carefully about your partition strategy yet, the time to do that is before your table reaches 10 TB, not after.

Alex Merced

What Is Agentic Analytics? How It Differs from BI and AI Assistants

The framing that matters here: agentic analytics is not a feature you add to your existing BI stack. It is a different approach to how analytical work gets done, who does it, and at what speed.

Alex Merced

Agentic Lakehouse vs Data Lakehouse: What Actually Changes

The Agentic Lakehouse is not a different architecture from your existing lakehouse. It is four additional structural layers built on top of a foundation you have likely already built: an AI Semantic Layer, Autonomous Performance, active metadata, and agent-specific interfaces.

Alex Merced

Apache Polaris 1.5.0: Deep-Dive Into the Future of Open Data Catalogs

The release of Apache Polaris 1.5.0 marks a significant step forward in the project's evolution. This release introduces enterprise-grade security integrations, expanded catalog federation, advanced credential vending, and key performance optimizations.

Alex Merced

Agentic Lakehouse Architecture: The Four Technical Layers

This composability is what makes the Agentic Lakehouse architecture viable long-term. As Iceberg V3 adoption grows and the Polaris REST Catalog becomes the universal standard for catalog interoperability, adding a new engine or a new AI framework to your stack becomes a configuration change, not a migration project.

Alex Merced

Performance and Apache Iceberg’s Metadata

The single biggest performance advantage of Iceberg over raw data lakes is not a clever algorithm or a faster codec. It is metadata-driven data skipping. By the time a query engine begins scanning actual Parquet files, Iceberg's metadata has already eliminated 90-99% of the files from consideration.

Alex Merced

Apache Iceberg V2 vs V3: What Changed and What It Means for Your Tables

Apache Iceberg V3 is a meaningful advancement over V2, not a version bump for its own sake. Deletion vectors address the fundamental I/O cost of merge-on-read that V2 delete file accumulation creates. The Variant type eliminates one of the most common workarounds in modern data pipelines: storing JSON as strings and parsing at query time.

Alex Merced

Migrate Delta Lake to Apache Iceberg: Step-by-Step Guide

The Iceberg ecosystem is consolidating fast. REST Catalog interoperability, growing AI tooling, and the Apache governance model mean that every month you stay on Delta Lake, you are working against the direction of the industry. The migration investment pays off in engine flexibility, catalog portability, and access to a growing set of tools that assume Iceberg as the standard.

Alex Merced

What’s New in Apache Iceberg 1.11.0

Apache Iceberg 1.11.0 delivers on two fronts. The File Format API is an architectural investment whose full payoff comes over the next year or two as new format plugins ship, but it also consolidates and cleans up the engine's internal format handling today.

Alex Merced
