Dremio Blog: Open Data Insights
- Hidden Partitioning: How Iceberg Eliminates Accidental Full Table Scans
  The most expensive mistake in data lake querying is the accidental full table scan: a query that reads every file because the user did not correctly reference the partition columns. In Hive, this happens constantly. In Iceberg, it is structurally impossible because users never reference partition columns at all.
- Semantic Layer for AI Agents: Stop Getting the Numbers Wrong
  The reason so many agentic analytics projects stall at proof-of-concept is not the AI model. It is the absence of the infrastructure that would make the AI trustworthy on real data. A semantic layer is that infrastructure.
- MCP Server Data Lakehouse: Connect AI Agents to Your Data
  The Model Context Protocol (MCP) changes this equation. An MCP server data lakehouse setup gives any compliant AI client a single, governed, structured gateway to your data. You configure it once. Every agent that follows the spec connects automatically.
- Apache Iceberg Small Files Problem: Causes, Fixes, and Prevention
  Solving the Apache Iceberg small files problem requires addressing it at multiple layers. Detection comes first: use table_files() to establish a baseline and set thresholds that trigger action. Prevention comes next: configure write.target-file-size-bytes at the source and increase checkpoint intervals for streaming jobs.
- Partition Evolution: Change Your Partitioning Without Rewriting Data
  Partition evolution is one of the features that makes Iceberg a safe long-term choice. It means the partitioning decision you make today is not permanent.
- Apache Iceberg REST Catalog: What It Is and How to Use It
  From that point, all engines share a consistent view of your Iceberg tables. New tables created by Spark appear in Dremio immediately. Schema changes committed by Flink are visible to PyIceberg clients without any manual sync. The catalog handles the coordination.
- Apache Iceberg Partition Evolution: Change Your Partitioning Strategy Without Rewriting Data
  Partition evolution is one of those features that seems minor until you need it. Then it's the difference between a two-minute metadata update and a two-day rewrite project. If you're building on Iceberg and haven't thought carefully about your partition strategy yet, the time to do that is before your table reaches 10 TB, not after.
- What Is Agentic Analytics? How It Differs from BI and AI Assistants
  The framing that matters here: agentic analytics is not a feature you add to your existing BI stack. It is a different approach to how analytical work gets done, who does it, and at what speed.
- Agentic Lakehouse vs Data Lakehouse: What Actually Changes
  The Agentic Lakehouse is not a different architecture from your existing lakehouse. It is four additional structural layers built on top of a foundation you have likely already built: an AI Semantic Layer, Autonomous Performance, active metadata, and agent-specific interfaces.
- Apache Polaris 1.5.0: Deep-Dive Into the Future of Open Data Catalogs
  The release of Apache Polaris 1.5.0 marks a significant step forward in the project's evolution. This release introduces enterprise-grade security integrations, expanded catalog federation, advanced credential vending, and key performance optimizations.
- Agentic Lakehouse Architecture: The Four Technical Layers
  This composability is what makes the Agentic Lakehouse architecture viable long-term. As Iceberg V3 adoption grows and the Polaris REST Catalog becomes the universal standard for catalog interoperability, adding a new engine or a new AI framework to your stack becomes a configuration change, not a migration project.
- Performance and Apache Iceberg’s Metadata
  The single biggest performance advantage of Iceberg over raw data lakes is not a clever algorithm or a faster codec. It is metadata-driven data skipping. By the time a query engine begins scanning actual Parquet files, Iceberg's metadata has already eliminated 90-99% of the files from consideration.
- Apache Iceberg V2 vs V3: What Changed and What It Means for Your Tables
  Apache Iceberg V3 is a meaningful advancement over V2, not a version bump for its own sake. Deletion vectors address the fundamental I/O cost of merge-on-read that V2 delete file accumulation creates. The Variant type eliminates one of the most common workarounds in modern data pipelines: storing JSON as strings and parsing at query time.
- Migrate Delta Lake to Apache Iceberg: Step-by-Step Guide
  The Iceberg ecosystem is consolidating fast. REST Catalog interoperability, growing AI tooling, and the Apache governance model mean that every month you stay on Delta Lake, you are working against the direction of the industry. The migration investment pays off in engine flexibility, catalog portability, and access to a growing set of tools that assume Iceberg as the standard.
- What’s New in Apache Iceberg 1.11.0
  Apache Iceberg 1.11.0 delivers on two fronts. The File Format API is an architectural investment whose full payoff comes over the next year or two as new format plugins ship, but it also consolidates and cleans up the engine's internal format handling today.