Dremio Blog

8 minute read · March 20, 2026

How Dremio’s Agentic Lakehouse is Turning Data into Action

Will Martin · Technical Evangelist

Key Takeaways

  • The Agentic Lakehouse transforms data from a passive resource to an active participant in decision-making, drastically reducing business cycle times.
  • Dremio's Intelligent Data Discovery improved Amazon's analytics efficiency, slashing project setup times by 90% and cutting query times to just 4-6 seconds.
  • Dremio enables processing of unstructured data through SQL with AI functions, making previously unusable data accessible for analysis.
  • The AI Semantic Layer ensures effective communication, providing necessary business context and delivering sub-second responses for real-time interactions.
  • Companies like Amazon, NetApp, and Maersk demonstrate how Dremio's unified approach accelerates insights and enhances decision-making capabilities.

For decades, the traditional data experience has been defined by friction, with business teams frequently required to wait. Waiting for SQL experts to draft queries, waiting for ETL pipelines to refresh, and waiting for static dashboards to render. This reactive model has gone past being just a bottleneck and now represents an existential risk for the modern enterprise.

We are now entering the era of the Agentic Lakehouse. It represents a fundamental pivot from data as a passive resource to data as an active participant in decision-making. By bringing a natural-language conversational interface directly into the lakehouse, organisations are shortening business cycles from weeks to minutes. Data is no longer a destination you visit but a partner you chat with to drive immediate action.

Try Dremio’s Interactive Demo

Explore this interactive demo and see how Dremio's Intelligent Lakehouse enables Agentic AI

Intelligent Data Discovery

In my experience, new-user ramp-up is the real ROI killer for any data platform. When team members inherit datasets and views, they typically need days, if not weeks, to fully grasp the data estate and understand the business logic before they can use the right data with confidence. This onboarding effort is compounded at the scale of the world's largest enterprises, where multiple teams work across hundreds of datasets.

The Amazon Supply Chain Finance Analytics team faced this exact wall. Tasked with managing tens of billions of dollars in spend, they had to navigate hundreds of different tables and datasets containing billions of rows. Traditional BI tools were failing, often crashing after just 100 million rows, and the complexity required "expert-only" workflows that throttled productivity.

By implementing Dremio’s Intelligent Data Discovery, Amazon:

  • Slashed project setup times by 90%.
  • Reduced query times from 60 seconds to just 4-6 seconds.
  • Eliminated 60 hours of manual work per project.

By automating the "how" of discovery, with semantic understanding and AI-powered explanations for inherited data, Dremio allowed Amazon's team to focus entirely on the "why" of the spend.

Unstructured Data Processing

Data lakes have three compelling features: they store huge amounts of data, they do so far more cheaply than a warehouse, and they can hold both structured and unstructured data.

Part of the beauty of the data lakehouse is that it leverages and benefits from these features while delivering data warehouse-quality transactions. However, there is one exception: the lakehouse, just like the warehouse, can only handle structured data. Any unstructured data in the data lake (PDFs, images, and call logs) remains unusable without specialised tools.

Historically, businesses had two options: ignore their unstructured data or invest in OCR tools, manual labeling, and separate ML pipelines that lived outside the core data platform. But now, Dremio offers a third option, by embedding Large Language Models (LLMs) directly into the SQL engine via AI Functions.

These native functions allow any SQL-literate analyst to interact with unstructured content without building a single pipeline. Examples of these functions include:

  • AI_GENERATE: turns unstructured PDFs or images into queryable, structured rows.
  • AI_CLASSIFY: high-speed categorisation, great for sentiment analysis or document labeling at scale.
  • AI_COMPLETE: intelligent text generation and summarisation, perfect for distilling narrative patterns from massive document sets.
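As a sketch of how this might look in practice (the table and column names below are hypothetical, and exact argument shapes for these functions may vary by Dremio version), an analyst could extract and categorise unstructured invoice documents in a single query:

```sql
-- Hypothetical example: the schema, table, and column names are
-- illustrative only, and the AI function signatures are assumptions.
SELECT
  invoice_id,
  -- Turn an unstructured document into structured, queryable output
  AI_GENERATE(
    'Extract the vendor name and total amount from this invoice',
    invoice_text
  ) AS extracted_fields,
  -- Categorise each document against a fixed label set
  AI_CLASSIFY(
    invoice_text,
    ARRAY['logistics', 'hardware', 'software', 'services']
  ) AS spend_category
FROM supply_chain.invoices;
```

The point is the workflow, not the exact syntax: the LLM call sits inline in ordinary SQL, so no separate OCR step, labeling pass, or external ML pipeline is needed.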

While these functions represent the next generation of capability, the foundation of this speed is the unified lakehouse architecture. NetApp’s ActiveIQ team proved the power of this foundation by migrating from a fragmented Hadoop/MapReduce infrastructure to Dremio. By adopting a Zero-Copy Architecture and utilising Autonomous Reflections, they eliminated 33 mini-clusters and reduced their data footprint from 7PB to 3PB.

The results were staggering: NetApp achieved 95% faster time-to-insight, reducing average query times from 45 minutes to just 2 minutes. This architectural shift, combined with the new ability to process unstructured telemetry data through SQL, finally bridged their gap between raw documents and actionable intelligence.

The Semantic Layer Foundation

An AI agent is only as effective as the context it is provided. Without a robust AI Semantic Layer, an agent is effectively flying blind. It may deliver answers quickly, but those answers will lack the metric definitions and business terminology required for enterprise trust.

Dremio’s AI Semantic Layer provides the necessary business context, while Autonomous Reflections serve as the performance optimiser. This combination allows Dremio to deliver sub-second performance for real-time AI interactions. To support this at a global scale, Dremio leverages the Model Context Protocol (MCP) to ensure seamless connectivity with AI frameworks like Claude and ChatGPT.

This architecture is built on open standards, specifically Apache Iceberg, Apache Polaris, and Apache Arrow. By prioritising an open catalog and open data formats, Dremio delivers 20x performance at a significantly lower cost than proprietary data warehouses. 

The scalability of this approach is best seen in Maersk, which utilises Dremio to power 1.6 million queries per day with 99.97% uptime. By standardising on Iceberg, Maersk avoided proprietary lock-in while building a platform that serves 8,000 total users globally.

Conclusion

The successes at Amazon, NetApp, and Maersk represent a broader shift in the ROI of data. When project setup times drop by 90% and query runtimes fall from 45 minutes to 2 minutes, the relationship between a professional and their data changes fundamentally.

Thanks to AI, data is now something users can chat with, rather than look at through a report. The Agentic Lakehouse removes the technical barriers of manual ETL and complex SQL, allowing the speed of business to finally catch up to the speed of thought. By unifying structured and unstructured data under a single, governed, and performant layer, Dremio has turned the waiting game of analytics into a rapid engine for action.

Try Dremio Cloud free for 30 days

Deploy agentic analytics directly on Apache Iceberg data with no pipelines and no added overhead.