Download The Guide

Apache Hudi: The Definitive Guide

Whether you've been using Hudi for years or are new to its capabilities, this guide will help you build robust, open, and high-performing data lakehouses.

Apache Hudi enables you to create and manage a data lakehouse by bringing database-like capabilities, including efficient upserts, deletions, and incremental data processing, to the data lake. It is a revolutionary open-source framework that transforms the way data engineers and data scientists interact with large-scale datasets.

These capabilities are implemented using metadata stored alongside the data files in data lake storage. Read this early-release chapter from the upcoming O’Reilly ebook to learn how to read from Hudi efficiently. The full ebook, available from O’Reilly in the months ahead, covers Hudi writes, the use of indexing, running Hudi in production, and more.

Coverage includes:

  • How to read from Hudi
  • Distributed query engines and the query lifecycle
  • Snapshot and time travel queries
  • Incremental queries in latest-state and change-data-capture modes