whitepaper

Apache Hudi: The Definitive Guide

Apache Hudi™ enables you to create and manage a data lakehouse: it brings database-like capabilities, including efficient upserts, deletions, and incremental data processing, to the data lake. It is an open-source framework that transforms the way data engineers and data scientists interact with large-scale datasets.

These capabilities are implemented using metadata stored alongside the data files in data lake storage. Read these early-release chapters from the upcoming O’Reilly book, covering the following chapters of the full book: what Apache Hudi is (Chapter 1), getting started with Hudi (Chapter 2), how to write to Hudi (Chapter 3), how to read from Hudi efficiently (Chapter 4), how to maintain and optimize Hudi tables (Chapter 6), and how to use Hudi Streamer for data ingestion (Chapter 8).

Coverage in the full ebook includes:

How to write to Hudi
Distributed query engines and the query lifecycle
Snapshot and time travel queries
Incremental queries in latest-state and change-data-capture modes

Written by

Kyle Weller
Head of Product
