Apache Hudi

Same as entry for: Hudi

Apache Hudi was originally developed at Uber and was released as an open source project in 2017. Hudi is widely considered the first data lakehouse project and is today one of the three leading open source data lakehouse projects, alongside Apache Iceberg and Delta Lake.

Hudi was originally introduced as an “incremental data lake.” Today, Hudi is widely referred to as a data lakehouse, a term popularized by Databricks in 2020.

A core feature of Hudi is its ability to efficiently manage incremental updates to data, whether that data is structured, semi-structured, or unstructured. This makes Hudi particularly well suited to change data capture (CDC) feeds from databases and to streaming data. Hudi enables this flexibility by writing data and metadata in both Avro (row-based) and Apache Parquet (column-based) file formats.
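To make the incremental-update capability concrete, the sketch below shows an upsert into a Hudi table through Spark's datasource API. It is a minimal illustration, assuming a Spark session with the Hudi bundle on the classpath; the table name, storage path, and column names are hypothetical.

```python
# Minimal PySpark sketch of a Hudi upsert (illustrative names and path).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-upsert-sketch").getOrCreate()

# Hypothetical change records (e.g., from a CDC feed): each row carries a
# record key, a partition value, and an event timestamp used to pick the
# latest version of a record when keys collide.
updates = spark.createDataFrame(
    [("user-1", "2024-01-01", "2024-01-02T10:00:00Z", "alice@example.com")],
    ["user_id", "signup_date", "event_ts", "email"],
)

hudi_options = {
    "hoodie.table.name": "users",                               # assumed table name
    "hoodie.datasource.write.recordkey.field": "user_id",       # record key
    "hoodie.datasource.write.partitionpath.field": "signup_date",
    "hoodie.datasource.write.precombine.field": "event_ts",     # latest record wins
    "hoodie.datasource.write.operation": "upsert",              # incremental update
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",      # Avro logs + Parquet base files
}

# Upsert semantics: rows whose key already exists in the table are updated,
# new keys are inserted; Hudi tracks the change in its timeline metadata.
updates.write.format("hudi").options(**hudi_options).mode("append").save("/tmp/hudi/users")
```

With the MERGE_ON_READ table type, incoming changes land in row-based log files for fast writes and are later compacted into column-based base files for efficient reads, which is one reason Hudi suits CDC and streaming workloads.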

Related terms: Apache Parquet; Avro; change data capture; data lake; data lakehouse; streaming data; metadata

Relevant links on the Onehouse or Apache Hudi websites: 
