Apache Hudi

Same as entry for: Hudi

Apache Hudi was originally developed at Uber and was released as an open source project in 2017. Hudi is widely considered the first data lakehouse project and is today one of the three leading open source data lakehouse projects, alongside Apache Iceberg and Delta Lake.

Hudi was originally introduced as an “incremental data lake.” Today, Hudi is widely referred to as a data lakehouse, a term popularized by Databricks in 2020.

A core feature of Hudi is its ability to efficiently manage incremental updates to data, whether that data is structured, semi-structured, or unstructured. This makes Hudi particularly well suited to change data capture (CDC) feeds from databases and to streaming data. Hudi enables this flexibility by writing data and metadata in both Avro (row-based) and Apache Parquet (column-based) file formats.
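To make the incremental-update capability concrete, the sketch below shows an upsert into a Hudi table through Spark's datasource API. It is a minimal illustration, assuming a Spark session with the Hudi bundle on the classpath; the table name, storage path, and column names are hypothetical.

```python
# Minimal PySpark sketch of a Hudi upsert (illustrative names and path).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi-upsert-sketch").getOrCreate()

# Hypothetical change records (e.g., from a CDC feed): each row carries a
# record key, a partition value, and an event timestamp used to pick the
# latest version of a record when keys collide.
updates = spark.createDataFrame(
    [("user-1", "2024-01-01", "2024-01-02T10:00:00Z", "alice@example.com")],
    ["user_id", "signup_date", "event_ts", "email"],
)

hudi_options = {
    "hoodie.table.name": "users",                               # assumed table name
    "hoodie.datasource.write.recordkey.field": "user_id",       # record key
    "hoodie.datasource.write.partitionpath.field": "signup_date",
    "hoodie.datasource.write.precombine.field": "event_ts",     # latest record wins
    "hoodie.datasource.write.operation": "upsert",              # incremental update
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",      # Avro logs + Parquet base files
}

# Upsert semantics: rows whose key already exists in the table are updated,
# new keys are inserted; Hudi tracks the change in its timeline metadata.
updates.write.format("hudi").options(**hudi_options).mode("append").save("/tmp/hudi/users")
```

With the MERGE_ON_READ table type, incoming changes land in row-based log files for fast writes and are later compacted into column-based base files for efficient reads, which is one reason Hudi suits CDC and streaming workloads.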

Related terms: Apache Parquet; Avro; change data capture; data lake; data lakehouse; streaming data; metadata

Relevant links on the Onehouse or Apache Hudi websites: 
