Apache Kafka

Same as entry for: Kafka

Apache Kafka was originally developed at LinkedIn and released as an open source project in 2011. Kafka has many capabilities, but it is best known as scalable software for real-time data streaming. Kafka is fully distributed: it runs as a cluster of servers (brokers), with data partitioned and replicated across them.

Streaming data software such as Kafka, batch updates, and change data capture (CDC) all serve as sources of data that feed into the lakehouse. The lakehouse is designed to efficiently process frequent, incremental updates such as those provided by Kafka and CDC. The two play complementary roles: Kafka supports data movement and ephemeral storage, while the lakehouse serves as a persistent storage layer optimized for incremental updates.
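The division of labor between an ephemeral stream and a persistent, incrementally updated table can be sketched in a few lines of plain Python. This is a toy model only: `EphemeralLog`, `PersistentTable`, and their methods are hypothetical names invented for illustration, not the Kafka client API or any lakehouse API, and count-based retention is an assumption made to keep the example small.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Event:
    offset: int   # position of the event in the stream
    key: str      # record key used to match updates to rows
    value: dict   # latest state for that key


class EphemeralLog:
    """Toy stand-in for a stream: retains only the most recent events."""

    def __init__(self, retention: int):
        # A bounded deque silently drops the oldest event when full.
        self._events = deque(maxlen=retention)
        self._next_offset = 0

    def append(self, key: str, value: dict) -> None:
        self._events.append(Event(self._next_offset, key, value))
        self._next_offset += 1

    def read_from(self, offset: int) -> list:
        # Return every retained event at or after the given offset.
        return [e for e in self._events if e.offset >= offset]


class PersistentTable:
    """Toy stand-in for a lakehouse table: keeps the latest row per key."""

    def __init__(self):
        self.rows = {}
        self.committed_offset = 0  # next stream offset to consume

    def upsert_batch(self, events: list) -> None:
        # Apply an incremental batch: insert new keys, overwrite existing ones.
        for e in events:
            self.rows[e.key] = e.value
            self.committed_offset = e.offset + 1


log = EphemeralLog(retention=3)
table = PersistentTable()

# First incremental batch: two new records.
log.append("user-1", {"plan": "free"})
log.append("user-2", {"plan": "pro"})
table.upsert_batch(log.read_from(table.committed_offset))

# Second incremental batch: one update and one new record.
log.append("user-1", {"plan": "pro"})
log.append("user-3", {"plan": "free"})
table.upsert_batch(log.read_from(table.committed_offset))
```

After the second batch, the log has already discarded its oldest event, yet the table still holds the latest state of all three keys. Tracking `committed_offset` is what lets each batch pick up only the events that arrived since the previous one.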

Related terms: change data capture; data lakehouse; streaming data
