Entry is same as for: Apache Kafka

Apache Kafka was originally developed at LinkedIn and released as an open source project in 2011. Kafka has many capabilities, but it is best known as scalable, fully distributed software for real-time data streaming.

Streaming data software such as Kafka, along with batch updates and change data capture (CDC), serves as a source of data feeding into the lakehouse. The lakehouse is designed to efficiently process frequent, incremental updates such as those provided by Kafka and CDC. Kafka, for instance, handles data movement and ephemeral storage, while the lakehouse serves as a persistent storage layer optimized for incremental updates.
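To illustrate the kind of incremental processing described above, here is a minimal sketch of merging a micro-batch of CDC-style updates (such as records drained from a Kafka topic) into a table keyed by record id. The function name, record shapes, and `op` field are illustrative assumptions, not any specific Kafka or lakehouse API.

```python
# Sketch: upserting incremental (CDC-style) updates into a keyed table.
# The 'op' field and record layout below are hypothetical, for illustration.

def apply_updates(table, updates):
    """Upsert each update into the table, keyed by 'id'.
    An update with op == 'delete' removes the record."""
    merged = dict(table)
    for rec in updates:
        if rec.get("op") == "delete":
            merged.pop(rec["id"], None)
        else:
            merged[rec["id"]] = {k: v for k, v in rec.items() if k != "op"}
    return merged

# Current persisted state of the table, keyed by id
table = {1: {"id": 1, "name": "alice"}, 2: {"id": 2, "name": "bob"}}

# A micro-batch of incremental updates, e.g. read from a Kafka topic
updates = [
    {"id": 2, "name": "robert", "op": "update"},
    {"id": 3, "name": "carol", "op": "insert"},
    {"id": 1, "op": "delete"},
]

table = apply_updates(table, updates)
print(sorted(table))  # → [2, 3]
```

A real lakehouse engine applies the same upsert/delete semantics at scale against columnar files, but the merge logic per key is conceptually similar.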

Related terms: change data capture; data lakehouse; streaming data

On the Onehouse website: 

Hudi website: