Streaming Ingestion at Scale: Kafka to the Lakehouse

Scale data ingestion like the world’s most sophisticated data teams, without the engineering burden

Feb 27, 2024 | 10am PST

Overview

Apache Kafka and the data lakehouse are used by the world’s most sophisticated data teams to ingest data at scale, quickly, and cost-effectively. For example:

Zoom uses Kafka and Hudi to ingest 100 terabytes of application logs per day
Notion saves $1.25M per year by replacing Fivetran and Snowflake with Kafka and Hudi for database CDC

Confluent Cloud and Onehouse allow every engineering team to achieve these results with a simple, fully-managed cloud service. You can now:

Ingest hundreds of terabytes per day
With minute-level data freshness
For up to 80% lower cost
Without engineering overhead

Agenda

We’ll present an overview of the joint Confluent and Onehouse solution, and dive deeper into the main components, including:

Fully-managed ingestion pipelines that can be configured in minutes
Confluent Connectors to ingest from virtually any data source
Database CDC for replication with minute-level freshness
Data lakehouse designed for high-performance updates and deletes
Interoperability with all your query engines, including Snowflake, Databricks, and BigQuery, so you can ingest once and use the data for every analytics use case

Our live demonstration will showcase the integration of multiple data sources with Confluent Cloud into the Onehouse universal data lakehouse. You’ll see how easy it is to get started in just minutes.

Why Attend?

Eradicate ETL Bottlenecks: Discover how moving from traditional ETL tools to streaming ingestion can provide minute-level data freshness, at scale.
Future-Proof Your Data Architecture: See how a single architecture with Confluent Cloud for stream processing and the universal data lakehouse ensures your pipelines withstand the test of time.
Reduce Data Management Costs: Learn why preparing data for use cases such as business intelligence or AI/ML inside the data lakehouse reduces expenses by combining the cost-efficiency of data lakes with the analytical power of data warehouses.
Adopt a Simple, Unified Strategy: Explore a data lakehouse solution that supports rapid updates and deletes, accommodates every data use case, and seamlessly works with any compute or query engine.

Your Presenters:

Will LaForest

Field CTO

Cameron O'Rourke

Technical Product Marketing

Your Moderator: