The Data Lakehouse at Walmart
Many Walmart teams use data lakehouses built on Apache Hudi to power real-time ingestion pipelines, manage very large tables, and run incremental data transformations, all in support of a variety of analytics and machine learning use cases. After a thorough evaluation of different table formats, Walmart found that Apache Hudi emerged as the preferred open source project for the data format, table format, and open source services.
Modern data lakehouses built on Apache Hudi enable you to:
Ingest in near real-time, at scale
Build change data capture (CDC) pipelines for databases, and ingest data from streaming platforms such as Apache Kafka, at scale.
Create a single source of truth
Create a master copy of raw incoming data; dedupe and cleanse tables; and even enrich tables, making them accessible as needed.
Support any downstream query engine
Query a single copy of data with popular query engines such as BigQuery, Databricks, Presto, Snowflake, Trino, and more.
Interoperate with any lakehouse format
Write to and query any lakehouse format with low overhead.
Reduce costs by 50% or more
Save 50% or more on the cost of ingestion, table storage, and incremental processing, compared to traditional data warehouses.
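As a rough illustration of the ingestion and incremental-processing capabilities listed above, here is a minimal Python sketch of how a Spark job might upsert a batch into a Hudi table and later read only the records that changed. The option keys are the commonly documented Hudi Spark DataSource settings and should be checked against the Hudi version you run. The function names, the table name, and the column names are hypothetical, and a Spark session with the Hudi bundle on its classpath is assumed.

```python
from typing import Dict


def hudi_upsert_options(table: str, record_key: str,
                        precombine: str, partition: str) -> Dict[str, str]:
    """Build write options for a Hudi upsert (helper name is hypothetical)."""
    return {
        "hoodie.table.name": table,
        # "upsert" updates rows whose record key already exists, inserts the rest.
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": record_key,
        # When two incoming rows share a key, the larger precombine value wins.
        "hoodie.datasource.write.precombine.field": precombine,
        "hoodie.datasource.write.partitionpath.field": partition,
    }


def hudi_incremental_read_options(begin_instant: str) -> Dict[str, str]:
    """Build read options that return only changes after a commit instant."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
    }


def upsert_batch(df, base_path: str, options: Dict[str, str]) -> None:
    """Write a Spark DataFrame to a Hudi table (assumes Hudi is on the classpath)."""
    df.write.format("hudi").options(**options).mode("append").save(base_path)


def read_changes(spark, base_path: str, options: Dict[str, str]):
    """Load only the records committed after the configured instant."""
    return spark.read.format("hudi").options(**options).load(base_path)


# Hypothetical table and column names, for illustration only.
write_opts = hudi_upsert_options("orders", "order_id", "updated_at", "order_date")
read_opts = hudi_incremental_read_options("20240101000000")
```

A downstream job would pass `write_opts` to `upsert_batch` for each incoming batch, then call `read_changes` with `read_opts` to process only new or updated rows instead of rescanning the whole table.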
Walmart has partnered with Onehouse, the company founded by the creator of Apache Hudi, for Hudi-related services and support.
Please reach out to Onehouse to discuss:
Hudi Advisory + Free Monitoring/Observability
Walmart team members have access to support services. Ask for a free hands-on Hudi architectural review session with our in-house experts, and add monitoring to help you optimize your deployment. For routine non-production questions and inquiries about Hudi, please email hudi-support@onehouse.ai.
Managed Hudi table services
Continuously replicate data from operational databases (including PostgreSQL, MySQL, MongoDB, SQL Server, and more) to the data lakehouse in near real-time.
Advanced cleaning