The Data Lakehouse at Walmart

Many Walmart teams use data lakehouses built on Apache Hudi to power real-time ingestion pipelines, manage very large tables, and run incremental data transformations, all in support of a variety of analytics and machine learning use cases. Walmart performed a thorough evaluation of different table formats, and Apache Hudi came out as the preferred open source project for the data format, table format, and open source services.


Modern data lakehouses built on Apache Hudi enable you to:


Ingest in near real-time, at scale

Easily build CDC pipelines for databases, and ingest data from streaming platforms such as Apache Kafka, at scale.
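As a rough illustration of what a CDC pipeline does at its core, the sketch below replays an ordered stream of change events onto a table keyed by primary key. The event shape (`op`, `key`, `row`) and the function name `apply_changes` are hypothetical and chosen only for illustration; this is plain Python, not the Hudi or Kafka API.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def apply_changes(table: Dict[Any, Row], events: Iterable[Dict[str, Any]]) -> Dict[Any, Row]:
    """Replay change events, in order, onto a table keyed by primary key.

    Hypothetical event shape: {"op": "insert" | "update" | "delete",
                               "key": <primary key>, "row": <new row>}
    """
    result = dict(table)  # shallow copy so the input table is left untouched
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            result[key] = event["row"]  # upsert: the last write wins
        elif op == "delete":
            result.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return result
```

A real pipeline would read these events from a database log or a Kafka topic and write them to a lakehouse table; the in-memory dictionary here only stands in for that table.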


Create a single source of truth

Create a master copy of raw incoming data; dedupe and cleanse tables; and even enrich tables, making them accessible as needed.
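To make "dedupe" concrete, here is a minimal sketch that keeps only the latest version of each record, loosely analogous to how a Hudi table identifies rows by a record key and can use an ordering field to choose among versions. The function name `dedupe_latest` and the field names in the usage example are assumptions made for illustration, not part of any Hudi API.

```python
from typing import Any, Dict, Iterable, List

Record = Dict[str, Any]


def dedupe_latest(records: Iterable[Record], key_field: str, ordering_field: str) -> List[Record]:
    """Keep one record per key: the one with the highest ordering value.

    On a tie, the record seen later in the input wins.
    """
    latest: Dict[Any, Record] = {}
    for record in records:
        key = record[key_field]
        current = latest.get(key)
        if current is None or record[ordering_field] >= current[ordering_field]:
            latest[key] = record
    # Output follows the order in which each key was first seen.
    return list(latest.values())


raw = [
    {"id": 1, "ts": 1, "status": "created"},
    {"id": 2, "ts": 5, "status": "created"},
    {"id": 1, "ts": 3, "status": "shipped"},
]
clean = dedupe_latest(raw, key_field="id", ordering_field="ts")
```

With the sample input above, `clean` holds two records: the `ts=3` version of `id` 1 and the only version of `id` 2.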


Support any downstream query engine

Query a single copy of data with popular query engines such as BigQuery, Databricks, Presto, Snowflake, Trino, and more.


Interoperate with any lakehouse format

Write to and query any lakehouse format with low overhead.


Reduce costs by 50% or more

Save 50% or more on the cost of ingestion, table storage, and incremental processing, compared to traditional data warehouses.


Walmart has partnered with Onehouse, the company founded by the creator of Apache Hudi, for Hudi-related services and support.

Please reach out to Onehouse to discuss:

Hudi Advisory + Free Monitoring/Observability

Walmart team members have access to support services. Ask for a free hands-on Hudi architectural review session with our in-house experts, and add monitoring to help you optimize your deployment. For routine non-production questions and inquiries about Hudi, please email hudi-support@onehouse.ai.

Managed Hudi table services

Continuously replicate data from operational databases (including PostgreSQL, MySQL, MongoDB, SQL Server, and more) to the data lakehouse in near real-time.

Advanced cleaning
