Data lake

A data lake is a repository that stores structured, semi-structured, and unstructured data at any scale, in its original form and without changing its structure. Data lakes typically use object storage, which handles a wide range of data types and is usually inexpensive. Data in the lake can then be accessed for analytics and data science.
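As a rough illustration of storing data without changing its structure and interpreting it only when it is read, here is a minimal Python sketch. It is not a real data lake: a local temporary directory stands in for object storage, and the helper names (put_object, total_amount, demo), the file keys, and the "amount" field are all hypothetical, chosen only for this example.

```python
import csv
import json
import tempfile
from pathlib import Path


def put_object(lake_root: Path, key: str, payload: bytes) -> Path:
    """Store bytes under a key, unchanged (a stand-in for an object-store PUT)."""
    target = lake_root / key
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(payload)
    return target


def total_amount(lake_root: Path) -> float:
    """Interpret each raw file only at query time and sum its 'amount' values."""
    total = 0.0
    for path in sorted(lake_root.rglob("*")):
        if path.suffix == ".csv":
            # Structured data: parse rows using the header line.
            with path.open(newline="") as handle:
                total += sum(float(row["amount"]) for row in csv.DictReader(handle))
        elif path.suffix == ".json":
            # Semi-structured data: a JSON array of records.
            total += sum(float(rec["amount"]) for rec in json.loads(path.read_text()))
        # Anything else (for example binary files) stays in the lake untouched.
    return total


def demo() -> float:
    """Write three differently shaped objects, then run one analytic query."""
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        put_object(lake, "raw/orders/2024.csv", b"order_id,amount\n1,10.5\n2,4.5\n")
        put_object(lake, "raw/events/clicks.json", b'[{"user": "a", "amount": 5.0}]')
        put_object(lake, "raw/images/logo.bin", bytes([0, 1, 2, 3]))
        return total_amount(lake)


if __name__ == "__main__":
    print(demo())
```

In this sketch nothing is converted on the way in. The CSV, JSON, and binary objects are written byte for byte, and only the query function decides how to read each format.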

The data lake was introduced in 2011, well after the data warehouse, which was introduced in the 1970s. Originally, the data lake emphasized less-structured data, in contrast to the structured data that the data warehouse was designed for.

The data lake was originally used mainly for big data processing and data science workloads, including machine learning and AI. Today, data lakes increasingly also support warehouse-style analytics, such as real-time dashboards and visualizations, for both structured and less-structured data.

The data lakehouse builds on the data lake architecture, adding improved updating and querying along with other capabilities. As a result, a data lakehouse can replace both data lakes and data warehouses for many use cases. This consolidation can reduce data duplication, cost, and effort.

Related terms: data lakehouse; data warehouse
