A data warehouse is a repository originally designed in the 1980s for storing structured (relational) data for use in reporting and business intelligence, and it has been widely used for that purpose ever since. A process called extract, transform, and load (ETL) was developed to extract data from an operational system, transform it for use in analytics, then load it into the warehouse's columnstore database (moving the data out of the operational system's rowstore format along the way).
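The three ETL steps can be sketched with a minimal, self-contained example. This is an illustration only, not a real pipeline: the table names, columns, and the use of a single in-memory SQLite database to stand in for both the operational system and the warehouse are all assumptions.

```python
# Minimal ETL sketch using only the standard library.
# Table and column names are hypothetical; one in-memory SQLite
# database stands in for both the operational system and the warehouse.
import sqlite3

def extract(conn):
    # Extract raw rows from an operational (row-oriented) table.
    return conn.execute("SELECT id, amount, region FROM orders").fetchall()

def transform(rows):
    # Transform for analytics: aggregate order amounts by region.
    totals = {}
    for _id, amount, region in rows:
        totals[region] = totals.get(region, 0) + amount
    return sorted(totals.items())

def load(conn, summary):
    # Load the transformed data into an analytics table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_by_region (region TEXT, total REAL)")
    conn.executemany("INSERT INTO sales_by_region VALUES (?, ?)", summary)
    conn.commit()

# Demo: populate the "operational" table, then run the pipeline.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 10.0, "east"), (2, 5.0, "west"), (3, 7.5, "east")])
load(conn, transform(extract(conn)))
print(conn.execute(
    "SELECT * FROM sales_by_region ORDER BY region").fetchall())
# → [('east', 17.5), ('west', 5.0)]
```

In production the same three stages are typically handled by dedicated tools rather than hand-written scripts, but the shape of the flow is the same: read from the source system, reshape for analytics, write to the warehouse.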
The data warehouse was initially deployed on-premises, in the data centers of large organizations that could support the expensive and operationally intensive infrastructure required to run it. Today, data warehousing software is also available in the cloud, and the data warehouse is being extended to support some less-structured data and a broader range of workloads.
A data lake is often used in parallel with a data warehouse, handling less-structured data and data science, machine learning, and AI use cases. However, increasing demands on data have led to the two being interconnected, creating a complex infrastructure with duplicated data and duplicated costs and effort.
The data lakehouse, which builds on the data lake architecture to add improved updating, improved querying, and other capabilities, can replace both data lakes and data warehouses for many use cases. This simplification reduces duplication of data, costs, and effort.