Apache Parquet

Entry is same as: Parquet

Apache Parquet is an open source file format that stores data column by column rather than row by row, which makes it well suited to analytics queries that read a few columns across many records. Row-based formats such as Avro keep each record together, which makes them easier to use for record-keeping and for transactions.
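
The difference between the two layouts can be sketched in a few lines of plain Python. This is a toy illustration only, not Parquet's or Avro's actual on-disk format; the field names (user_id, country, amount) and the sample records are made up for the example.

```python
# Toy model of row-based versus column-based storage.
# Assumption: the records and field names are hypothetical sample data,
# and neither structure reflects a real file format's byte layout.
records = [
    {"user_id": 1, "country": "US", "amount": 20.0},
    {"user_id": 2, "country": "DE", "amount": 35.5},
    {"user_id": 3, "country": "US", "amount": 12.5},
]

# Row-based layout: each record is stored together, so reading or
# writing one complete record is cheap.
row_layout = [(r["user_id"], r["country"], r["amount"]) for r in records]

# Column-based layout: all values of one field are stored together.
column_layout = {name: [r[name] for r in records] for name in records[0]}

# An analytics query such as SUM(amount) needs a single column.
total_from_columns = sum(column_layout["amount"])

# The same query over rows has to walk every full record and pick
# the one field it needs out of each.
total_from_rows = sum(row[2] for row in row_layout)

print(total_from_columns, total_from_rows)
```

Both sums agree, but the columnar version never touches user_id or country, which is the property that lets a columnar reader skip unneeded data on disk.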

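Columnar file formats such as Parquet usually compress well, because the values in a column share a type and often repeat. Parquet combines encodings such as dictionary and run-length encoding with general-purpose compression codecs, which can greatly reduce file size. Parquet files also contain metadata, such as the minimum and maximum values of each column within each group of rows (a row group), so that a query engine can skip row groups that cannot match a filter.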
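
Both ideas, compressing a repetitive column and using minimum/maximum metadata to skip data, can be sketched in plain Python. This is a simplified model under stated assumptions, not how Parquet itself is implemented: the helper names (run_length_encode, build_row_groups, scan_greater_than), the group size of 3, and the sample values are all hypothetical.

```python
from itertools import groupby


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def build_row_groups(values, group_size):
    """Split one column into fixed-size groups, recording min/max for each."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append({"min": min(chunk), "max": max(chunk), "values": chunk})
    return groups


def scan_greater_than(groups, threshold):
    """Return values > threshold and the number of groups actually read."""
    matches, groups_read = [], 0
    for group in groups:
        if group["max"] <= threshold:
            continue  # the statistics prove nothing in this group can match
        groups_read += 1
        matches.extend(v for v in group["values"] if v > threshold)
    return matches, groups_read


# A column with repeated neighbors shrinks under run-length encoding.
country = ["US", "US", "US", "DE", "DE", "US"]
encoded = run_length_encode(country)

# A numeric column split into three groups of three values each.
amount = [5, 7, 9, 40, 42, 48, 11, 13, 15]
groups = build_row_groups(amount, group_size=3)
matches, groups_read = scan_greater_than(groups, threshold=30)

print(encoded)
print(matches, groups_read, len(groups))
```

In this sketch the filter amount > 30 reads only one of the three groups, because the recorded maximums of the other two (9 and 15) rule them out without looking at their values.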

The major data lakehouse projects, including Apache Hudi, Apache Iceberg, and Delta Lake, use Parquet as a primary file format, due to the operational efficiencies yielded by smaller file sizes and the suitability of columnar files for many queries.

Related terms: Avro; data lakehouse; metadata 
