Hudi-Presto Workshop
April 24th, 2024 | 9 AM PST | 12 PM EST
The workshop will use the TPC-DS dataset at a 10 GB scale to demonstrate the read and write capabilities of Hudi and Presto. The dataset will be available at a shared S3 location accessible to workshop attendees.
All required open-source software and its dependencies will be pre-installed for the workshop session. Attendees will use Jupyter Notebooks to run read and write queries on Apache Hudi using Presto and Spark SQL. Attendees will also have access to the Spark UI and Presto UI for additional analysis and debugging.
The lakehouse architecture combines the flexibility, scalability, and cost-efficiency of data lakes with the robust data management features of data warehouses. This workshop is designed to give data engineers and architects a comprehensive understanding of Apache Hudi and to show how to use it to build an open lakehouse architecture on AWS S3, with Presto as the engine for fast, interactive queries.
The data lakehouse is seeing growing adoption, but building your own is challenging. Onehouse's Universal Data Lakehouse™ is a fully managed service built on open-source technology. It offers interoperability with the leading lakehouse formats and compatibility with leading data stores and query engines such as Snowflake, Databricks, and Amazon Athena.
Our live webinar includes an overview of Onehouse from Founder and CEO Vinoth Chandar, followed by a live demo. You’ll see how the managed lakehouse can: