If you’ve been following Onehouse, you’ll know that the past year has marked many important milestones on our journey to help every organization build an open data architecture. We kickstarted an industry conversation around data interoperability with last year's release of OneTable, now Apache XTable (Incubating). We keynoted the first Open Source Data Summit, and we shared some of our customer journeys. Together, these show that we’ve made great progress toward our vision.
Today, I’m excited to share that we have raised a $35M Series B led by Craft Ventures, with participation from existing investors Addition and Greylock Partners. This funding will help us accelerate our pace of innovation and product development. Alongside this announcement, we’re launching two new products: LakeView, a free lakehouse observability tool for the OSS community, and Table Optimizer, which automates data lakehouse optimizations.
But first, let’s go back to the vision. We believe users need an open data architecture. Why? Because it’s the only way for organizations to unlock their most valuable asset, their data, and put it to productive use in support of existing and emerging use cases. Whether the use case is generative AI, predictive ML, real-time analytics, or traditional BI, there’s too much innovation in the data space to lock data into a single, vertically integrated platform. Users must be free to choose the right tool for each use case. For nearly two decades, the industry has recognized that “one size fits all” is an idea whose time has come and gone.
Traditional databases and data warehouses have always been integrated systems, with storage and processing tightly coupled and optimized for performance. However, this coupling has a significant downside. The decision we most often see users regret is choosing their compute engine vendor first and then letting that choice dictate, “top-down,” their storage and data management options. An integrated data platform locks ingestion, storage, ELT pipelines, the catalog, and data services to the chosen engine, whether that is a cloud warehouse, a database, or a data lake engine. Reusing or sharing data with other processing engines then becomes difficult or impossible.
Instead, we should think “bottom-up,” starting from the data, which is, after all, the enduring asset in the stack. This requires an open data architecture, where storage and data management are decoupled from the query engine. In this scenario, data is ingested, transformed, and managed just once. Any engine can then query it, and users can take the time to evaluate multiple engines properly, apply the right engine for the right use case, and even migrate between engines easily. Users free their data from lock-in, create a single source of truth, and can then bring it to any use case. Perhaps most importantly, this approach future-proofs your data architecture. For example, who would have guessed five years ago that vector databases would become so important to support RAG applications?
That’s our vision for the data lakehouse and our original motivation for building the first lakehouse at Uber nearly a decade ago. Given all the shifts and twists in the data ecosystem, it has taken a while, but this vision can now become a reality across the industry. In 2021, the world was divided into two camps, data lakes and data warehouses. Today, we’re entering a great convergence on the open data lakehouse as the center of all data gravity.
The vision is clear, but getting there from today's legacy architectures is much harder. It requires openness and interoperability at every layer of the stack. Openness is absolutely necessary, but it's not sufficient. Without a strong commitment from vendors to interoperability and compatibility, we risk fracturing data into silos again, even on top of open formats. And while data lock-in is top of mind for everyone, compute lock-in can be even worse.
Specifically, we still need to solve for the following:
So, what does this new funding mean? First, I’d like to thank our investors at Craft Ventures, Addition, and Greylock Partners for believing in our vision, and our early customers for giving that vision meaning.
As Michael Robinson of Craft Ventures said, “Onehouse enables organizations to deploy a lakehouse in a matter of minutes – a critical function as the data lakehouse has become the standard architecture to centralize data and power new services like real-time analytics, predictive ML, and GenAI. One day, every organization will be able to take advantage of truly open data platforms, and Onehouse is at the center of this transformation.”
We will continue to invest in the core Onehouse platform and in open-source projects such as Apache Hudi and Apache XTable. Expect our development team to grow quickly but carefully. You’ll also see a greater market presence from Onehouse as we assemble a world-class go-to-market team to bring this unique product to our customers.
If it sounds like an exciting time to be part of Onehouse, it is. It's also a great time to join us on this mission to make data open.