Onehouse secures $35M to advance open data lakehouse technology

Data lakehouse vendor Onehouse plans to use new funding to expand both its commercial and open source efforts around interoperable data lake technologies.

Today the company announced a $35 million Series B round of funding led by Craft Ventures, with participation from Addition and Greylock Partners. The goal of the funding is to accelerate product development and market penetration. This latest round brings the company’s total funding to $68 million, following an initial seed round of $8 million and a $25 million Series A that was announced in Feb. 2023. Onehouse has its roots in Apache Hudi, an open source data lake table format that was originally developed at ride-sharing company Uber.

Apache Hudi is a competitive alternative to the open source Apache Iceberg and Delta Lake table formats, though Onehouse’s focus is on interoperability rather than competition. In Nov. 2023, Microsoft and Google joined Onehouse to back OneTable, an open source technology for interoperability between data lake table formats. That effort has since been moved to the Apache Software Foundation (ASF) and rebranded as Apache XTable.

With the new funding, Onehouse will continue to contribute to the development of XTable and to advance its Universal Data Lakehouse platform, which enables organizations to use different table formats, data catalogs, query engines and cloud providers.

“We are neutral to query engines and we are neutral to different clouds,” Vinoth Chandar, CEO and founder of Onehouse, told VentureBeat. “Our job here is to bring the data and optimize the data, transform the data and put it in front of any engine, any catalog that the user picks.”

Apache XTable set to extend the open source interoperable data lake

Having multiple data lake table formats is a challenge for organizations, and it is the problem that XTable (formerly the OneTable project) helps to solve.

Chandar said that since the effort got Microsoft and Google’s backing in 2023, interoperability and usage have expanded. He noted that even at this relatively early stage, XTable provides interoperability across data lake table metadata in an omnidirectional way.

XTable usage by Microsoft in particular recently got a big boost. Chandar noted that at the Microsoft Build 2024 conference, the company revealed that Microsoft Fabric has an integration capability that uses XTable as a key component for translating between Snowflake writes with Apache Iceberg and Delta Lake reads, and vice versa.

Apache XTable is also a core element of Onehouse’s commercial Universal Data Lakehouse platform. The Universal Data Lakehouse is Onehouse’s managed product offering, which aims to provide a neutral, efficient and interoperable solution for data management. Chandar explained that data is ingested and transformed using Apache Hudi, then stored in a vendor-neutral format like Apache Parquet that is accessible to any query engine. The platform supports interoperability by allowing customers to query data stored in different table formats without performance loss.

The next generation of Apache Hudi will bring vector support to data lakes

While interoperability across different data lake table formats is important to Onehouse, so too is advancing the Apache Hudi technology that is at the foundation of its own platform.

Work is currently underway on the new Apache Hudi 1.0 release, which brings a new concurrency model and support for unstructured as well as structured data. Chandar said that an upcoming beta release for Apache Hudi 1.0 will include a new secondary indexing system, allowing indexing on non-primary keys and filtering queries using those indexes.
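To make the idea of a secondary index concrete, here is a minimal Python sketch of indexing on a non-primary key. It is a toy in-memory illustration only and is not Apache Hudi’s API or implementation; the `ToyTable` class, its methods and the column names in the usage note are all hypothetical.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory table with a primary key and one secondary index.

    Hypothetical illustration only; this is not how Apache Hudi is implemented.
    """

    def __init__(self, primary_key, indexed_column):
        self.primary_key = primary_key
        self.indexed_column = indexed_column
        self.rows = {}                      # primary key -> row dict
        self.secondary = defaultdict(set)   # indexed value -> set of primary keys

    def upsert(self, row):
        """Insert or update a row and keep the secondary index in sync."""
        key = row[self.primary_key]
        old = self.rows.get(key)
        if old is not None:
            # Drop the stale index entry before re-indexing the updated row.
            self.secondary[old[self.indexed_column]].discard(key)
        self.rows[key] = row
        self.secondary[row[self.indexed_column]].add(key)

    def lookup(self, value):
        """Return rows whose indexed column equals value, without a full scan."""
        return [self.rows[k] for k in sorted(self.secondary.get(value, ()))]
```

With a table created as `ToyTable("trip_id", "city")`, a call such as `table.lookup("SF")` goes straight to the matching primary keys instead of scanning every row, which is the kind of filtering a secondary index is meant to speed up.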

Perhaps even more interesting, he noted that there is work underway to add support for vector search indexes in the extensible indexing subsystem. This would allow for vector as well as text searches against data in the data lake. The goal is to turn Hudi into more of a database layer by improving indexing and query planning and by providing a database-like experience on top of data lakes.

Chandar said he expects Apache Hudi 1.0 to become generally available in the next several months.

Source: VentureBeat
