Boomi, a privately held data management company, announced today that it has acquired Israeli data integration startup Rivery. The financial details of the deal were not disclosed, but Calcalist reports that market estimates put its value at around $100 million.
Data is eating the world, giving rise to startups that help enterprises automate the flow of data from myriad distributed internal and external sources to new generative AI (GenAI) tools. To benefit from AI quickly, many enterprises are engaged in a vast project of re-architecting their IT infrastructures, giving a prominent place to the "data stack" -- connecting and integrating internal data silos and external data sources. "Without data, there's no AI, so we are helping our customers bring the data into AI engines to speed up their adoption," Itamar Ben Hemo, co-founder and CEO of Rivery, told me last month.
A recent Menlo Ventures survey of corporate executives found that 72% anticipate broader adoption of generative AI tools in the near future. Retrieval-augmented generation (RAG) now dominates adoption at 51%, up from 31% last year. RAG lets large language models (such as those behind ChatGPT) draw on information outside their training data, typically by retrieving relevant documents at query time and adding them to the prompt. This can make their outputs more relevant and accurate (i.e., less prone to "hallucinations"). The survey revealed a "strong drive to unlock and harness the valuable knowledge hidden within data silos scattered across organizations... Across the stack, we see demand for technologies purpose-built to meet the needs of modern AI."
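To make the retrieve-then-augment idea concrete, here is a minimal Python sketch of the RAG flow. It is illustrative only and makes several assumptions: it scores relevance with a toy bag-of-words cosine similarity where production systems typically use learned embeddings and a vector database, it stops at building the prompt rather than calling any particular LLM, and the functions (`retrieve`, `build_prompt`) and sample documents are hypothetical, not anything Rivery or Boomi ships.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (toy stand-in for a vector search)."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Augment the question with retrieved context; the result would then go to an LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


# Hypothetical internal documents standing in for an enterprise knowledge base.
docs = [
    "The 2024 travel policy caps hotel stays at 200 dollars per night.",
    "Quarterly revenue grew 12 percent in the EMEA region.",
    "Employees must file expense reports within 30 days.",
]
print(build_prompt("What is the hotel cap in the travel policy?", docs, k=1))
```

Swapping the bag-of-words scoring for an embedding model changes how relevance is measured but not the overall flow: retrieve, augment the prompt, then generate.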
GenAI hallucinations are a key challenge that must be addressed to ensure smooth adoption. "Fixing leaky data pipelines is critical to high GenAI accuracy," says Ben Hemo. "A GenAI app is only as good as the data that's fed into it." If the data is inconsistent, inaccurate, or missing, enterprise users of the GenAI application will very likely get inaccurate or even misleading results.
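Ben Hemo's point about leaky pipelines can be illustrated with a simple validation pass that flags missing, inconsistent, or implausible values before records reach a GenAI application. This is a minimal rule-based sketch, not a description of Rivery's product; the schema (`customer_id`, `country`, `revenue`), the list of valid country codes, and the `audit` function are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical schema and reference data, for illustration only.
REQUIRED_FIELDS = ("customer_id", "country", "revenue")
VALID_COUNTRIES = {"US", "IL", "DE", "GB"}


@dataclass
class Issue:
    record_id: str
    problem: str


def audit(records: list[dict]) -> list[Issue]:
    """Flag missing, inconsistent, or implausible values before records reach a GenAI app."""
    issues: list[Issue] = []
    seen_ids: set[str] = set()
    for i, rec in enumerate(records):
        rid = str(rec.get("customer_id") or f"row-{i}")
        # Missing data: required fields that are absent or empty.
        for field in REQUIRED_FIELDS:
            if rec.get(field) in (None, ""):
                issues.append(Issue(rid, f"missing {field}"))
        # Inconsistent data: unknown codes or non-canonical casing.
        country = rec.get("country")
        if isinstance(country, str) and country:
            if country.upper() not in VALID_COUNTRIES:
                issues.append(Issue(rid, f"unknown country {country!r}"))
            elif country != country.upper():
                issues.append(Issue(rid, f"inconsistent country code {country!r}"))
        # Implausible data: wrong type or a negative amount.
        revenue = rec.get("revenue")
        if revenue not in (None, ""):
            if not isinstance(revenue, (int, float)):
                issues.append(Issue(rid, "non-numeric revenue"))
            elif revenue < 0:
                issues.append(Issue(rid, "negative revenue"))
        # Duplicates silently double-count records downstream.
        if rid in seen_ids:
            issues.append(Issue(rid, "duplicate customer_id"))
        seen_ids.add(rid)
    return issues
```

In practice, flagged records might be quarantined or routed back for correction rather than passed to the model; which rules matter depends entirely on the data in question.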
RAG helps solve this problem but also represents a new data integration challenge. In the past, the analytical tools that business executives used to inform their decisions and actions mostly "ingested" so-called "structured data": data with a predetermined, well-defined structure, like the rows and columns of a spreadsheet, which suited traditional business intelligence and decision-support tools. The sources used by RAG, however, also include vast volumes of "unstructured data," such as emails, PDFs, Slack messages, sales calls, videos, or free-form text fields collected by different applications.
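A spreadsheet row can be queried field by field, but an email or sales-call transcript has no predefined fields. A common first step in RAG pipelines is therefore to split such text into smaller, overlapping chunks tagged with their source, so each piece can be indexed and retrieved on its own. The sketch below assumes simple word-count windows; production pipelines often split by model tokens, sentences, or document sections instead. The `Chunk` type, the `chunk_text` function, and the default sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # where the text came from, e.g. an email ID or PDF file name
    index: int   # position of the chunk within its source
    text: str


def chunk_text(source: str, text: str, max_words: int = 50, overlap: int = 10) -> list[Chunk]:
    """Split free-form text into overlapping windows of at most max_words words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be non-negative and smaller than max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    # Stop once a window reaches the end, so no chunk is made up only of overlap.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [
        Chunk(source, i, " ".join(words[start:start + max_words]))
        for i, start in enumerate(starts)
    ]
```

With the defaults, a 100-word message yields three chunks starting at words 0, 40, and 80. The overlap means text near a boundary appears in both neighbouring chunks, so a sentence cut by the boundary is less likely to lose its context.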
"Ingested unstructured data needs to be handled in a way that is different from classic data modeling for analytics purposes," observes Ben Hemo. The traditional extract, load, and transform (ELT) process -- moving raw data from a source to a destination, such as a data warehouse, and then transforming it there -- isn't well suited to the new AI tools and infrastructure. There are new data sources to extract data from and new target locations to load it into. The data also needs to be transformed in new ways that differ from the data modeling techniques designed for traditional analytics.
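For readers less familiar with ELT, the sketch below shows the traditional pattern: raw records are extracted from a source, loaded untouched into a destination, and only then transformed with SQL inside it. It uses Python's built-in `sqlite3` module as a stand-in for a cloud data warehouse; the order records, table names, and functions are hypothetical, and this is not how Rivery or Boomi implement their pipelines.

```python
import sqlite3


def extract() -> list[tuple[str, str, str]]:
    """Hypothetical source records, standing in for an API or SaaS export."""
    return [
        ("2024-09-01T10:15:00", "acme", "120.50"),
        ("2024-09-01T16:40:00", "globex", "79.50"),
        ("2024-09-02T09:05:00", "acme", "300.00"),
    ]


def load(conn: sqlite3.Connection, rows: list[tuple[str, str, str]]) -> None:
    """Load the raw rows as-is (amounts still text) into a staging table."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_orders (ts TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)


def transform(conn: sqlite3.Connection) -> None:
    """Model the raw data inside the destination with SQL: total revenue per day."""
    conn.execute("DROP TABLE IF EXISTS daily_revenue")
    conn.execute(
        """
        CREATE TABLE daily_revenue AS
        SELECT substr(ts, 1, 10) AS day, SUM(CAST(amount AS REAL)) AS revenue
        FROM raw_orders
        GROUP BY day
        """
    )


conn = sqlite3.connect(":memory:")
load(conn, extract())
transform(conn)
print(conn.execute("SELECT day, revenue FROM daily_revenue ORDER BY day").fetchall())
# [('2024-09-01', 200.0), ('2024-09-02', 300.0)]
```

Keeping the raw table intact means the transformation can be rewritten and rerun later without re-extracting from the source, which is the main appeal of loading before transforming.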
The new approach to ELT is an integral part of the modern AI-centric data processing infrastructure. It uses machine learning to automate key data processing tasks such as data cleaning, feature extraction, and synthetic data generation, helping enterprises handle complex, multimodal datasets effectively. However, these advanced capabilities come with demanding infrastructure requirements.
Tradition begets innovation when you are ready to adapt quickly to rapid change. In 2019, after a long career at traditional business intelligence and business analytics consulting firms, Ben Hemo co-founded Rivery as a modern data integration platform. With the advent of GenAI tools, Rivery quickly adopted them to improve the productivity of its data engineers and the products and services it provides to its customers. In September, for example, Rivery introduced Ask AI, an AI agent that offers instant answers to technical questions.
"Today, customers are getting a handle on all the data they have inside and the data they're bringing from outside," says Ben Hemo. "We see ourselves as a strong partner of our customers in the transition to AI."