Galileo Empowers Companies to Run Machine Learning Better, Faster

"Every company will rely on machine learning in some shape or form"

The founders of Galileo believe that every company will be a machine learning company.

“Machine learning right now is becoming pervasive,” says Galileo cofounder Vikram Chatterji. “It’s like what software engineering used to be. Every company will rely on machine learning in some shape or form.”

But for companies to get the most out of machine learning, there must be a better way to select the high-quality data that ML models need to deliver valuable results. This is a problem that many companies are grappling with already—and it will only escalate as ML becomes more impactful within organizations.

A perfect example is Uber. It started off with five ML models in production. In just two and a half years, that number exploded to more than a thousand. And as the number of models grew, supplying them with high-quality data became a critical determinant of the product's success.

“It’s hard to start manually tweaking every single model and make sure that it has high data quality,” says Chatterji. “Data is the lifeblood of ML models but data scientists had no tooling at all for figuring out what’s the right kind of data that should be going into these models. How do you make sure that the quality of the data is high?”

A giant leap for data scientists

This is the question that the three cofounders of Galileo are working to answer. And they are the perfect people to do it. Before launching the company, Chatterji headed up a product management team at Google AI. His cofounder Atindriyo Sanyal led engineering teams at Uber AI. The two were also best friends in high school. Their third cofounder, Yash Sheth, was in charge of Google's speech recognition platform.

“The three of us noticed that, as our teams were building these ML models, it was very difficult to figure out what data should be going into the models,” says Chatterji. “We had a big pain point ourselves and we fought hard to solve that at Google and Uber. And we realized that this problem is only going to balloon over time.”

Galileo has built a series of data intelligence tools that help data engineers and data scientists handle and clean up the data used for ML training and production more efficiently. Galileo offers a giant leap forward because these highly paid data experts often spend over 80% of their time simply cleaning up data rather than developing the algorithms needed to take the business forward.

Galileo is purpose-built for ML teams to develop better-quality models faster. ML models are often opaque: it is hard to tell which data they performed poorly on, and why. Galileo provides a host of tools that ML teams can use to inspect and find ML data errors 10x faster, sifting through labelled or unlabelled data to automatically identify error patterns and data gaps in their models.

“Today, the way that data scientists work with data quality is the most soul-sucking part of their job,” explains Chatterji. “They have to manually sort through Excel sheets and Python scripts. We want to completely flip the narrative by using data-centric algorithms, some of which come out of our own work at Google and Uber, and productize all of that so that the user gets this intelligent, amazing experience. By adding just a few lines of code, they can uncover all the different errors that would have otherwise taken them many weeks to figure out.”

Ready to own all data modalities

Galileo has chosen to focus its product first on unstructured data, such as text, images and speech. In terms of use cases, Galileo is now being embraced by companies that use chatbots and conversational AI across their customer experience and sales channels. These are great use cases because natural language processing (NLP) teams tend to struggle with figuring out the right data to work with, says Chatterji.

And while the product is currently geared toward NLP data scientists, going forward Galileo can quickly expand into computer vision, speech and eventually even structured data, thereby increasing its market size and owning all the different data modalities.

Right now, Galileo is taking a freemium approach to the market, offering a scaled-down version of the product for free to individual data scientists. The thinking is that once they try the free version, they’ll want a more robust one with additional features that can be deployed enterprise-wide and deliver more long-term value.

An ideal partner in Walden Catalyst

After raising a $5.1 million seed round from the Factory, Galileo secured $18 million in a Series A funding round led by Walden Catalyst and Battery Ventures.

“We’re thrilled to partner with Walden Catalyst. In truth, we’ve had a relationship with Lip-Bu Tan and Shankar Chandran from the Walden team since day one of this company,” says Chatterji. “The Walden Catalyst team deeply understands our problem space. They are extremely technical and they have experience in investing in other ML, AI and data-focused companies. With Walden Catalyst, it isn’t just the dollar capital, it’s also the intellectual capital we get, which is invaluable.”