Distributional: Empowering Trustworthy AI in the Enterprise

By Frances Schwiep and Vin Sachidananda on October 8, 2024

Air Canada has to honor a bogus refund policy its chatbot made up

AI chatbot calls itself useless, writes elaborate poem about its shortcomings, and says it works for the worst delivery firm in the world

Three major AI players’ chatbots and image generators go “off the rails,” spitting out gibberish for hours, turning historically white figures into multiracial characters, and telling users “I’m your enemy.”

These aren’t hypothetical scenarios or isolated incidents spread over years. They’re real headlines and events that occurred within a single month in 2024.

With enterprise AI adoption rates hitting an all-time high, AI “going off the rails” has become so common that insurance companies are scrambling to develop policies specifically for AI-related incidents. That scramble isn’t surprising: one study found that AI-related incidents grew 26x between 2012 and 2021 – a trend that has likely accelerated since the start of the current AI boom in 2022.

From PR nightmares to lawsuits and worse, the risks for enterprises deploying both enterprise- and consumer-facing AI applications have never been higher. Meanwhile, firms face enormous pressure to adopt this breakout technology or risk getting left behind.

This is the AI Confidence Gap in action: the paralyzing tension between the need to innovate with AI and the fear of its potential pitfalls.

AI’s Achilles Heel: Outdated Testing Methods

It’s hard to believe that in an era of such advanced technology, AI systems and applications still frequently lack the systematic tooling common in other mission-critical software. Shockingly, many AI workflows rely on ad-hoc, manual troubleshooting and unscalable spot-checking instead of robust, automated processes. The problem is compounded by the fact that conventional software testing and monitoring methods are ill-suited for AI. Take Datadog and New Relic as examples: these tools fit deterministic software well, but AI applications are inherently probabilistic – the same input can yield different outputs from one run to the next. Consider how consumers can now generate an almost infinite variety of content – from song lyrics to videos – using AI. That same flexibility means AI responses can vary widely even for similar inputs, and as AI models continuously learn and evolve, they become “moving targets” for testing. This fundamental difference demands a new approach to testing and validation – one specifically tailored to the unique challenges of AI applications.

Recognizing this challenge, we wrote a thesis in 2021 predicting a billion-dollar opportunity in AI-specific testing and evaluation frameworks. Now, three years later, we believe we’ve reached a tipping point. The recent explosion of AI applications, fueled by newly accessible foundation models, has created a market ripe for adopting these specialized tools. The AI development landscape is finally ready for a revolution in testing tooling.

Bridging the AI Confidence Gap with Distributional

Distributional, founded by Scott Clark, Nick Payton, Michael McCourt, and a team of veterans from their first startup, aims to tackle the AI testing challenge head-on with a scalable, easy-to-integrate testing suite designed specifically for Generative AI models. They are building a solution that lets enterprises monitor and understand changes in AI application behavior, improving confidence and reliability. By implementing tests that integrate seamlessly into CI/CD and monitoring pipelines, Distributional aims to ensure AI outputs consistently behave as expected by falling within acceptable distributions – and their platform enables this throughout the AI software lifecycle, from development to production. We expect this approach to bridge the gap between traditional software testing and the unique needs of probabilistic AI systems.
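To make the idea concrete, here is a minimal sketch of what a distribution-based check could look like inside a CI/CD job. This is our own illustration of the concept, not Distributional’s product or API; the metric (token count), the Kolmogorov-Smirnov comparison, and the thresholds are simplifying assumptions.

```python
# Illustrative only: a distribution-based regression test for generative AI outputs.
# Instead of asserting exact outputs (impossible for probabilistic systems), we
# summarize each response with a scalar metric and check that the metric's
# distribution for a candidate model version stays close to the baseline's.
import numpy as np
from scipy.stats import ks_2samp


def response_lengths(responses):
    # Token count is a stand-in; a real suite would track many metrics
    # (similarity to references, groundedness scores, latency, cost, etc.).
    return np.array([len(r.split()) for r in responses])


def passes_distribution_test(baseline_metric, candidate_metric, alpha=0.01):
    """Two-sample Kolmogorov-Smirnov test: pass only if we cannot reject the
    hypothesis that both versions draw the metric from the same distribution."""
    return ks_2samp(baseline_metric, candidate_metric).pvalue >= alpha


if __name__ == "__main__":
    # Simulated metric values standing in for responses to a shared prompt set.
    rng = np.random.default_rng(0)
    baseline = rng.poisson(lam=60, size=500)    # current production version
    candidate = rng.poisson(lam=75, size=500)   # new version drifts noticeably longer
    if not passes_distribution_test(baseline, candidate):
        raise SystemExit("FAIL: candidate outputs fall outside the expected distribution")
```

In practice a check like this would run on every pipeline change, with baselines versioned alongside the model so a failure points at exactly which release shifted behavior.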

The Distributional Difference

Over the years, the market has been flooded with tools claiming to deliver AI safety, observability, and robustness. Yet, in our experience, most fall short. Why? These offerings typically focus on input-output behavior and on tracking metrics over arbitrary evaluation sets. What organizations truly need goes beyond simple benchmarking: the ability to integrate extensible, proactive testing methodologies directly into their production pipelines.

It’s been clear from our years of conversations with leading AI teams – from Apple to Disney to Tesla – that there is a consistent gap in the tooling needed to evaluate and test AI systems, highlighting two primary pain points:

1. For AI researchers: Existing tools lack the depth to handle complex operations such as:

  • Changing models
  • Multi-step reasoning
  • Function-calling and tool use
  • Referencing external datastores
  • Automation around test creation
  • Completeness and depth of statistical testing

2. For AI engineers: Current solutions struggle to integrate with the software development lifecycle, particularly in areas like:

  • Rollbacks and versioning
  • Integration
  • Security measures
  • Repeatability of the testing process
  • Standardization of testing across all types of AI/ML components
  • Visibility and governance around AI application productionalization

Distributional is built differently. 

Distributional’s approach is informed by a decade of experience at SigOpt, the founders’ first startup – an optimization platform used for ML modeling and simulation that was later acquired by Intel. That experience taught the team a crucial lesson: production-ready AI pipelines require flexible testing tools that integrate seamlessly into both R&D environments and deployment workflows. Having addressed similar pain points at scale for some of the world’s most sophisticated organizations, the Distributional team brings an approach to the current AI testing landscape built on a foundation of real-world experience.

Then there’s their vision. Distributional is building an extensible platform constructed as a closed feedback loop: evaluation is performed at every point in the model pipeline, additional testing is conducted after the initial analysis of metrics, and the final results are used to programmatically determine which components are satisfactory for deployment. This approach paves the way for a future of automated testing, alerting, and observability that leverages the full context of deployments and versioning. Ultimately, it aims to enable self-correcting model pipelines with real-time optimizations, revolutionizing how enterprises manage and improve their AI systems.
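As a thought experiment, a closed feedback loop of this kind might look roughly like the sketch below. The stage names, metrics, and thresholds are our own assumptions chosen for illustration; this is not Distributional’s implementation.

```python
# Hypothetical sketch of a closed-feedback evaluation loop over a model pipeline:
# evaluate every stage, re-test stages whose metrics look borderline, then
# programmatically decide which components are satisfactory for deployment.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StageReport:
    name: str
    metrics: Dict[str, float]
    passed: bool
    notes: List[str] = field(default_factory=list)


def evaluate_pipeline(stages: Dict[str, Callable[[], Dict[str, float]]],
                      thresholds: Dict[str, float]) -> List[StageReport]:
    reports = []
    for name, run_eval in stages.items():
        metrics = run_eval()  # first pass over the evaluation suite
        borderline = [m for m, v in metrics.items()
                      if thresholds.get(m, 0.0) <= v < thresholds.get(m, 0.0) * 1.05]
        if borderline:
            metrics = run_eval()  # additional testing after the initial analysis
        passed = all(v >= thresholds.get(m, 0.0) for m, v in metrics.items())
        notes = [f"re-tested borderline metrics: {borderline}"] if borderline else []
        reports.append(StageReport(name, metrics, passed, notes))
    return reports


def ready_to_deploy(reports: List[StageReport]) -> bool:
    # The gate: only promote the pipeline if every component is satisfactory.
    return all(r.passed for r in reports)


if __name__ == "__main__":
    stages = {
        "retrieval": lambda: {"recall_at_5": 0.91},
        "generation": lambda: {"faithfulness": 0.84, "toxicity_free_rate": 0.99},
    }
    thresholds = {"recall_at_5": 0.90, "faithfulness": 0.85, "toxicity_free_rate": 0.98}
    reports = evaluate_pipeline(stages, thresholds)
    print("deploy" if ready_to_deploy(reports) else "hold for review")
```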

Distributional in Action: Solving Critical AI Challenges Across Industries

Some of the largest companies in consumer software, finance, biotech, and media are partnering with Distributional, adopting its testing and evaluation platform to address critical pain points around the safety and robustness of generative AI. Distributional’s platform is designed to answer critical questions like the following (one of which we sketch briefly after the list):

  • Are model pipelines/agents behaving as expected?
  • What’s changing over time?
  • Are there errors that need to be corrected?
  • How can we handle fail-safes/versioning/rollbacks?
  • What performance is needed for a set of models to make it to production?
  • Are business constraints being satisfied?
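
Taking the “What’s changing over time?” question as an example, a minimal drift check over production traffic could look like the sketch below. The metric, the Wasserstein-distance comparison, and the threshold are assumptions of ours, not Distributional’s product.

```python
# Illustrative drift check: compare a rolling window of a production output metric
# against a fixed reference window and alert when the distributions drift apart.
import numpy as np
from scipy.stats import wasserstein_distance

DRIFT_THRESHOLD = 5.0  # in metric units; would be tuned per application


def has_drifted(reference: np.ndarray, recent: np.ndarray) -> bool:
    """Flag drift when the distance between metric distributions exceeds the threshold."""
    return wasserstein_distance(reference, recent) > DRIFT_THRESHOLD


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    reference_window = rng.normal(loc=100, scale=10, size=1_000)  # last stable release
    recent_window = rng.normal(loc=112, scale=14, size=1_000)     # this week's traffic
    if has_drifted(reference_window, recent_window):
        print("ALERT: output metric distribution has drifted; trigger review or rollback")
```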

A massive shift toward comprehensive AI testing for data-driven and AI-native products is imminent. With a deeply experienced team and a product built from the first-hand needs of the market, we have strong conviction that Distributional is uniquely positioned to lead this transformation. It’s a privilege to back Distributional as they pave the way for a future where enterprise AI is not just powerful, but consistently reliable, safe, and trustworthy.

You can learn more about Distributional here.



The views expressed herein are solely the views of the author(s), are as of the date they were originally posted, and are not necessarily the views of Two Sigma Ventures, LP, or any of its affiliates. They are not intended to provide, and should not be relied upon for, investment advice, nor is any information herein any offer to buy or sell any security or intended as the basis for the purchase or sale of any investment. The information herein has not been and will not be updated or otherwise revised to reflect information that subsequently becomes available, or circumstances existing or changes occurring after the date of preparation. Certain information contained herein is based on published and unpublished sources. The information has not been independently verified by TSV or its representatives, and the accuracy or completeness of such information is not guaranteed. Your linking to or use of any third-party websites is at your own risk. Two Sigma Ventures disclaims any responsibility for the products or services offered or the information contained on any third-party websites.