Evidently AI

Collaborative AI observability platform

4.0 · 3 reviews · 397 followers
Predictive AI · AI Infrastructure Tools · AI Metrics and Evaluation
Evidently helps evaluate, test and monitor your AI-powered products, from ML-based classifiers to LLM chatbots and agents. Built on top of the leading open-source library with over 20 million downloads: https://github.com/evidentlyai/evidently
Company Info
evidentlyai.com · GitHub
Evidently AI Info
Launched in 2021 · 2 launches
Forum
p/evidently-ai
Awards
Evidently AI was ranked #3 of the day for August 4th, 2021

Similar Products

TensorFlow
An end-to-end open source machine learning platform
4.8 (10 reviews)
Automation tools · AI Infrastructure Tools

Apple
Think Different
4.7 (108 reviews)

Google Cloud Platform
A suite of cloud computing services by Google
5.0 (129 reviews)
Engineering & Development · Web hosting services

Microsoft Azure
Optimize your costs by developing in the cloud.
5.0 (38 reviews)
Cloud Computing Platforms

Best of Machine Learning
A collection of the best resources in Machine Learning & AI
Knowledge base software · AI
This is the 2nd launch from Evidently AI.
Evidently AI

Open-source evaluations and observability for LLM apps

Evidently AI was ranked #5 of the day for August 20th, 2024
Evidently is an open-source framework to evaluate, test and monitor AI-powered apps.

📚 100+ built-in checks, from classification to RAG.
🚦 Both offline evals and live monitoring.
🛠 Easily add custom metrics and LLM judges.
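The bullets above describe offline checks in general terms. A minimal sketch of what one such check might look like, in plain Python; the check, data, and names below are invented for illustration and are not Evidently's built-in metrics:

```python
# Toy illustration of an offline eval run: score each output against a
# simple keyword check and compute a pass rate over a small dataset.

def contains_keyword(output: str, keyword: str) -> bool:
    """A minimal deterministic check: does the answer mention the keyword?"""
    return keyword.lower() in output.lower()

dataset = [
    {"output": "Data drift is a shift in input data.", "keyword": "drift"},
    {"output": "It measures completeness of retrieval.", "keyword": "recall"},
]

results = [contains_keyword(row["output"], row["keyword"]) for row in dataset]
pass_rate = sum(results) / len(results)
print(f"pass rate: {pass_rate:.0%}")  # 1 of 2 rows passes -> "pass rate: 50%"
```

In practice you would run many such checks (deterministic rules, model-based scores, LLM judges) over the same dataset and aggregate the results.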
Free
Launch tags:
Open Source · Developer Tools · Artificial Intelligence
Launch Team / Built With
Michael Seibel, Elena Samuylova, Emeli Dral
GitHub
Hugging Face
ChatGPT by OpenAI


Elena Samuylova · Evidently AI · Maker
Hi Makers! I'm Elena, a co-founder of Evidently AI. I'm excited to share that our open-source Evidently library is stepping into the world of LLMs! 🚀

Three years ago, we started with testing and monitoring for what's now called "traditional" ML: think classification, regression, ranking, and recommendation systems. With over 20 million downloads, we're now bringing our toolset to help evaluate and test LLM-powered products.

As you build an LLM-powered app or feature, figuring out if it's "good enough" can be tricky. Evaluating generative AI is different from traditional software and predictive ML. It lacks clear criteria and labeled answers, making quality more subjective and harder to measure. But there is no way around it: to deploy an AI app to production, you need a way to evaluate it. For instance, you might ask:

- How does the quality compare if I switch from GPT to Claude?
- What will change if I tweak a prompt? Do my previous good answers hold?
- Where is it failing?
- What real-world quality are users experiencing?

It's not just about metrics: it's about the whole quality workflow. You need to define what "good" means for your app, set up offline tests, and monitor live quality. With Evidently, we provide the complete open-source infrastructure to build and manage these evaluation workflows. Here's what you can do:

📚 Pick from a library of metrics or configure custom LLM judges
📊 Get interactive summary reports or export raw evaluation scores
🚦 Run test suites for regression testing
📈 Deploy a self-hosted monitoring dashboard
⚙️ Integrate it with any adjacent tools and frameworks

It's open-source under an Apache 2.0 license. We build it together with the community: I would love to learn how you address this problem and any feedback and feature requests.

Check it out on GitHub: https://github.com/evidentlyai/e..., get started in the docs: http://docs.evidentlyai.com, or join our Discord to chat: https://discord.gg/xZjKRaNp8b.
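The "GPT vs Claude" question in the comment above can be illustrated with a toy side-by-side scoring loop. The metric, variant outputs, and names below are hypothetical stand-ins, not how Evidently implements comparisons:

```python
# Compare two model/prompt variants on the same inputs with a shared
# scoring function, and report the per-example score delta.

def score(answer: str) -> int:
    # Stand-in metric: shorter answers score higher (a toy conciseness score).
    return max(0, 100 - len(answer))

inputs = ["What is Evidently?", "What is an LLM judge?"]
variant_a = ["An open-source evaluation framework.",
             "An LLM that grades other LLM outputs."]
variant_b = ["Evidently is an open-source framework to evaluate, test and monitor AI apps.",
             "A judge model scoring outputs."]

for question, a, b in zip(inputs, variant_a, variant_b):
    delta = score(b) - score(a)
    print(f"{question}: score delta {delta:+d} (B vs A)")
```

A real workflow would swap the toy `score` for the metric that defines "good" for your app, and run it over a curated evaluation dataset rather than two hand-written examples.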
12mo ago
Joseph Abraham · SaaS for Greater Good
@elenasamuylova Congrats on bringing your idea to life! Wishing you a smooth and prosperous journey. How can we best support you on this journey?
12mo ago
Elena Samuylova · Evidently AI · Maker
@kjosephabraham Thanks for the support! We always appreciate any feedback and help in spreading the word. As an open-source tool, it is built together with the community! 🚀
12mo ago
Emeli Dral · Evidently AI · Maker
Hi everyone! I am Emeli, one of the co-founders of Evidently AI. I'm thrilled to share what we've been working on lately with our open-source Python library.

I want to highlight a specific new feature of this launch: LLM judge templates. LLM as a judge is a popular evaluation method where you use an external LLM to review and score the outputs of LLMs. However, one thing we learned is that no LLM app is alike. Your quality criteria are unique to your use case. Even something seemingly generic like "sentiment" will mean something different each time. While we do have templates (it's always great to have a place to start), our primary goal is to make it easy to create custom LLM-powered evaluations.

Here is how it works:

🏆 Define your grading criteria in plain English. Specify what matters to you, whether it's conciseness, clarity, relevance, or creativity.
💬 Pick a template. Pass your criteria to an Evidently template, and we'll generate a complete evaluation prompt for you, including formatting it as JSON and asking the LLM to explain its scores.
▶️ Run evals. Apply these evaluations to your datasets or recent traces from your app.
📊 Get results. Once you set a metric, you can use it across the Evidently framework. You can generate visual reports, run conditional test suites, and track metrics over time on a dashboard. You can track any metric you like, from hallucinations to how well your chatbot follows the brand guidelines.

We plan to expand on this feature, making it easier to add examples to your prompt and adding more templates, such as pairwise comparisons. Let us know what you think!

To check it out, visit our GitHub: https://github.com/evidentlyai/e..., docs: http://docs.evidentlyai.com, or Discord to chat: https://discord.gg/xZjKRaNp8b.
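The four steps above (criteria in plain English, template, run, results) can be sketched as a toy prompt builder. The template string and all names below are invented for illustration; they are not Evidently's actual templates or API:

```python
import json

# Turn plain-English grading criteria into a judge prompt that asks the
# grading LLM for a JSON verdict with a score and an explanation.

def build_judge_prompt(criteria: str, answer: str) -> str:
    return (
        "You are an impartial evaluator.\n"
        f"Criteria: {criteria}\n"
        f"Answer to grade:\n{answer}\n"
        'Respond with JSON: {"score": 0 or 1, "explanation": "..."}'
    )

prompt = build_judge_prompt(
    criteria="The answer must be concise and free of marketing language.",
    answer="Our product is the best-in-class, revolutionary solution!",
)
print(prompt)

# A judge LLM might return something like this, which you then parse:
fake_judge_reply = '{"score": 0, "explanation": "Marketing language present."}'
verdict = json.loads(fake_judge_reply)
```

Asking for a binary score plus an explanation, as the reply format above does, keeps judge outputs machine-parseable and makes it easier to check the judge itself against human labels.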
12mo ago
Emeli Dral · Evidently AI · Maker
@hamza_afzal_butt Thank you so much!
12mo ago
Rod Rivera
Congratulations on the launch, Evidently team! I've always admired Evidently for its comprehensive, all-encompassing approach. I often work with teams who are unsure about what metrics to focus on or how to begin their evaluation process. For those new or unsure where to start:

- What best practices would you recommend?
- Is there a feature that helps beginners 'set things on autopilot' while they're learning the ropes?
- Do you offer any guided workflows or templates for common use cases that could help newcomers get started quickly?

Thanks for your continued innovations in this space!
12mo ago
Elena Samuylova · Evidently AI · Maker
@rorcde Thanks for the support! 🙏🏻

Quickstart: We have a simple example here: https://docs.evidentlyai.com/get.... It will literally take a couple of minutes! We packaged some popular evaluations as presets and general metrics (like detecting denials). However, we generally encourage using your own custom criteria: no LLM app is exactly alike, and the beauty of using LLM as a judge is that you can use your own definitions. We made it super easy to define your custom prompt just by writing your criteria in plain English.

Best practices: That's a huuuge question. Let me try to summarize a few of them:

- Don't skip the evals! Implementing evals can sound complex, so it's tempting to "ship on vibes". But it's much easier to start with a simple evaluation pipeline that you iterate on than to try adding evals to your process later on. So, start simple.
- Make curating an evaluation dataset a part of your process. When it comes to offline evals, the metrics are as important as the data you run them on. Preparing a set of representative, realistic inputs (and, ideally, approved outputs) is a high-value activity that should be part of the process.
- Log everything. On that note, don't miss out on capturing real traces of user conversations. You can then use them for testing, to replay new prompts against them, etc.
- Start with regression testing. This is low-hanging fruit in evals: every time you change a prompt, re-generate new outputs for a set of representative inputs and see what changed (or have peace of mind that nothing did). This is hugely important for the speed of iteration.
- If you use LLM as a judge, start with binary criteria and measure the quality of your judge. It's also easier to test alignment this way.
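The regression-testing practice described above can be sketched in a few lines of plain Python; the stub model and data are hypothetical, standing in for your real LLM app:

```python
# Keep approved outputs for representative inputs, re-run the app after
# a prompt change, and flag anything that changed.

approved = {
    "What is data drift?": "A change in input data distribution over time.",
    "What is recall?": "The share of relevant items that were retrieved.",
}

def new_model(question: str) -> str:
    # Stub for the app after a prompt tweak; in reality this calls your LLM.
    if question == "What is recall?":
        return "Recall measures retrieved relevant items."  # this answer changed
    return approved[question]

changed = {q for q, old in approved.items() if new_model(q) != old}
print(f"{len(changed)} of {len(approved)} answers changed: {sorted(changed)}")
```

Exact string comparison is the simplest possible diff; with non-deterministic LLM outputs you would typically compare metric scores or judge verdicts instead of raw text.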
12mo ago

Evidently AI Launches

Evidently AI: Open-source evaluations and observability for LLM apps

Launched on August 20th, 2024

Evidently AI was ranked #5 of the day for August 20th, 2024
4.0
Based on 3 reviews

Evidently AI is praised for its effectiveness in detecting data drift and alerting users to underperforming models. Users find its analytics and visualizations particularly useful. The tool is highly recommended for those who value observability in their AI pipelines. Overall, Evidently AI is considered a remarkable tool with positive feedback on its performance and utility.

Mariya, Antonis Stellas, Mikhail Rozhkov
Summarized with AI
Reviews

Mariya · 13 reviews
It does a good job at detecting data drift and alerting you when your models are underperforming. The analytics and visualisations are pretty useful on multiple occasions.
10mo ago
Antonis Stellas · 1 review
If you consider observability to be of great value in your AI pipeline, then you should also go for Evidently's tools!
11mo ago
Mikhail Rozhkov · DVC · 2 reviews
I think that Evidently is a remarkable tool! All the best for your future endeavors!
1yr ago