Predict, by Recall, is the world’s first ungameable, community-led benchmark for frontier models. Join thousands of AI researchers, developers, and enthusiasts in evaluating OpenAI’s upcoming GPT-5 model and building a benchmark around it.
Hey Product Hunt 👋!
I'm Michael, part of the team behind Predict. Super stoked to share our experiment in ungameable, community-led AI benchmarking with you all.
Why we built Predict
We all know the situation with current AI benchmarks. Labs train and optimize their models to the tests. The benchmarks themselves are opaque, biased, narrow, static, misaligned with what users actually care about, and unrepresentative of reality. I could go on and on... Just think back to the recent release of Grok 4: it supposedly dominated the benchmarks, but real users had an entirely underwhelming experience with it.
AI is transforming each of our lives in immeasurable ways, and picking the right model or tool for the job is more important than ever. With the imminent release of OpenAI's GPT-5, we thought now was the perfect time to unveil our project to the world.
Jump into our product and help build the gold standard in AI benchmarking and evaluation by:
1. Predicting performance across domains like coding, research, creativity, and more
2. Submitting new skills or evaluation prompts to test the model
3. Judging subjective traits such as helpfulness, creativity, and trustworthiness (coming soon)
4. Earning rewards for your contributions to the benchmark
Part predictions. Part evals. Full transparency.
Our team believes in building and shipping fast. This is an alpha release, and more features will be added over time. Thanks for your support, and please let us know if you have any feedback! 🚀
@msena Recall has the potential to set a new standard for AI model evaluation, especially with cutting-edge models like GPT-5 on the way. More authentic and diverse performance feedback would be a real positive for the whole AI industry.
@msena I will always support this project. Wishing you continued success!
@msena Congrats to the Predict team! Your approach to building a transparent, community-driven benchmark is super exciting. There's definitely a need for something like this—especially with GPT-5 on the horizon. Wishing you all the best as you grow and evolve the platform! 👏
A few extra details for anyone who wants to peek under the hood or understand the bigger bet we’re making:
1. Benchmarks vs. taste: Simon Willison’s one-liner (“draw a pelican riding a bicycle”) did more to expose multimodal quirks than most formal suites. Meanwhile, Andrej Karpathy points out that random crowds often can’t spot the better answer. Predict is designed to surface who in the crowd consistently does have that taste, then wire their signal into the benchmark loop (see the weighting sketch at the end of this comment).
2. Private-until-release evals: Every submitted eval stays sealed until GPT-5 drops (see the sealing sketch at the end of this comment).
If you have a weird failure mode or a half-baked eval idea, drop it in. The stranger, the better—we want tests that a fine-tuned model hasn’t already memorized.
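On (1), one simple way to find that tuned crowd is to score each forecaster’s past calls with something like a Brier score and weight their future votes by it. Here’s a toy Python sketch of the idea; the names and thresholds are illustrative, not our production code:

```python
# Toy sketch (illustrative only): weight each forecaster's vote by
# historical calibration, so the "tuned crowd" outweighs the noise.
from dataclasses import dataclass

@dataclass
class Forecaster:
    name: str
    past_probs: list[float]    # probabilities given on resolved questions
    past_outcomes: list[int]   # what actually happened (0 or 1)

def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between forecasts and outcomes; lower is better."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def weight(f: Forecaster) -> float:
    """Map a Brier score (0 = perfect, 0.25 = coin flip) to a non-negative weight."""
    return max(0.0, 0.25 - brier_score(f.past_probs, f.past_outcomes))

def weighted_consensus(forecasters: list[Forecaster], votes: dict[str, float]) -> float:
    """Aggregate probability votes, trusting well-calibrated forecasters more."""
    voters = [f for f in forecasters if f.name in votes]
    total = sum(weight(f) for f in voters)
    if total == 0:
        # Nobody beats a coin flip yet: fall back to a plain average.
        return sum(votes.values()) / len(votes)
    return sum(weight(f) * votes[f.name] for f in voters) / total

alice = Forecaster("alice", [0.9, 0.8, 0.2], [1, 1, 0])  # well calibrated
bob = Forecaster("bob", [0.9, 0.1, 0.9], [0, 1, 0])      # reliably wrong

# "Will GPT-5 top the coding leaderboard in its first week?"
print(weighted_consensus([alice, bob], {"alice": 0.7, "bob": 0.2}))  # ~0.7
```

One nice property: a forecaster who is reliably worse than a coin flip gets zero weight rather than a negative one, so bad-faith voters can’t invert the consensus.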
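On (2), the classic pattern for this kind of sealing is hash-based commit-reveal: publish only a digest at submission time, reveal the prompt plus a random salt after release day, and let anyone verify nothing changed in between. A stripped-down sketch (again illustrative, not our production code):

```python
# Toy commit-reveal sketch: the digest is safe to publish immediately,
# so the eval can't leak into training data before GPT-5 ships.
import hashlib
import secrets

def commit(eval_prompt: str) -> tuple[str, str]:
    """Seal an eval: return (public digest, private salt)."""
    salt = secrets.token_hex(16)
    digest = hashlib.sha256(f"{salt}:{eval_prompt}".encode()).hexdigest()
    return digest, salt

def verify(digest: str, eval_prompt: str, salt: str) -> bool:
    """On release day: check the revealed eval matches the commitment."""
    return hashlib.sha256(f"{salt}:{eval_prompt}".encode()).hexdigest() == digest

digest, salt = commit("Draw a pelican riding a bicycle.")
print(digest)  # publish this now
print(verify(digest, "Draw a pelican riding a bicycle.", salt))  # True after reveal
```

The salt matters: without it, anyone could brute-force short or guessable prompts against the public digest before release day.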
@andrewxhill Love this approach. Benchmarks should evolve beyond static datasets and embrace the 'wisdom of the tuned crowd.' Simon’s pelican example proves that edge cases reveal more than polished test suites. Excited to see how Predict surfaces the signal in the noise!
"Wow, Predict GPT looks like a game-changer for data-driven insights! Excited to see how it empowers users to make smarter predictions. 🚀"