GPT-4 is the latest milestone in OpenAI’s effort to scale up deep learning. It is a large multimodal model (accepting image and text inputs, emitting text outputs) that exhibits human-level performance on various professional and academic benchmarks.
New OpenAI audio models for developers: gpt-4o-powered speech-to-text (more accurate than Whisper) and steerable text-to-speech. Use them to build voice agents, transcription pipelines, and more.
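As a rough sketch of how these might be called from the OpenAI Python SDK (the model names gpt-4o-transcribe and gpt-4o-mini-tts, the file names, and the use of an instructions parameter for steering are assumptions for illustration, not confirmed by this text):

```python
# Minimal sketch: speech-to-text and steerable text-to-speech with the OpenAI Python SDK.
# Model names and the `instructions` parameter below are assumptions, not given in this
# document; check the current model list and API reference before relying on them.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Speech-to-text: transcribe a local audio file.
with open("meeting.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="gpt-4o-transcribe",  # assumed model name
        file=audio_file,
    )
print(transcript.text)

# Steerable text-to-speech: generate audio, steering tone via an instruction.
speech = client.audio.speech.create(
    model="gpt-4o-mini-tts",        # assumed model name
    voice="alloy",
    input="Thanks for calling; how can I help you today?",
    instructions="Speak in a warm, upbeat customer-service tone.",  # assumed steering parameter
)
with open("reply.mp3", "wb") as out:
    out.write(speech.content)
```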
GPT-4o (“o” for “omni”) is a step towards much more natural human-computer interaction: it accepts any combination of text, audio, and image as input and generates any combination of text, audio, and image as output.
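For a concrete sense of the multimodal input side, here is a minimal sketch of sending mixed text and image input to gpt-4o through the Chat Completions endpoint (the prompt and image URL are placeholders):

```python
# Minimal sketch: text plus an image in a single request to gpt-4o.
# The prompt and image URL are placeholders for illustration.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is happening in this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```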
GPT-4.5 is OpenAI’s most advanced model yet, scaling up unsupervised learning for better pattern recognition, deeper knowledge, and fewer hallucinations. It feels more natural, understands intent better, and excels at writing, programming, and problem-solving.
GPT-4o mini scores 82% on MMLU and currently outperforms GPT-4 on chat preferences. It is priced at 15¢ per million input tokens and 60¢ per million output tokens, an order of magnitude more affordable than previous frontier models and more than 60% cheaper than GPT-3.5 Turbo.
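To put those rates in concrete terms, a back-of-the-envelope cost estimate (the token counts here are illustrative, not from the announcement):

```python
# Rough cost estimate for GPT-4o mini at the quoted rates:
# 15 cents per million input tokens, 60 cents per million output tokens.
# The token counts below are made up for illustration.
INPUT_PRICE_PER_M = 0.15   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 0.60  # USD per 1M output tokens

input_tokens = 2_000_000   # e.g. two million tokens of prompts
output_tokens = 500_000    # e.g. half a million tokens of completions

cost = (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
     + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M
print(f"Estimated cost: ${cost:.2f}")  # -> Estimated cost: $0.60
```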
Sora is OpenAI’s video generation model, designed to take text, image, and video inputs and generate a new video as an output. Users can create videos in various formats, generate new content from text, or enhance, remix, and blend their own assets.
Canvas opens in a separate window, allowing you and ChatGPT to collaborate on a project. This early beta introduces a new way of working together—not just through conversation, but by creating and refining ideas side by side.
GPT-4 Turbo is more capable than GPT-4 and has knowledge of world events up to April 2023. It has a 128k context window, so it can fit the equivalent of more than 300 pages of text in a single prompt.
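As a rough sanity check on the 300-page figure, using the common rules of thumb of about 0.75 words per token and about 300 words per page (neither figure is stated in the announcement):

```python
# Sanity check: how many pages of text fit in a 128k-token context window?
# Assumes ~0.75 words per token and ~300 words per page; both are rough
# rules of thumb, not values given in the document.
context_tokens = 128_000
words_per_token = 0.75
words_per_page = 300

pages = context_tokens * words_per_token / words_per_page
print(f"~{pages:.0f} pages")  # -> ~320 pages, consistent with "more than 300 pages"
```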