An LPU™ Inference Engine, where LPU stands for Language Processing Unit™, is a new type of end-to-end processing unit system that delivers exceptionally fast inference, on the order of 500 tokens per second.
This alpha demo lets you experience that ultra-low-latency performance firsthand, using the foundation model Llama 2 70B (created by Meta AI) running on the Groq LPU™ Inference Engine.
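The demo itself is browser-based, but if you have API access, the tokens-per-second claim is easy to sanity-check yourself. Below is a minimal sketch assuming Groq's OpenAI-compatible Python SDK (the `groq` package) and the Llama 2 70B model ID `llama2-70b-4096`; the API key placeholder and model availability are assumptions, not part of the demo description.

```python
import time
from groq import Groq

# Assumes a valid API key; replace the placeholder with your own.
client = Groq(api_key="YOUR_API_KEY")

start = time.perf_counter()
response = client.chat.completions.create(
    model="llama2-70b-4096",  # assumed model ID for Llama 2 70B
    messages=[{"role": "user", "content": "Explain what an LPU is in one paragraph."}],
)
elapsed = time.perf_counter() - start

# usage.completion_tokens counts only generated tokens, not the prompt.
completion_tokens = response.usage.completion_tokens
print(f"{completion_tokens} tokens in {elapsed:.2f}s "
      f"({completion_tokens / elapsed:.0f} tokens/second)")
```

Note that this measures end-to-end wall-clock time, including network round-trip and time to first token, so it slightly understates raw generation throughput.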