SmolVLA - Powerful robotics VLA that runs on consumer hardware

Ambassador

SmolVLA is a compact (450M-parameter) open-source Vision-Language-Action model for robotics. Trained on community data, it runs on consumer hardware and outperforms larger models on simulation and real-world tasks. Released with code and recipes.

Replies

Zac Zuo

Ambassador

Hunter

Hi everyone! I think there are a few key ingredients for bringing AI agents into the physical world. First, they need to be able to interact with real environments. Second, because on-robot hardware is limited, the models need to be lightweight and efficient. Third, for the good of the community and wider adoption, these foundation models should ideally be open-source.

SmolVLA is an exciting release because it addresses all three points. It's a compact (450M-parameter) Vision-Language-Action (VLA) model that runs on consumer-grade hardware, is fully open-source, and was trained entirely on open, community-contributed robotics datasets from the LeRobot project. Despite its small size, SmolVLA outperforms much larger VLAs on both simulation and real-world tasks. The team has also implemented asynchronous inference, which decouples action prediction from action execution, to make it even more responsive. This is a fantastic contribution toward making capable, real-world robotics research accessible to everyone.

2mo ago

Giga Chkhikvadze

Hugging Face is doing incredible work! Their open-source model hub, packed with thousands of pre-trained models, makes it a breeze to dive into NLP, vision, and generative AI. I love how the community and APIs make complex AI feel so accessible and fun. Huge kudos to the team for building such a welcoming ecosystem!

2mo ago

Mike Staub

This is incredibly cool.

2mo ago

Supa Liu

SmolVLA is a great example of efficient design meeting real-world usability: compact, open-source, and high-performing. Love that it’s accessible to the broader robotics community right out of the box.

2mo ago

Erliza. P

Hugging Face's 9th launch with SmolVLA? 🤖🔥 Running a powerful robotics VLA (Vision-Language-Action) model on consumer hardware is a breakthrough! It must be using heavy quantization or distilled multimodal models to achieve this. The "View more ->" tease suggests edge-compute optimizations, possibly ROS 2 integration? Game-changer for indie robotics devs!

2mo ago