NVIDIA Reveals Cosmos: AI Model for Real-World Video Prediction and Robotic Planning at GTC 2026
NVIDIA announced Cosmos at its GPU Technology Conference on April 16: a video prediction model trained on 5 billion hours of video data that forecasts future frames from a current scene, enabling robots and autonomous systems to "see" what will happen next and plan accordingly. Cosmos generates photorealistic video continuations up to 30 seconds long, spanning urban driving scenarios, industrial assembly tasks, and humanoid robot manipulation.

The practical application: a self-driving car sees a pedestrian about to step into the crosswalk; Cosmos predicts the pedestrian's trajectory 2 seconds into the future, giving the autonomous system time to brake before the collision risk zone. For robotics: a warehouse robot running Cosmos can predict how stacked boxes will shift as it moves, adjusting its grip or approach mid-task.

Cosmos complements NVIDIA's GR00T, a general-purpose foundation model for robotic behavior, positioning NVIDIA's robotics stack as end-to-end: perception (video understanding) + prediction (Cosmos) + action (GR00T execution).

Enterprise implications: Boston Dynamics, Tesla, and Intrinsic (Alphabet's robotics company) are all in early trials. Open weights are available immediately on Hugging Face under the Apache 2.0 license, with no licensing fees for commercial deployment.

Availability: Cosmos runs on a single NVIDIA H100 GPU, making it accessible to mid-size robotics companies and research labs. The strategic significance: video prediction has been an unsolved problem in computer vision for 20+ years, and Cosmos enables a new category of AI applications, covering every autonomous system that needs to predict future states before acting.
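The pedestrian example above follows a predict-then-act pattern: roll a predictor forward over a short horizon, then decide before the risk materializes. The article does not detail Cosmos's API, so the sketch below is hypothetical and substitutes a simple constant-velocity model for the learned video predictor; the `Track`, `predict`, and `should_brake` names are illustrative, not part of any NVIDIA interface.

```python
from dataclasses import dataclass

@dataclass
class Track:
    """Minimal state for one tracked pedestrian (stand-in for perception output)."""
    x: float   # lateral distance to the crosswalk edge (m); x >= 0 means in the crosswalk
    vx: float  # lateral velocity toward the crosswalk (m/s)

def predict(track: Track, horizon_s: float, dt: float = 0.1) -> list[float]:
    """Roll the stand-in predictor forward, returning predicted x positions.

    A learned model like Cosmos would forecast full future frames here;
    constant velocity keeps the sketch self-contained.
    """
    steps = int(horizon_s / dt)
    return [track.x + track.vx * dt * (i + 1) for i in range(steps)]

def should_brake(track: Track, horizon_s: float = 2.0) -> bool:
    """Brake if any predicted position over the 2 s horizon enters the risk zone."""
    return any(x >= 0.0 for x in predict(track, horizon_s))

pedestrian = Track(x=-1.5, vx=1.0)  # 1.5 m from the curb, walking toward it
print(should_brake(pedestrian))  # reaches x = 0 within 2 s, so the car brakes
```

The design point is that the braking decision consumes predicted future states rather than the current frame alone, which is exactly the capability the article attributes to video prediction.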