Physical AI
Develop world foundation models to advance physical AI.
Overview
NVIDIA Cosmos™ is a platform of state-of-the-art generative world foundation models (WFMs), guardrails, and an accelerated data processing and curation pipeline. Developers use Cosmos to accelerate physical AI development for autonomous vehicles (AVs), robots, and video analytics AI agents.
Open Models
Pretrained multimodal generative models that developers can use out of the box for world generation or reasoning, or post-train to develop specialized physical AI models.
A state-of-the-art world state prediction model that can generate up to 30 seconds of continuous video from multimodal inputs with superior speed, fidelity, and prompt adherence.
A multicontrol model that quickly scales a single simulation or spatial video across varied environments and lighting conditions.
Use 3D inputs from physical AI simulation frameworks, like CARLA or NVIDIA Isaac Sim™, to enable fully controllable data augmentation and synthetic data generation pipelines.
A fully customizable reasoning vision language model (VLM) that excels at understanding the physical world the way humans do, applying structured reasoning to videos and images.
Built to power video analytics AI agents at runtime with spatiotemporal understanding of city and industrial operations, curate training data for robots and autonomous vehicles (AVs), and support robot decision-making.
NVIDIA Cosmos Curator is a framework that enables developers to quickly filter, annotate, and deduplicate the large volumes of sensor data needed for physical AI development, creating tailored datasets that meet model needs.
Speed up dataset processing and generation.
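To make the curation flow concrete, here is a minimal, hypothetical sketch of the filter-annotate-deduplicate loop. Every function in it is a placeholder rather than the actual Cosmos Curator API; consult the Cosmos Curator documentation for the real interfaces.

```python
# Hypothetical sketch of a Cosmos Curator-style pipeline: filter, annotate,
# and deduplicate raw clips before training. All helpers below are
# placeholders, NOT the Cosmos Curator API.
from pathlib import Path

def passes_quality_filter(clip: Path) -> bool:
    """Placeholder: e.g., drop clips that are too short, too dark, or corrupt."""
    return clip.stat().st_size > 1_000_000  # toy heuristic: keep clips over ~1 MB

def annotate(clip: Path) -> dict:
    """Placeholder for VLM-based captioning (e.g., with Cosmos Reason)."""
    return {"path": str(clip), "caption": "TODO: generated caption"}

def dedup_key(clip: Path) -> str:
    """Placeholder for perceptual/embedding-based dedup; file size stands in here."""
    return str(clip.stat().st_size)

seen: set[str] = set()
dataset: list[dict] = []
for clip in sorted(Path("raw_clips").glob("*.mp4")):
    if not passes_quality_filter(clip):
        continue  # drop low-quality clips
    key = dedup_key(clip)
    if key in seen:
        continue  # skip near-duplicates
    seen.add(key)
    dataset.append(annotate(clip))

print(f"Curated {len(dataset)} clips")
```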
Use Cases
Use Cosmos world foundation models to simulate, reason, and generate data for downstream pipelines in robotics, autonomous vehicles, and industrial vision systems.
Robots need vast, diverse training data to effectively perceive and interact with their environments. With Cosmos WFMs, developers can generate controllable, high-fidelity synthetic data to train robot perception and policy models.
Diverse, high-fidelity sensor data is critical for safely training, testing, and validating autonomous vehicles. With Cosmos WFMs post-trained on vehicle data, developers can amplify existing data diversity with new weather, lighting, and geolocations, or expand into multi-sensor views—saving significant time and cost.
Video analytics AI agents can analyze, summarize, and interact with real-time or recorded video streams to enhance automation, safety, and operational efficiency across industrial and urban environments.
Cosmos Reason is a customizable vision language model (VLM) that powers video analytics AI agents with advanced visual understanding and spatial-temporal reasoning of the physical world. These AI agents deliver real-time question-answering, rapid alerts, and rich contextual insights—powering smarter, more responsive systems across edge and cloud deployments.
Our Commitment
Cosmos models, guardrails, and tokenizers are available on Hugging Face and GitHub, with resources to tackle data scarcity in training physical AI models. We're committed to driving Cosmos forward: transparent, open, and built for all.
AI Infrastructure
NVIDIA RTX PRO 6000 Blackwell Series Servers accelerate physical AI development for robots, autonomous vehicles, and AI agents across training, synthetic data generation, simulation, and inference.
Unlock peak performance for Cosmos world foundation models on NVIDIA Blackwell GB200 for industrial post-training and inference workloads.
Ecosystem
Model developers from the robotics, autonomous vehicles, and vision AI industries are using Cosmos to accelerate physical AI development.
Resources
Start with the documentation. Cosmos WFMs are openly available on Hugging Face, with inference and post-training scripts on GitHub. Developers can also use the Cosmos tokenizer from /NVIDIA/cosmos-tokenizer on GitHub and Hugging Face.
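As a minimal sketch of getting started, the snippet below pulls model weights with the huggingface_hub library. The repo ID is illustrative only; check the NVIDIA organization on Hugging Face for the exact model names.

```python
# Minimal sketch: download Cosmos model weights from Hugging Face.
# snapshot_download is the real huggingface_hub API; the repo ID is an
# assumed example, so verify the exact name on Hugging Face.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="nvidia/Cosmos-Predict2-2B-Video2World")
print(f"Model weights downloaded to {local_dir}")
```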
Cosmos WFMs are available to everyone under the NVIDIA Open Model License.
PyTorch post-training scripts are openly available for all Cosmos models. Please read the documentation for a step-by-step guide to post-training.
Yes, you can use Cosmos to build from scratch with your preferred foundation model or model architecture. Start by using NeMo Curator for video data pre-processing, then compress and decode your data with the Cosmos tokenizer. Once you have processed the data, you can train or fine-tune your model using NVIDIA NeMo.
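To illustrate the tokenizer step, here is a short sketch that follows the usage pattern published in the /NVIDIA/cosmos-tokenizer README. The import path, class name, checkpoint layout, and tensor shapes are assumptions to verify against that repo.

```python
# Sketch of encoding video into latents with the Cosmos tokenizer, modeled on
# the /NVIDIA/cosmos-tokenizer README. Import path, class name, and checkpoint
# paths are assumptions; verify against the repo before use.
import torch
from cosmos_tokenizer.video_lib import CausalVideoTokenizer  # assumed import path

model_name = "Cosmos-Tokenizer-CV4x8x8"  # continuous video tokenizer, 4x8x8 compression
video = torch.randn(1, 3, 9, 512, 512, dtype=torch.bfloat16, device="cuda")  # B,C,T,H,W

# Encode raw frames into compact latents for world-model training...
encoder = CausalVideoTokenizer(checkpoint_enc=f"pretrained_ckpts/{model_name}/encoder.jit")
(latent,) = encoder.encode(video)

# ...and decode latents back to pixels to inspect reconstruction quality.
decoder = CausalVideoTokenizer(checkpoint_dec=f"pretrained_ckpts/{model_name}/decoder.jit")
reconstructed = decoder.decode(latent)
print(latent.shape, reconstructed.shape)
```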
Using NVIDIA NIM™ microservices, you can easily integrate your physical AI models into your applications across cloud, data centers, and workstations.
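NIM microservices expose an OpenAI-compatible API, so a deployed Cosmos Reason NIM can be queried with standard client code. The sketch below assumes a local deployment; the base URL and model ID are placeholders, and a real video-analytics call would also attach frames following the model's multimodal schema.

```python
# Sketch of querying a self-hosted NIM endpoint through its OpenAI-compatible
# API. Base URL and model ID are assumed placeholders for a local deployment.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed-locally")
response = client.chat.completions.create(
    model="nvidia/cosmos-reason1-7b",  # assumed model ID; check your deployment
    messages=[{"role": "user", "content": "Describe the hazards in this loading dock scene."}],
)
print(response.choices[0].message.content)
```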
You can also use NVIDIA DGX Cloud to train AI models and deploy them anywhere at scale.
Cosmos Predict, Cosmos Transfer, and Cosmos Reason are all WFMs, each with a distinct role:
Cosmos Reason can generate new and diverse text prompts from one starting video for Cosmos Predict, or critique and annotate synthetic data from Predict and Transfer.
Omniverse creates realistic 3D simulations of real-world tasks by using different generative APIs, SDKs, and NVIDIA RTX rendering technology.
Developers can input Omniverse simulations as instruction videos to Cosmos Transfer models to generate controllable photoreal synthetic data.
Together, Omniverse provides the simulation environment before and after training, while Cosmos provides the foundation models to generate video data and train physical AI models.
Learn more about NVIDIA Omniverse.