Hugging Face Blog · Jun 17, 2026, 3:26 PM

MolmoMotion: Language-Guided 3D Motion Forecasting

Allen AI extends its Molmo multimodal model to predict 3D motion trajectories from natural language descriptions.

The Allen Institute for AI has released MolmoMotion, a new model that adds language-guided 3D motion forecasting to the open-source Molmo family. By conditioning spatial trajectory predictions on natural language, the system makes motion anticipation more flexible and easier for humans to interpret. The work targets applications in robotics, video understanding, and embodied AI, where predicting movement in 3D space is safety-critical or operationally essential.

MolmoMotion comes from the Allen Institute for AI (AI2) and was published on the Allen AI Hugging Face blog on June 17, 2026. It extends the Molmo family of open-source multimodal models into 3D motion forecasting: predicting how objects, agents, or scene elements will move through three-dimensional space, conditioned on natural language input.

Summaries are AI-generated; the original article is authoritative.