ByteDance Open-Sources Bernini, a Unified Framework for AI Video Editing

Original: ByteDance Open-Sources the Unified Framework Bernini: Giving DiT a "Large-Model Strategist" So AI Video Editing Understands Before It Acts, 2026-06-02

Bernini combines multimodal planning with a DiT renderer for more controllable AI video generation and editing.

ByteDance’s commercial technology team has open-sourced Bernini, a unified framework for AI video generation and editing. Its design separates semantic planning from visual rendering: an MLLM-based planner understands text, source videos, images, and video references, then a DiT-based renderer produces the final video. The released Bernini-R includes inference code and weights, while the full planner-enabled version is still being prepared.

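Bernini is an AI video generation and editing framework open-sourced by ByteDance's commercialization technology team. The article's central point is that it moves video creation from "simply generating from a prompt" toward "understanding the request first, then modifying the footage stably." Its architecture splits the task into two stages. In the first, an MLLM-based planner interprets the text instruction, the source video, and any reference images or reference videos, and predicts a target semantic representation in the ViT embedding space. In the second, a DiT-based renderer turns that semantic plan into a temporally coherent, high-quality video. For video editing tasks, the renderer also incorporates the source video's VAE features to preserve the original details and the regions that should not change, so that a small edit does not cause the whole clip to drift.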
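The two-stage data flow described above can be illustrated with a toy sketch. This is not Bernini's code or API: `Request`, `Conditioning`, `toy_planner`, and `build_conditioning` are hypothetical names, and the "planner" here is a deterministic placeholder, not an MLLM. The sketch only shows which inputs reach which stage, assuming the renderer is conditioned on the planner's target semantics plus, for editing, the source video's VAE features.

```python
from dataclasses import dataclass
from typing import List, Optional

Vector = List[float]


@dataclass
class Request:
    """Hypothetical container for what the planner sees."""
    instruction: str
    # Per-frame VAE features of the source video; None for pure generation.
    source_vae_features: Optional[List[Vector]] = None
    # ViT embeddings of reference images or videos, if any.
    reference_vit_embeddings: Optional[List[Vector]] = None


@dataclass
class Conditioning:
    """Hypothetical container for what the renderer receives."""
    target_semantics: List[Vector]
    source_vae_features: Optional[List[Vector]]


def toy_planner(request: Request, num_frames: int, dim: int) -> List[Vector]:
    """Placeholder for the MLLM planner: one semantic vector per frame."""
    # Derive a stable number from the instruction text (stand-in for understanding it).
    seed = (sum(ord(ch) for ch in request.instruction) % 97) / 97.0
    base = [seed] * dim
    refs = request.reference_vit_embeddings or []
    if refs:
        # Average the reference embeddings and mix them into the plan.
        mean_ref = [sum(col) / len(refs) for col in zip(*refs)]
        base = [(b + r) / 2 for b, r in zip(base, mean_ref)]
    return [list(base) for _ in range(num_frames)]


def build_conditioning(request: Request, num_frames: int, dim: int = 4) -> Conditioning:
    """Stage 1 produces the plan; stage 2 would consume this Conditioning."""
    plan = toy_planner(request, num_frames, dim)
    # Editing passes source VAE features through so untouched regions can be kept.
    return Conditioning(target_semantics=plan,
                        source_vae_features=request.source_vae_features)
```

In this sketch a generation request yields a `Conditioning` with no source features, while an editing request carries the source video's VAE features alongside the plan, which mirrors the split between semantic planning and detail-preserving rendering.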

Summaries are AI-generated; the original article is authoritative.