Kaiming He's All-Undergrad Team Achieves Text-to-Image With Only 258M Parameters

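Original: 全员本科生！何恺明组新作：文生图，258M参数就够了 (All undergraduates! New work from Kaiming He's group: for text-to-image, 258M parameters is enough)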

An all-undergraduate team in Kaiming He's group proposes a compact 258M-parameter text-to-image model.

A new research paper from Kaiming He's lab, written by an all-undergraduate team, reports that high-quality text-to-image generation can be achieved with just 258 million parameters. This challenges the prevailing assumption that competitive image synthesis requires multi-billion-parameter models. The work signals a push toward leaner, more accessible generative vision architectures.

A new paper from the research group led by Kaiming He, the computer vision researcher best known for co-inventing ResNet and currently based at MIT CSAIL, proposes a text-to-image generation system with only 258 million parameters. The headline figure is striking: leading open-source and commercial text-to-image models typically have one to several billion parameters, which makes a sub-300M system unusual. The article's title also highlights the composition of the team: all members are undergraduates, which is uncommon for work that receives this level of visibility in the AI research community.

Summaries are AI-generated; the original article is authoritative.