Silia: A Tiny Transformer Architecture for Sub-10M Parameter Models
r/LocalLLaMA top day·10 hours ago·Paper
A student from India shared their first paper on r/LocalLLaMA, proposing Silia, a Transformer architecture for extremely small models. The idea is to merge attention-style dynamic mixing with SwiGLU-like nonlinear transformation, aiming to save parameters in models under roughly 10M parameters. The author frames the work as an early, small-scale exploration, limited by old hardware and restricted access to larger compute.