Independent architecture,
built for efficient scale.
Ivonar is an independent LLM project from Germany that leverages Google Cloud's high-performance computing for pre-training while keeping architecture and infrastructure development in-house.
The focus is efficient scaling through routing, hybrid processing, and adaptive depth rather than brute-force parameter growth.
Training stack
Pre-trained from random initialization with a custom architecture and infrastructure stack.
Solo development
Designed and developed independently in Germany with a focused, architecture-first approach.
European operation
Built with EU privacy, deployment, and operational requirements in mind.
Original Architecture
Eight core design decisions make Ivonar different from a standard transformer.
Hybrid Backbone
Core Design: Mamba-2 State-Space and Attention layers in a 7:1 interleave instead of a single layer type.
Multi-Head Latent Attention (MLA)
Memory Efficiency: Decoupled RoPE and a dramatically reduced KV cache.
Mixture-of-Experts (MoE)
Compute Routing: Sparse routing with expert choice, so each token activates only the parameters it needs.
Mixture-of-Recursions (MoR)
Novel: Adaptive per-token recursion depth, so compute scales with problem difficulty.
YaRN Positional Encoding
Context Extension: Temperature scaling for extreme context-window extension.
Custom Optimizer [AdaMuon]
Training: Not Adam. A purpose-built optimizer with orthogonalized gradient updates (see the sketch after these cards).
9 Upgrade Modules
Extensibility: Config-flag enhancements across training, inference, and post-training.
End-to-End Pipeline
Infrastructure: From data curation to multi-GPU orchestration.
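AdaMuon itself is unpublished, so the following is only a rough illustration of the optimizer family it belongs to: Muon-style optimizers orthogonalize each 2-D weight matrix's gradient (or momentum) with a few Newton-Schulz iterations before applying it. A minimal sketch, with coefficients taken from the public Muon reference implementation; everything else here is an assumption, not Ivonar's code:

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximate the nearest semi-orthogonal matrix to a 2-D gradient `g`
    using the quintic Newton-Schulz iteration from the public Muon optimizer."""
    a, b, c = 3.4445, -4.7750, 2.0315        # tuned quintic coefficients
    x = g / (g.norm() + 1e-7)                # Frobenius normalization bounds the spectrum
    if transposed := g.shape[0] > g.shape[1]:
        x = x.T                              # iterate on the wide orientation
    for _ in range(steps):
        xxt = x @ x.T
        x = a * x + (b * xxt + c * xxt @ xxt) @ x
    return x.T if transposed else x

# Hypothetical use inside one optimizer step for a single weight matrix:
#   w.add_(newton_schulz_orthogonalize(momentum_buffer), alpha=-lr)
```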
More with Less
Efficiency is not a compromise; it is the fundamental design principle.
Smart Routing
Not every token needs every parameter. MoE and MoR ensure compute goes precisely where it matters, eliminating waste.
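To make "compute goes where it matters" concrete, here is a minimal sketch of expert-choice routing, the MoE variant named in the card above; the shapes, names, and softmax scoring are illustrative assumptions, not Ivonar's router:

```python
import torch

def expert_choice_route(tokens, router_w, experts, capacity):
    """Expert-choice MoE: each expert picks its own top-`capacity` tokens,
    so per-expert load is balanced by construction.
    tokens:   (n_tokens, d_model)
    router_w: (d_model, n_experts) routing projection  [assumed]
    experts:  list of callables, each mapping (k, d_model) -> (k, d_model)
    """
    scores = torch.softmax(tokens @ router_w, dim=-1)    # (n_tokens, n_experts)
    out = torch.zeros_like(tokens)
    for e, expert in enumerate(experts):
        weight, idx = scores[:, e].topk(capacity)        # the expert chooses its tokens
        out[idx] += weight.unsqueeze(-1) * expert(tokens[idx])
    return out
```

Because every expert processes exactly `capacity` tokens, total compute stays fixed no matter how skewed the token distribution is; tokens no expert selects simply ride the residual stream.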
Hybrid Processing
SSM layers process sequential context at a fraction of attention's cost. The 7:1 hybrid rhythm balances quality and speed.
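As a structural sketch only (the block factories are hypothetical stand-ins, not Ivonar's modules), a 7:1 rhythm is simply seven linear-cost SSM layers for every quadratic-cost attention layer:

```python
import torch.nn as nn

def build_hybrid_backbone(n_groups, d_model, ssm_block, attn_block):
    """Stack layers in a 7:1 SSM:attention rhythm.
    `ssm_block` / `attn_block` are factories returning nn.Module layers,
    e.g. a Mamba-2 layer and an attention layer (both assumed here)."""
    layers = []
    for _ in range(n_groups):
        layers += [ssm_block(d_model) for _ in range(7)]  # O(n) sequential mixing
        layers.append(attn_block(d_model))                # one O(n^2) global layer
    return nn.ModuleList(layers)
```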
Compressed Attention
MLA reduces memory footprint without sacrificing capability. Longer context at lower cost is a key differentiator.
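Sketching the published Multi-Head Latent Attention idea (from the DeepSeek-V2 paper) rather than Ivonar's exact code, and with illustrative dimensions: keys and values are compressed into one small latent per token, and only that latent is cached:

```python
import torch
import torch.nn as nn

class LatentKVCache(nn.Module):
    """MLA-style KV compression: cache one d_latent vector per token instead
    of per-head keys and values. All sizes below are illustrative."""
    def __init__(self, d_model=1024, d_latent=128, n_heads=8, d_head=64):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)           # compress
        self.up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # re-expand
        self.up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)

    def compress(self, h):
        return self.down(h)        # (seq, d_latent): the only thing cached

    def expand(self, latent):
        return self.up_k(latent), self.up_v(latent)   # rebuilt at attention time
```

With these toy sizes the per-token cache drops from 2 * 8 * 64 = 1024 values to 128, an 8x reduction; the decoupled RoPE from the card above adds a small separate positional key so rotary embeddings do not have to pass through the compression.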
Adaptive Depth
MoR lets the model think harder on hard problems and breeze through easy ones. Compute always matches actual complexity.
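A loose sketch of adaptive depth (the router, threshold, and loop structure are assumptions, not the MoR paper's or Ivonar's exact formulation): one shared block is applied repeatedly, and a tiny router lets confident tokens exit early:

```python
import torch
import torch.nn as nn

def recursive_forward(h, shared_block, exit_router, max_recursions=4, threshold=0.5):
    """Adaptive-depth recursion: reuse one shared block up to `max_recursions`
    times; tokens exit as soon as the router is confident they are done.
    h: (n_tokens, d_model); exit_router: nn.Linear(d_model, 1)  [assumed]"""
    active = torch.ones(h.shape[0], dtype=torch.bool)
    for _ in range(max_recursions):
        if not active.any():
            break                                       # everyone exited early
        h[active] = shared_block(h[active])             # refine only live tokens
        done = torch.sigmoid(exit_router(h[active])).squeeze(-1) >= threshold
        active[active.clone()] = ~done                  # drop finished tokens
    return h
```

Only one block's weights exist, so depth varies per token while the parameter count stays flat, which is the parameter-efficiency angle of MoR.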
What's Next
Architecture Design & Implementation
Core hybrid architecture, MoE, MoR, MLA, YaRN, and the custom AdaMuon optimizer are all implemented independently.
Mini Training & Validation
Training the compact model locally to validate the full architecture end-to-end before scaling.
Medium Model Training
Scaling to intermediate capacity for general-purpose language understanding and generation.
High Model Training
Pushing to large scale for research-grade performance and benchmark evaluation.
Ultra Model Training
Frontier scale training run across multiple nodes to maximize capability.