Ivonar

A hybrid-architecture large language model built entirely from scratch in Germany. Not a fine-tune. Not a wrapper. Original architecture, original code.

Hybrid SSM · Mixture-of-Experts · YaRN Context · Custom AdaMuon

model/architecture.py

import torch.nn as nn


class IvonarModel(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.embed = YaRNEmbedding(config.v_size, config.d_model)

        # 7:1 hybrid backbone: seven Mamba-2 SSM layers for every
        # MLA attention layer.
        self.layers = nn.ModuleList()
        for i in range(config.n_layers):
            if i % 8 == 7:
                self.layers.append(MLAAttention(config))
            else:
                self.layers.append(MambaLayer(config))

        # Top-2 expert routing and the output head.
        self.router = MoERouter(config.n_experts, top_k=2)
        self.head = AdaMuonHead(config.d_model, config.v_size)

    def forward(self, x):
        h = self.embed(x)
        for layer in self.layers:
            h = layer(h)
        return self.head(self.router(h))

Independent architecture, built for efficient scale.

Ivonar is an independent LLM project from Germany, leveraging Google Cloud's high-performance computing for LLM pre-training while keeping the architecture and infrastructure developed in-house.

The focus is efficient scaling through routing, hybrid processing, and adaptive depth rather than brute-force parameter growth.


Training stack

Pre-trained from random initialization with a custom architecture and infrastructure stack.

Solo development

Designed and developed independently in Germany with a focused, architecture-first approach.

European operation

Built with EU privacy, deployment, and operational requirements in mind.

Original Architecture

Eight core design decisions make Ivonar different from a standard transformer.


Hybrid Backbone

Core Design

Alternating Mamba-2 state-space and MLA attention layers in a 7:1 rhythm instead of a single layer type.

Multi-Head Latent Attention (MLA)

Memory Efficiency

Decoupled RoPE and dramatically reduced KV cache.

Mixture-of-Experts (MoE)

Compute Routing

Sparse routing with expert choice ensures parameter efficiency.

Mixture-of-Recursions (MoR)

Novel

Adaptive recursion depth: more passes for hard tokens, fewer for easy ones.

YaRN Positional Encoding

Context Extension

Temperature scaling for extreme context window extension.
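
As a concrete illustration of the temperature part only (the RoPE frequency interpolation is omitted, and the 4k base context below is a made-up example, not Ivonar's configuration): the YaRN paper sets the attention softmax temperature t from the context-extension factor s via sqrt(1/t) = 0.1 * ln(s) + 1.

import math

def yarn_attn_scale(s: float) -> float:
    # YaRN paper: sqrt(1/t) = 0.1 * ln(s) + 1, where t is the softmax
    # temperature and s the context-extension factor. The multiplier is
    # applied to both q and k, scaling the attention logits by its square.
    return 0.1 * math.log(s) + 1.0

s = 131072 / 4096                 # e.g. extending a 4k model to 128k: s = 32
print(yarn_attn_scale(s))         # ~1.35, i.e. logits scaled by ~1.81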

Custom Optimizer [AdaMuon]

Training

Not Adam. A purpose-built optimizer with orthogonalized gradient updates.
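
AdaMuon itself is not published, so the following is only a sketch of what "orthogonalized gradient updates" means in the open-source Muon optimizer that the name points to; Ivonar's actual optimizer may differ. Muon runs a Newton-Schulz iteration on each 2-D update matrix, pushing it toward an orthogonal matrix before applying it.

import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Quintic Newton-Schulz iteration (coefficients from the public Muon
    # implementation). Drives the update matrix toward an orthogonal one,
    # equalizing the scale of its singular directions.
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + 1e-7)          # normalize so the iteration converges
    transposed = x.size(0) > x.size(1)
    if transposed:
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * (s @ s)) @ x
    return x.T if transposed else x

update = newton_schulz_orthogonalize(torch.randn(256, 512))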

9 Upgrade Modules

Extensibility

Config-flag enhancements across training, inference, and post-training.

End-to-End Pipeline

Infrastructure

From data curation to multi-GPU orchestration.

More with Less

Efficiency is not a compromise; it is the fundamental design principle.

Smart Routing

Not every token needs every parameter. MoE and MoR ensure compute goes precisely where it matters, eliminating waste.
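
Ivonar's router code is not public; the minimal top-2 token-choice sketch below (a hypothetical SparseMoE layer, not the actual MoERouter) shows the mechanism: each token activates only its two highest-scoring experts.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, h):                        # h: (tokens, d_model)
        weights, idx = self.gate(h).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over chosen experts
        out = torch.zeros_like(h)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                sel = idx[:, k] == e             # tokens whose k-th pick is expert e
                if sel.any():
                    out[sel] += weights[sel, k, None] * expert(h[sel])
        return out

moe = SparseMoE(d_model=64, n_experts=8)
y = moe(torch.randn(10, 64))                     # each token ran through 2 of 8 experts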

Hybrid Processing

SSM layers process sequential context at a fraction of attention's cost. The 7:1 hybrid rhythm balances quality and speed.
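
Back-of-the-envelope scaling makes the gap concrete (constants and kernel details omitted; the sizes are illustrative, not Ivonar's): attention cost grows quadratically with sequence length, while an SSM scan grows linearly.

L, d_model, d_state = 32_768, 1_024, 128   # illustrative sizes

attn_flops = L * L * d_model               # pairwise scores + mixing: O(L^2 * d)
ssm_flops  = L * d_model * d_state         # one scan, fixed-size state: O(L * d)

print(f"attention / SSM cost at {L} tokens: {attn_flops / ssm_flops:.0f}x")  # 256x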

Compressed Attention

MLA reduces the memory footprint without sacrificing capability; longer contexts at lower cost are a key differentiator.
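
A stripped-down sketch of the latent-KV idea (hypothetical dimensions; real MLA also carries the decoupled RoPE branch mentioned above, omitted here): keys and values are compressed into one small latent per token, and only that latent is cached.

import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 1024, 128, 8, 64   # illustrative sizes

w_down = nn.Linear(d_model, d_latent, bias=False)           # compress once
w_up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand at attention time
w_up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)

h = torch.randn(2, 512, d_model)       # (batch, seq, d_model)
c_kv = w_down(h)                       # (batch, seq, d_latent): the only thing cached

k = w_up_k(c_kv).view(2, 512, n_heads, d_head)
v = w_up_v(c_kv).view(2, 512, n_heads, d_head)

# Cached floats per token: d_latent instead of 2 * n_heads * d_head.
print(f"{2 * n_heads * d_head / d_latent:.0f}x smaller KV cache")   # 8x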

Adaptive Depth

MoR lets the model think harder on hard problems and breeze through easy ones. Compute always matches actual complexity.
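
MoR's exact routing scheme isn't spelled out here; the toy below shows the shape of the idea with a simple per-token halting score (every name in it is hypothetical, not Ivonar's code).

import torch
import torch.nn as nn

class RecursiveDepthBlock(nn.Module):
    def __init__(self, d_model: int, max_depth: int = 4):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())
        self.halt = nn.Linear(d_model, 1)        # per-token "keep refining?" score
        self.max_depth = max_depth

    def forward(self, h):                        # h: (tokens, d_model)
        active = torch.ones(h.size(0), dtype=torch.bool, device=h.device)
        for _ in range(self.max_depth):
            if not active.any():
                break                            # every token halted early
            h = torch.where(active[:, None], h + self.block(h), h)
            active = active & (torch.sigmoid(self.halt(h)).squeeze(-1) > 0.5)
        return h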

Traditional Transformer

All parameters, every token: 100% compute utilization, always.

Ivonar

Smart routing, adaptive compute: ~20% active parameters per token.
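
The ~20% figure follows directly from the routing arithmetic. With made-up numbers (Ivonar's real parameter split is not published):

dense_params  = 1.0e9          # embeddings + attention/SSM backbone: always active
expert_params = 9.0e9          # total MoE expert parameters
n_experts, top_k = 16, 2       # hypothetical expert count

active = dense_params + expert_params * top_k / n_experts
total  = dense_params + expert_params
print(f"{active / total:.0%} of parameters active per token")   # 21%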

What's Next

Phase 1 (Completed)

Architecture Design & Implementation

Core hybrid architecture, MoE, MoR, MLA, YaRN, and custom AdaMuon optimizer are all implemented independently.

Phase 2 (In Training)

Mini Training & Validation

Training the compact model locally to validate the full architecture end-to-end before scaling.

Phase 3 (Upcoming)

Medium Model Training

Scaling to intermediate capacity for general-purpose language understanding and generation.

Phase 4 (Upcoming)

High Model Training

Pushing to large scale for research-grade performance and benchmark evaluation.

Phase 5 (Upcoming)

Ultra Model Training

Frontier scale training run across multiple nodes to maximize capability.
