Ivonar

A hybrid-architecture large language model built entirely from scratch in Germany. Not a fine-tune. Not a wrapper. Original architecture, original code.

Hybrid SSM · Mixture-of-Experts · YaRN Context · Custom AdaMuon

model/architecture.py

import torch.nn as nn


class IvonarModel(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.embed = YaRNEmbedding(config.v_size, config.d_model)

        # 7:1 hybrid backbone: seven Mamba-2 SSM layers for every
        # MLA attention layer.
        self.layers = nn.ModuleList()
        for i in range(config.n_layers):
            if i % 8 == 7:
                self.layers.append(MLAAttention(config))
            else:
                self.layers.append(MambaLayer(config))

        # Top-2 expert routing and the output head.
        self.router = MoERouter(config.n_experts, top_k=2)
        self.head = AdaMuonHead(config.d_model, config.v_size)

    def forward(self, x):
        h = self.embed(x)
        for layer in self.layers:
            h = layer(h)
        return self.head(self.router(h))

Independent architecture, built for efficient scale.

Ivonar is an independent LLM project from Germany, leveraging Google Cloud's high-performance computing for LLM pre-training while keeping the architecture and infrastructure developed in-house.

The focus is efficient scaling through routing, hybrid processing, and adaptive depth rather than brute-force parameter growth.


Training stack

Pre-trained from random initialization with a custom architecture and infrastructure stack.

Solo development

Designed and developed independently in Germany with a focused, architecture-first approach.

European operation

Built with EU privacy, deployment, and operational requirements in mind.

Original Architecture

Eight core design decisions make Ivonar different from a standard transformer.


Hybrid Backbone

Core Design

Alternating Mamba-2 state-space and MLA attention layers in a 7:1 rhythm instead of a single layer type.

Multi-Head Latent Attention (MLA)

Memory Efficiency

Decoupled RoPE and dramatically reduced KV cache.

Mixture-of-Experts (MoE)

Compute Routing

Sparse routing with expert choice ensures parameter efficiency.

Mixture-of-Recursions (MoR)

Novel

Adaptive recursion depth: more passes for hard tokens, fewer for easy ones.

YaRN Positional Encoding

Context Extension

Temperature scaling for extreme context window extension.
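
As a concrete illustration of the temperature part only (the RoPE frequency interpolation is omitted, and the 4k base context below is a made-up example, not Ivonar's configuration): the YaRN paper sets the attention softmax temperature t from the context-extension factor s via sqrt(1/t) = 0.1 * ln(s) + 1.

import math

def yarn_attn_scale(s: float) -> float:
    # YaRN paper: sqrt(1/t) = 0.1 * ln(s) + 1, where t is the softmax
    # temperature and s the context-extension factor. The multiplier is
    # applied to both q and k, scaling the attention logits by its square.
    return 0.1 * math.log(s) + 1.0

s = 131072 / 4096                 # e.g. extending a 4k model to 128k: s = 32
print(yarn_attn_scale(s))         # ~1.35, i.e. logits scaled by ~1.81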

Custom Optimizer [AdaMuon]

Training

Not Adam. A purpose-built optimizer with orthogonalized gradient updates.
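
AdaMuon itself is not published, so the following is only a sketch of what "orthogonalized gradient updates" means in the open-source Muon optimizer that the name points to; Ivonar's actual optimizer may differ. Muon runs a Newton-Schulz iteration on each 2-D update matrix, pushing it toward an orthogonal matrix before applying it.

import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Quintic Newton-Schulz iteration (coefficients from the public Muon
    # implementation). Drives the update matrix toward an orthogonal one,
    # equalizing the scale of its singular directions.
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + 1e-7)          # normalize so the iteration converges
    transposed = x.size(0) > x.size(1)
    if transposed:
        x = x.T
    for _ in range(steps):
        s = x @ x.T
        x = a * x + (b * s + c * (s @ s)) @ x
    return x.T if transposed else x

update = newton_schulz_orthogonalize(torch.randn(256, 512))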

9 Upgrade Modules

Extensibility

Config-flag enhancements across training, inference, and post-training.

End-to-End Pipeline

Infrastructure

From data curation to multi-GPU orchestration.

More with Less

Efficiency is not a compromise; it is the fundamental design principle.

Smart Routing

Not every token needs every parameter. MoE and MoR ensure compute goes precisely where it matters, eliminating waste.
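
Ivonar's router code is not public; the minimal top-2 token-choice sketch below (a hypothetical SparseMoE layer, not the actual MoERouter) shows the mechanism: each token activates only its two highest-scoring experts.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int, n_experts: int, top_k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, h):                        # h: (tokens, d_model)
        weights, idx = self.gate(h).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over chosen experts
        out = torch.zeros_like(h)
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                sel = idx[:, k] == e             # tokens whose k-th pick is expert e
                if sel.any():
                    out[sel] += weights[sel, k, None] * expert(h[sel])
        return out

moe = SparseMoE(d_model=64, n_experts=8)
y = moe(torch.randn(10, 64))                     # each token ran through 2 of 8 experts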

Hybrid Processing

SSM layers process sequential context at a fraction of attention's cost. The 7:1 hybrid rhythm balances quality and speed.
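
Back-of-the-envelope scaling makes the gap concrete (constants and kernel details omitted; the sizes are illustrative, not Ivonar's): attention cost grows quadratically with sequence length, while an SSM scan grows linearly.

L, d_model, d_state = 32_768, 1_024, 128   # illustrative sizes

attn_flops = L * L * d_model               # pairwise scores + mixing: O(L^2 * d)
ssm_flops  = L * d_model * d_state         # one scan, fixed-size state: O(L * d)

print(f"attention / SSM cost at {L} tokens: {attn_flops / ssm_flops:.0f}x")  # 256x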

Compressed Attention

MLA reduces the memory footprint without sacrificing capability; longer contexts at lower cost are a key differentiator.
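
A stripped-down sketch of the latent-KV idea (hypothetical dimensions; real MLA also carries the decoupled RoPE branch mentioned above, omitted here): keys and values are compressed into one small latent per token, and only that latent is cached.

import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 1024, 128, 8, 64   # illustrative sizes

w_down = nn.Linear(d_model, d_latent, bias=False)           # compress once
w_up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # expand at attention time
w_up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)

h = torch.randn(2, 512, d_model)       # (batch, seq, d_model)
c_kv = w_down(h)                       # (batch, seq, d_latent): the only thing cached

k = w_up_k(c_kv).view(2, 512, n_heads, d_head)
v = w_up_v(c_kv).view(2, 512, n_heads, d_head)

# Cached floats per token: d_latent instead of 2 * n_heads * d_head.
print(f"{2 * n_heads * d_head / d_latent:.0f}x smaller KV cache")   # 8x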

Adaptive Depth

MoR lets the model think harder on hard problems and breeze through easy ones. Compute always matches actual complexity.
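
MoR's exact routing scheme isn't spelled out here; the toy below shows the shape of the idea with a simple per-token halting score (every name in it is hypothetical, not Ivonar's code).

import torch
import torch.nn as nn

class RecursiveDepthBlock(nn.Module):
    def __init__(self, d_model: int, max_depth: int = 4):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(d_model, d_model), nn.GELU())
        self.halt = nn.Linear(d_model, 1)        # per-token "keep refining?" score
        self.max_depth = max_depth

    def forward(self, h):                        # h: (tokens, d_model)
        active = torch.ones(h.size(0), dtype=torch.bool, device=h.device)
        for _ in range(self.max_depth):
            if not active.any():
                break                            # every token halted early
            h = torch.where(active[:, None], h + self.block(h), h)
            active = active & (torch.sigmoid(self.halt(h)).squeeze(-1) > 0.5)
        return h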

Traditional Transformer

All parameters, every token: 100% compute utilization, always.

Ivonar

Smart routing, adaptive compute: ~20% active parameters per token.
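
The ~20% figure follows directly from the routing arithmetic. With made-up numbers (Ivonar's real parameter split is not published):

dense_params  = 1.0e9          # embeddings + attention/SSM backbone: always active
expert_params = 9.0e9          # total MoE expert parameters
n_experts, top_k = 16, 2       # hypothetical expert count

active = dense_params + expert_params * top_k / n_experts
total  = dense_params + expert_params
print(f"{active / total:.0%} of parameters active per token")   # 21%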

What's Next

Phase 1 (Completed)

Architecture Design & Implementation

Core hybrid architecture, MoE, MoR, MLA, YaRN, and custom AdaMuon optimizer are all implemented independently.

Phase 2 (In Training)

Mini Training & Validation

Training the compact model locally to validate the full architecture end-to-end before scaling.

Phase 3 (Upcoming)

Medium Model Training

Scaling to intermediate capacity for general-purpose language understanding and generation.

Phase 4 (Upcoming)

High Model Training

Pushing to large scale for research-grade performance and benchmark evaluation.

Phase 5 (Upcoming)

Ultra Model Training

Frontier scale training run across multiple nodes to maximize capability.
