Blog

Name: unillm
Author: Cognisoc

Notes on the UniLLM internals — the tensor model, the scheduler, the KV cache, and how the 47 architectures collapse into one trait.

2026-05-12
Three cores, one runtime: how UniLLM keeps 47 architectures honest

A walk through TensorCore, ModelCore, and WeightLoaderCore: the three traits that let UniLLM support 47 model families without forking a runtime per architecture.

2026-04-28
RadixAttention plus PagedAttention: the UniLLM KV cache, explained

Why UniLLM's KV cache is hybrid, what RadixAttention and PagedAttention each contribute, and the honest state of integration today.

2026-04-10
Weight loading without the format wars: SafeTensors, GGUF, and PyTorch under one trait

How UniLLM's WeightLoaderCore makes SafeTensors, GGUF, and PyTorch checkpoints interchangeable from a model's point of view, and what dequantization looks like today.