Blog
Notes on UniLLM internals: the tensor model, the scheduler, the KV cache, and how the 47 architectures collapse into one trait.
- A walk through TensorCore, ModelCore, and WeightLoaderCore: the three traits that let UniLLM support 47 model families without forking a runtime per architecture.
- Why UniLLM's KV cache is hybrid, what RadixAttention and PagedAttention each contribute, and the honest state of integration today.
- How UniLLM's WeightLoaderCore makes SafeTensors, GGUF, and PyTorch checkpoints interchangeable from a model's point of view, and what dequantization looks like today.