UniLLM blog

Notes on the UniLLM Rust inference runtime: tensor core, KV cache, scheduler, and the Model trait.
https://unillm.cognisoc.com/

Three cores, one runtime: how UniLLM keeps 47 architectures honest
https://unillm.cognisoc.com/blog/three-cores-one-runtime/
A walk through TensorCore, ModelCore, and WeightLoaderCore, the three traits that let UniLLM support 47 model families without forking a runtime per architecture.
Tue, 12 May 2026 00:00:00 GMT

RadixAttention plus PagedAttention: the UniLLM KV cache, explained
https://unillm.cognisoc.com/blog/kv-cache-radix-paged/
Why UniLLM's KV cache is hybrid, what RadixAttention and PagedAttention each contribute, and the honest state of integration today.
Tue, 28 Apr 2026 00:00:00 GMT

Weight loading without the format wars: SafeTensors, GGUF, and PyTorch under one trait
https://unillm.cognisoc.com/blog/weight-loading-gguf-safetensors-pytorch/
How UniLLM's WeightLoaderCore makes SafeTensors, GGUF, and PyTorch checkpoints interchangeable from a model's point of view, and what dequantization looks like today.
Fri, 10 Apr 2026 00:00:00 GMT