<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>UniLLM blog</title><description>Notes on the UniLLM Rust inference runtime: tensor core, KV cache, scheduler, and the Model trait.</description><link>https://unillm.cognisoc.com/</link><item><title>Three cores, one runtime: how UniLLM keeps 47 architectures honest</title><link>https://unillm.cognisoc.com/blog/three-cores-one-runtime/</link><guid isPermaLink="true">https://unillm.cognisoc.com/blog/three-cores-one-runtime/</guid><description>A walk through TensorCore, ModelCore, and WeightLoaderCore — the three traits that let UniLLM support 47 model families without forking a runtime per architecture.</description><pubDate>Tue, 12 May 2026 00:00:00 GMT</pubDate></item><item><title>RadixAttention plus PagedAttention: the UniLLM KV cache, explained</title><link>https://unillm.cognisoc.com/blog/kv-cache-radix-paged/</link><guid isPermaLink="true">https://unillm.cognisoc.com/blog/kv-cache-radix-paged/</guid><description>Why UniLLM&apos;s KV cache is hybrid, what RadixAttention and PagedAttention each contribute, and the honest state of integration today.</description><pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate></item><item><title>Weight loading without the format wars: SafeTensors, GGUF, and PyTorch under one trait</title><link>https://unillm.cognisoc.com/blog/weight-loading-gguf-safetensors-pytorch/</link><guid isPermaLink="true">https://unillm.cognisoc.com/blog/weight-loading-gguf-safetensors-pytorch/</guid><description>How UniLLM&apos;s WeightLoaderCore makes SafeTensors, GGUF, and PyTorch checkpoints interchangeable from a model&apos;s point of view, and what dequantization looks like today.</description><pubDate>Fri, 10 Apr 2026 00:00:00 GMT</pubDate></item></channel></rss>