# UniLLM

> A modular LLM inference runtime written in Rust. UniLLM provides a unified, type-safe interface for running large language models across 47 architectures via three composable abstractions: TensorCore, ModelCore, and WeightLoaderCore.

UniLLM is an Apache-2.0 licensed open-source project from Cognisoc. The runtime is CPU-only today; GPU acceleration (CUDA, Metal via Candle feature flags) is on the near-term roadmap. LLaMA inference is validated end-to-end with real GGUF weights; the other 46 architectures pass unit tests with dummy tensors and are being progressively validated against real weights.

## Site map

- [Home](https://unillm.cognisoc.com/): Overview, quick start, supported models.
- [About](https://unillm.cognisoc.com/about): The three-core architecture, current status, audience.
- [Blog](https://unillm.cognisoc.com/blog): Notes on internals.
- [Compare](https://unillm.cognisoc.com/compare): How UniLLM compares to other Rust LLM runtimes.
- [RSS](https://unillm.cognisoc.com/rss.xml)

## Canonical resources

- Source: https://github.com/cognisoc/unillm
- Docs: https://docs.cognisoc.com/unillm/
- Architecture: https://github.com/cognisoc/unillm/blob/main/docs/ARCHITECTURE.md
- Roadmap: https://github.com/cognisoc/unillm/blob/main/docs/ROADMAP.md
- API reference: https://github.com/cognisoc/unillm/blob/main/docs/api_reference.md
- License: Apache-2.0

## Blog posts

- [Three cores, one runtime](https://unillm.cognisoc.com/blog/three-cores-one-runtime/): how TensorCore, ModelCore, and WeightLoaderCore keep 47 architectures honest.
- [RadixAttention plus PagedAttention](https://unillm.cognisoc.com/blog/kv-cache-radix-paged/): the UniLLM hybrid KV cache, explained.
- [Weight loading without the format wars](https://unillm.cognisoc.com/blog/weight-loading-gguf-safetensors-pytorch/): SafeTensors, GGUF, and PyTorch under one trait.

## Comparisons

- [UniLLM vs Candle](https://unillm.cognisoc.com/compare/unillm-vs-candle/)
- [UniLLM vs mistral.rs](https://unillm.cognisoc.com/compare/unillm-vs-mistral-rs/)