# UniLLM

> A modular LLM inference runtime written in Rust. UniLLM provides a unified, type-safe interface for running large language models across 47 architectures via three composable abstractions: TensorCore, ModelCore, and WeightLoaderCore.

UniLLM is an Apache-2.0-licensed open-source project from Cognisoc. The runtime is CPU-only today; GPU acceleration (CUDA and Metal, via Candle feature flags) is on the near-term roadmap. LLaMA inference is validated end-to-end with real GGUF weights. The other 46 architectures currently pass unit tests with dummy tensors only and are being progressively validated against real weights.
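
To make the three-core idea concrete, here is a minimal sketch of how a tensor abstraction, a model abstraction, and a weight-loader abstraction could compose. Only the trait names `TensorCore`, `ModelCore`, and `WeightLoaderCore` come from UniLLM; every method signature and every concrete type below (`CpuTensor`, `InMemoryLoader`, `BiasModel`, `run_demo`) is a hypothetical illustration, not the actual UniLLM API. See the API reference for the real definitions.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for UniLLM's TensorCore: the tensor operations a model needs.
trait TensorCore: Sized {
    fn from_vec(data: Vec<f32>) -> Self;
    fn add(&self, other: &Self) -> Self;
    fn to_vec(&self) -> Vec<f32>;
}

/// Hypothetical stand-in for WeightLoaderCore: fetch a named tensor from any weight format.
trait WeightLoaderCore<T: TensorCore> {
    fn load(&self, name: &str) -> Option<T>;
}

/// Hypothetical stand-in for ModelCore: build from any loader, run on any tensor backend.
trait ModelCore<T: TensorCore>: Sized {
    fn from_loader<L: WeightLoaderCore<T>>(loader: &L) -> Option<Self>;
    fn forward(&self, input: &T) -> T;
}

/// Toy CPU backend (assumption: a flat f32 buffer with no shape tracking).
struct CpuTensor(Vec<f32>);

impl TensorCore for CpuTensor {
    fn from_vec(data: Vec<f32>) -> Self {
        CpuTensor(data)
    }
    fn add(&self, other: &Self) -> Self {
        // Element-wise sum; zip stops at the shorter buffer.
        CpuTensor(self.0.iter().zip(&other.0).map(|(a, b)| a + b).collect())
    }
    fn to_vec(&self) -> Vec<f32> {
        self.0.clone()
    }
}

/// Toy loader backed by a map, standing in for a GGUF or SafeTensors reader.
struct InMemoryLoader(HashMap<String, Vec<f32>>);

impl WeightLoaderCore<CpuTensor> for InMemoryLoader {
    fn load(&self, name: &str) -> Option<CpuTensor> {
        self.0.get(name).cloned().map(CpuTensor::from_vec)
    }
}

/// Toy "model" that adds a learned bias to its input.
struct BiasModel<T> {
    bias: T,
}

impl<T: TensorCore> ModelCore<T> for BiasModel<T> {
    fn from_loader<L: WeightLoaderCore<T>>(loader: &L) -> Option<Self> {
        // Returns None if the "bias" weight is missing.
        Some(Self { bias: loader.load("bias")? })
    }
    fn forward(&self, input: &T) -> T {
        input.add(&self.bias)
    }
}

/// Wires the three pieces together: loader -> model -> forward pass.
fn run_demo() -> Vec<f32> {
    let mut weights = HashMap::new();
    weights.insert("bias".to_string(), vec![1.0, 2.0]);
    let loader = InMemoryLoader(weights);
    let model: BiasModel<CpuTensor> =
        BiasModel::from_loader(&loader).expect("bias weight is present");
    model.forward(&CpuTensor::from_vec(vec![10.0, 20.0])).to_vec()
}
```

In this sketch the model is generic over both the tensor backend and the loader, so swapping the weight format or the compute backend would not touch the model code. Calling `run_demo()` from a `main` function exercises the whole path.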

## Site map

- [Home](https://unillm.cognisoc.com/): Overview, quick start, supported models.
- [About](https://unillm.cognisoc.com/about): The three-core architecture, current status, audience.
- [Blog](https://unillm.cognisoc.com/blog): Notes on internals.
- [Compare](https://unillm.cognisoc.com/compare): How UniLLM compares to other Rust LLM runtimes.
- [RSS](https://unillm.cognisoc.com/rss.xml)

## Canonical resources

- Source: https://github.com/cognisoc/unillm
- Docs: https://docs.cognisoc.com/unillm/
- Architecture: https://github.com/cognisoc/unillm/blob/main/docs/ARCHITECTURE.md
- Roadmap: https://github.com/cognisoc/unillm/blob/main/docs/ROADMAP.md
- API reference: https://github.com/cognisoc/unillm/blob/main/docs/api_reference.md
- License: Apache-2.0

## Blog posts

- [Three cores, one runtime](https://unillm.cognisoc.com/blog/three-cores-one-runtime/): how TensorCore, ModelCore, and WeightLoaderCore keep 47 architectures honest.
- [RadixAttention plus PagedAttention](https://unillm.cognisoc.com/blog/kv-cache-radix-paged/): the UniLLM hybrid KV cache, explained.
- [Weight loading without the format wars](https://unillm.cognisoc.com/blog/weight-loading-gguf-safetensors-pytorch/): SafeTensors, GGUF, and PyTorch under one trait.

## Comparisons

- [UniLLM vs Candle](https://unillm.cognisoc.com/compare/unillm-vs-candle/)
- [UniLLM vs mistral.rs](https://unillm.cognisoc.com/compare/unillm-vs-mistral-rs/)