LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance. LightLLM draws on the strengths of many well-regarded open-source projects, including FasterTransformer, TGI, vLLM, and FlashAttention.
English Docs | Chinese Docs | Blogs
Tech Blogs
- [2025/11] 🚀 Prefix KV Cache Transfer between DP ranks is now supported! Check out the technical deep dive in our blog post.
News
- [2025/09] 🔥 LightLLM v1.1.0 release!
- [2025/08] Pre$^3$ received an Outstanding Paper Award at ACL 2025.
- [2025/05] LightLLM paper on constrained decoding accepted at ACL 2025 (Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation). For a more accessible overview of the research, with key insights and examples, check out our blog post: LightLLM Blog.
- [2025/04] LightLLM paper on request scheduling published at ASPLOS '25 (Past-Future Scheduler for LLM Serving under SLA Guarantees).