This project provides an efficient and cost-effective framework for serving various multimodal AI models, including those for image, video, and audio generation.
vLLM-Omni is a Python framework designed for high-performance inference and serving of multimodal models. It supports models such as Qwen3-Omni and diffusion transformer (DiT) architectures across CUDA, ROCm, and NPU platforms, offering unified quantization and a shared core runtime for tasks such as audio, image, and video generation. Developers can also integrate community-driven AI assistant skills via the `vllm-omni-skills` project.
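As a rough sketch of how a locally served model is typically queried (not the project's documented API), the example below assumes vLLM-Omni has been launched as a local server exposing an OpenAI-compatible endpoint; the port, model name, and prompt are placeholder assumptions.

```python
# Minimal sketch: query a locally served multimodal model.
# Assumptions: the server listens on localhost:8000 and exposes an
# OpenAI-compatible /v1 API; "Qwen3-Omni" is a placeholder model name.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen3-Omni",
    messages=[
        {"role": "user", "content": "Summarize the attached audio clip in one sentence."},
    ],
)
print(response.choices[0].message.content)
```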
The framework is aimed at AI developers and researchers who need a performant, flexible solution for deploying and serving multimodal models at scale.