# KServe

KServe is a standardized, distributed inference platform for generative and predictive AI, built for scalable, multi-framework model deployment on Kubernetes.
KServe is used by many organizations and is a Cloud Native Computing Foundation (CNCF) incubating project.
For more details, visit the [KServe website](https://kserve.github.io/website/).

## Why KServe?
KServe is a single platform that unifies generative and predictive AI inference on Kubernetes. It is simple enough for quick deployments, yet powerful enough to handle enterprise-scale AI workloads with advanced features.
## Features
### Generative AI
- 🧮 Optimized Backends: Support for vLLM and llm-d backends for high-performance LLM serving (see the deployment sketch after this list)
- 📌 Standardization: OpenAI-compatible inference protocol for seamless integration with existing LLM clients and tooling (see the client example after this list)
- 🚅 GPU Acceleration: High-performance serving with GPU support and optimized memory management for large models
- 💾 Model Caching: Intelligent model caching to reduce loading times and improve response latency for frequently used models
- 🗂️ KV Cache Offloading: Advanced memory management with KV cache offloading to CPU/disk for handling longer sequences efficiently
- 📈 Autoscaling: Request-based autoscaling capabilities optimized for generative workload patterns
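As a deployment sketch, an LLM can be declared as an `InferenceService` through the KServe Python SDK. This is a minimal, illustrative example, not the only way to deploy: the service name, namespace, and `--model_id` argument are placeholder assumptions, and the `huggingface` model format is one way to select a generative runtime that can use vLLM as its backend.

```python
# Minimal sketch: deploying an LLM with the KServe Python SDK.
# Assumes a cluster with KServe installed and a current kubeconfig;
# the service name, namespace, and model id are placeholders.
from kubernetes.client import V1ObjectMeta
from kserve import (
    KServeClient,
    V1beta1InferenceService,
    V1beta1InferenceServiceSpec,
    V1beta1PredictorSpec,
    V1beta1ModelSpec,
    V1beta1ModelFormat,
)

isvc = V1beta1InferenceService(
    api_version="serving.kserve.io/v1beta1",
    kind="InferenceService",
    metadata=V1ObjectMeta(name="llama-demo", namespace="default"),
    spec=V1beta1InferenceServiceSpec(
        predictor=V1beta1PredictorSpec(
            model=V1beta1ModelSpec(
                # "huggingface" selects a generative runtime; the model id
                # below is a placeholder for any Hugging Face model.
                model_format=V1beta1ModelFormat(name="huggingface"),
                args=["--model_id=meta-llama/Llama-3.1-8B-Instruct"],
            )
        )
    ),
)

client = KServeClient()
client.create(isvc)                   # submit the InferenceService
client.wait_isvc_ready("llama-demo")  # block until it is ready to serve
```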
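Because the serving protocol is OpenAI-compatible, a deployed model can then be queried with any standard OpenAI client. In the sketch below, the `base_url` host and path are assumptions: the actual URL depends on your ingress setup and the runtime's OpenAI route.

```python
# Querying a KServe-hosted LLM through the OpenAI-compatible protocol.
# The base_url is a placeholder; substitute your InferenceService's
# ingress host and OpenAI-compatible route.
from openai import OpenAI

client = OpenAI(
    base_url="http://llama-demo.default.example.com/openai/v1",  # assumed host/path
    api_key="not-needed",  # KServe itself does not require an OpenAI API key
)

response = client.chat.completions.create(
    model="llama-demo",
    messages=[{"role": "user", "content": "Summarize what KServe does."}],
)
print(response.choices[0].message.content)
```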