llm-serving

5 repos

vllm

vllm-project

A high-throughput and memory-efficient inference and serving engine for LLMs

Featured

amd blackwell cuda

Python 80.8K 545 17.1K 34m ago

ray

ray-project

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Featured

data-science deep-learning deployment

Python 42.6K

skypilot

skypilot-org

Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access & manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).

Featured

cloud-computing cloud-management cost-optimization

Python 10.0K 26 1.1K 28m ago

BentoML

bentoml

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

Featured

ai-inference deep-learning generative-ai

Python 8.7K 396516d ago

lorax

predibase

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

fine-tuning gpt llama

Python 3.8K 3148d ago