vllm-project
A high-throughput and memory-efficient inference and serving engine for LLMs
ray-project
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
skypilot-org
Run, manage, and scale AI workloads on any AI infrastructure. Use one system to access and manage all AI compute (Kubernetes, Slurm, 20+ clouds, on-prem).
bentoml
The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more.
predibase
Multi-LoRA inference server that scales to thousands of fine-tuned LLMs