vllm-project
A high-throughput and memory-efficient inference and serving engine for LLMs
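
As a rough sketch of what that looks like in practice, vLLM's offline Python API can batch-generate completions in a few lines. The model name below is just the small example model from vLLM's quickstart; any compatible HuggingFace model would do:

```python
from vllm import LLM, SamplingParams

# Load a small example model; vLLM handles KV-cache memory (PagedAttention)
# and request batching internally, which is where the throughput comes from.
llm = LLM(model="facebook/opt-125m")

sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)
outputs = llm.generate(["The capital of France is"], sampling)

for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```

Recent releases also ship an OpenAI-compatible HTTP server for online serving, started with something like `vllm serve <model>`.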
bentoml
A framework for serving AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more.
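
A minimal sketch of how such an inference API is declared, assuming BentoML's 1.2+ `@bentoml.service` / `@bentoml.api` decorators; the `Echo` class and its stub logic are placeholders for illustration, not anything shipped with BentoML:

```python
import bentoml

# A hypothetical service wrapping a model behind an HTTP API.
@bentoml.service
class Echo:
    @bentoml.api
    def generate(self, prompt: str) -> str:
        # A real service would load a model in __init__ and run inference here;
        # this stub just echoes the prompt to keep the sketch self-contained.
        return f"echo: {prompt}"
```

Serving it locally is then something like `bentoml serve service:Echo`, which exposes `generate` as an HTTP endpoint.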