Paged Attention
Efficiently manage KV cache using vLLM’s Paged Attention kernels.
Paged Attention
Efficiently manage KV cache using vLLM’s Paged Attention kernels.
Continuous Batching
Continuously batch incoming requests in the Async server.
Hugging Face Integration
Run almost any Hugging Face format LLM seamlessly.
Quantization Support
Support for almost all quantization formats, with optimized kernels for efficient deployment.
OpenAI-compatible API
Quickly deploy models with the integrated OpenAI API, supporting Text/Chat Completions, Vision, and Batch API.
Speculative Decoding
Accelerate inference using various state-of-the-art spec-decoding methods.
Adapters
Deploy hundreds or thousands of LoRAs efficiently using Punica, and PEFT-style Prompt adapters.
Hardware Support
Aphrodite supports NVIDIA & AMD GPUs, Intel XPUs, Google TPUs, AWS Inferentia/Trainium, AVX2/AVX512/ppc64le CPUs.