Aphrodite Engine

Breathing Life Into Language

Features

Paged Attention

Efficiently manage KV cache using vLLM’s Paged Attention kernels.

Continuous Batching

Continuously batch incoming requests in the Async server.

Hugging Face Integration

Run almost any Hugging Face format LLM seamlessly.

Quantization Support

Support for almost all quantization formats, with optimized kernels for efficient deployment.

OpenAI-compatible API

Quickly deploy models with the integrated OpenAI API, supporting Text/Chat Completions, Vision, and Batch API.

Speculative Decoding

Accelerate inference using various state-of-the-art spec-decoding methods.

Adapters

Deploy hundreds or thousands of LoRAs efficiently using Punica, and PEFT-style Prompt adapters.

Hardware Support

Aphrodite supports NVIDIA & AMD GPUs, Intel XPUs, Google TPUs, AWS Inferentia/Trainium, AVX2/AVX512/ppc64le CPUs.