Quantization SupportSupport for almost all quantization formats, with optimized kernels for efficient deployment.
OpenAI-compatible APIQuickly deploy models with the integrated OpenAI API, supporting Text/Chat Completions, Vision, and Batch API.
AdaptersDeploy hundreds or thousands of LoRAs efficiently using Punica, and PEFT-style Prompt adapters.
Hardware supportAphrodite supports NVIDIA & AMD GPUs, Intel XPUs, Google TPUs, AWS Inferentia/Trainium, AVX2/AVX512/ppc64le CPUs.