Supported Models

Aphrodite supports a wide range of generative Transformer models from Hugging Face Transformers. The model architectures currently supported are listed below.

Decoder-only Language Models

| Architecture | Example HF Model |
| --- | --- |
| AquilaForCausalLM | BAAI/AquilaChat-7B |
| ArcticForCausalLM | Snowflake/snowflake-arctic-instruct |
| BaiChuanForCausalLM | baichuan-inc/Baichuan2-13B-Chat |
| BloomForCausalLM | bigscience/bloomz |
| ChatGLMModel | THUDM/chatglm3-6b |
| CohereForCausalLM | CohereForAI/c4ai-command-r-v01 |
| DbrxForCausalLM | databricks/dbrx-instruct |
| DeciLMForCausalLM | DeciLM/DeciLM-7B |
| DeepseekForCausalLM | deepseek-ai/deepseek-moe-16b-base |
| DeepseekV2ForCausalLM | deepseek-ai/DeepSeek-V2.5 |
| ExaoneForCausalLM | LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct |
| FalconForCausalLM | tiiuae/falcon-7b |
| GPT2LMHeadModel | gpt2 |
| GPTBigCodeForCausalLM | bigcode/starcoder |
| GPTJForCausalLM | pygmalionai/pygmalion-6b |
| GPTNeoXForCausalLM | EleutherAI/pythia-12b |
| GemmaForCausalLM | google/gemma-7b |
| Gemma2ForCausalLM | google/gemma-2-9b |
| GraniteForCausalLM | ibm-research/PowerLM-3b |
| GraniteMoeForCausalLM | ibm-research/PowerMoE-3b |
| InternLMForCausalLM | internlm/internlm-7b |
| InternLM2ForCausalLM | internlm/internlm2-7b |
| JAISLMHeadModel | core42/jais-13b |
| JambaForCausalLM | ai21labs/Jamba-v0.1 |
| LlamaForCausalLM | meta-llama/Meta-Llama-3.1-8B |
| MPTForCausalLM | mosaicml/mpt-7b |
| MambaForCausalLM | state-spaces/mamba-2.8b-hf |
| MiniCPMForCausalLM | openbmb/MiniCPM-2B-dpo-bf16 |
| MiniCPM3ForCausalLM | openbmb/MiniCPM3-4B |
| MistralForCausalLM | mistralai/Mistral-7B-v0.1 |
| MixtralForCausalLM | mistralai/Mixtral-8x7B-v0.1 |
| NemotronForCausalLM | nvidia/Minitron-8B-Base |
| NVLM_D | nvidia/NVLM-D-72B |
| OPTForCausalLM | facebook/opt-66b |
| OLMoForCausalLM | allenai/OLMo-7B-hf |
| OlmoeForCausalLM | allenai/OLMoE-1B-7B-0125 |
| OrionForCausalLM | OrionStarAI/Orion-14B-Chat |
| PersimmonForCausalLM | adept/persimmon-8b-chat |
| PhiForCausalLM | microsoft/phi-2 |
| Phi3ForCausalLM | microsoft/Phi-3-medium-128k-instruct |
| Phi3SmallForCausalLM | microsoft/Phi-3-small-128k-instruct |
| PhiMoEForCausalLM | microsoft/Phi-3.5-MoE-instruct |
| QWenLMHeadModel | Qwen/Qwen-7B |
| Qwen2ForCausalLM | Qwen/Qwen2-72B |
| Qwen2MoeForCausalLM | Qwen/Qwen1.5-MoE-A2.7B |
| Qwen2VLForConditionalGeneration | Qwen/Qwen2-VL-7B-Instruct |
| SolarForCausalLM | upstage/solar-pro-preview-instruct |
| StableLmForCausalLM | stabilityai/stablelm-3b-4e1t |
| Starcoder2ForCausalLM | bigcode/starcoder2-3b |
| XverseForCausalLM | xverse/XVERSE-65B-Chat |

Encoder-Decoder Language Models

| Architecture | Example Model |
| --- | --- |
| BartForConditionalGeneration | facebook/bart-large-cnn |

Embedding Models

| Architecture | Example Model |
| --- | --- |
| MistralModel | intfloat/e5-mistral-7b-instruct |
| Qwen2ForRewardModel | Qwen/Qwen2.5-Math-RM-72B |
| Gemma2Model | BAAI/bge-multilingual-gemma2 |

Multimodal Language Models

| Architecture | Supported Modalities | Example Model |
| --- | --- | --- |
| Blip2ForConditionalGeneration | Image | Salesforce/blip2-opt-6.7b |
| ChameleonForConditionalGeneration | Image | facebook/chameleon-7b |
| ChatGLMModel | Image | THUDM/chatglm3-6b |
| FuyuForCausalLM | Image | adept/fuyu-8b |
| InternVLChatModel | Image | OpenGVLab/InternVL2-8B |
| LlavaForConditionalGeneration | Image | llava-hf/llava-v1.5-7b-hf |
| LlavaNextForConditionalGeneration | Image | llava-hf/llava-v1.6-mistral-7b-hf |
| LlavaNextVideoForConditionalGeneration | Video | llava-hf/LLaVA-NeXT-Video-7B-hf |
| LlavaOnevisionForConditionalGeneration | Image, Video | llava-hf/llava-onevision-qwen2-7b-ov-hf |
| MiniCPMV | Image | openbmb/MiniCPM-V-2_6 |
| MllamaForConditionalGeneration | Image | meta-llama/Llama-3.2-11B-Vision-Instruct |
| MolmoForCausalLM | Image | allenai/Molmo-7B-D-0924 |
| PaliGemmaForConditionalGeneration | Image | google/paligemma-3b-pt-224 |
| Phi3VForCausalLM | Image | microsoft/Phi-3.5-vision-instruct |
| PixtralForConditionalGeneration | Image | mistralai/Pixtral-12B-2409 |
| QWenLMHeadModel | Image | Qwen/Qwen-VL |
| Qwen2VLForConditionalGeneration | Image | Qwen/Qwen2-VL-7B-Instruct |
| UltravoxModel | Audio | fixie-ai/ultravox-v0_3 |

Speculative Models

| Architecture | Example Model |
| --- | --- |
| EAGLEModel | abhigoyal/vllm-eagle-llama-68m-random |
| MedusaModel | abhigoyal/vllm-medusa-llama-68m-random |
| MLPSpeculatorPreTrainedModel | ibm-fms/llama-160m-accelerator |

If your model uses one of the architectures above, you can run it with Aphrodite without further changes.
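
To find out which architecture a checkpoint uses, you can usually read the `architectures` field of its `config.json`, which Hugging Face Transformers writes when a model is saved. The following is a minimal sketch of that check, not part of Aphrodite itself. The helper names `declared_architectures` and `is_listed` are hypothetical, and `SUPPORTED_ARCHITECTURES` is only a small hand-copied subset of the tables above that you would extend yourself.

```python
import json
import tempfile
from pathlib import Path

# Assumption: a hand-copied subset of the architecture names listed above.
# Extend it with the entries you care about.
SUPPORTED_ARCHITECTURES = {
    "LlamaForCausalLM",
    "MistralForCausalLM",
    "Qwen2ForCausalLM",
    "BartForConditionalGeneration",
}


def declared_architectures(model_dir):
    """Return the `architectures` list from a local checkpoint's config.json."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # The key may be absent or null in hand-written configs.
    return list(config.get("architectures") or [])


def is_listed(model_dir, supported=SUPPORTED_ARCHITECTURES):
    """Return True if any declared architecture appears in `supported`."""
    return any(arch in supported for arch in declared_architectures(model_dir))


if __name__ == "__main__":
    # Demo with a throwaway directory standing in for a downloaded model.
    with tempfile.TemporaryDirectory() as tmp:
        config = {"architectures": ["LlamaForCausalLM"]}
        Path(tmp, "config.json").write_text(json.dumps(config), encoding="utf-8")
        print(declared_architectures(tmp), is_listed(tmp))
```

The check only compares names, so a match indicates that the architecture appears in the list, not that a particular checkpoint has been tested.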