🤗 Accelerated Inference API
Integrate over 5,000 pre-trained, state-of-the-art models, or your own private models, into your apps via simple HTTP requests, with 2x to 10x faster inference than out-of-the-box deployment and built-in scalability.
Looking for the old documentation? It’s here. For an overview of how we optimize models to speed up inference, head over to our blog.
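To make the "simple HTTP requests" claim concrete, here is a minimal sketch in Python. It assumes the `https://api-inference.huggingface.co/models/<MODEL_ID>` endpoint with a bearer-token header; the model ID is an illustrative choice and `YOUR_API_TOKEN` is a placeholder, not a prescribed setup.

```python
import requests

# Illustrative model and placeholder token -- substitute your own.
API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}

def query(payload):
    """POST an input payload and return the model's JSON predictions."""
    response = requests.post(API_URL, headers=HEADERS, json=payload)
    response.raise_for_status()
    return response.json()

# A text-classification call: the API returns labels with confidence scores.
print(query({"inputs": "I loved this movie!"}))
```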
Hugging Face is trusted in production by over 5,000 companies

Main features:
- Leverage 5,000+ Transformer models (T5, Blenderbot, Bart, GPT-2, Pegasus…)
- Upload, manage, and serve your own models privately
- Run Classification, NER, Conversational, Summarization, Translation, Question-Answering, and Embeddings Extraction tasks (see the request sketch after this list)
- Get up to 10x inference speedup to reduce user latency
- Accelerated inference on CPU and GPU (GPU requires a Startup or Enterprise plan)
- Run large models that are challenging to deploy in production
- Scale to 1,000 requests per second with automatic scaling built-in
- Ship new NLP features faster as new models become available
- Build your business on a platform powered by the reference open source project in NLP
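As a hedged illustration of a task that takes extra parameters, the sketch below sends a zero-shot-classification request. The `candidate_labels` parameter shape and the `facebook/bart-large-mnli` model ID are assumptions drawn from the Detailed parameters material, not the only way to call the API.

```python
import requests

# Assumed zero-shot model; any compatible model ID works the same way.
API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-mnli"
HEADERS = {"Authorization": "Bearer YOUR_API_TOKEN"}  # placeholder token

payload = {
    "inputs": "I have a problem with my iPhone that needs to be resolved asap!",
    # candidate_labels lets the model rank arbitrary labels without fine-tuning.
    "parameters": {"candidate_labels": ["urgent", "not urgent", "phone", "tablet"]},
}

response = requests.post(API_URL, headers=HEADERS, json=payload)
print(response.json())  # each candidate label comes back with a score
```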
Getting Started
- Overview
- Detailed parameters
- Which task is used by this model?
- Zero-shot classification task
- Translation task
- Summarization task
- Conversational task
- Table question answering task
- Question answering task
- Text-classification task
- Named Entity Recognition (NER) task
- Token-classification task
- Text-generation task
- Text2text-generation task
- Fill mask task
- Automatic speech recognition task
- Parallelism and batch jobs
- Detailed usage and pinned models
- More information about the API