🤗 Accelerated Inference API
Integrate over 5,000 pre-trained, state-of-the-art models, or your own private models, into your apps via simple HTTP requests, with 2x to 10x faster inference than out-of-the-box deployment and built-in scalability.
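As a minimal sketch of such a request, assuming you have a valid API token and using a public sentiment model as a stand-in (the model ID, token placeholder, and input text are illustrative, not prescribed by this page):

```python
import requests

# Hosted endpoint pattern: https://api-inference.huggingface.co/models/<model-id>
API_URL = "https://api-inference.huggingface.co/models/distilbert-base-uncased-finetuned-sst-2-english"
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  # substitute your own token

def query(payload):
    # POST a JSON payload to the model endpoint and return the parsed response
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

# Example: classify the sentiment of a sentence
print(query({"inputs": "I love this new API!"}))
```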
For an overview of how we optimize models to speed up inference, head over to our blog.
Main features:
Leverage 5,000+ Transformer models (T5, Blenderbot, Bart, GPT-2, Pegasus…)
Upload, manage and serve your own models privately
Run Classification, NER, Conversational, Summarization, Translation, Question Answering, and Embeddings Extraction tasks (see the sketch after this list)
Get up to a 10x inference speedup to reduce user latency
Accelerated inference on CPU and GPU (GPU requires a Startup or Enterprise plan)
Run large models that are challenging to deploy in production
Scale to 1,000 requests per second with automatic scaling built-in
Ship new NLP features faster as new models become available
Build your business on a platform powered by the reference open source project in NLP
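To make the task list above concrete, here is a hedged sketch of a summarization request; the model ID, input text, and max_length value are assumptions chosen for illustration:

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  # substitute your own token

payload = {
    "inputs": "The Accelerated Inference API serves thousands of Transformer "
              "models over plain HTTP, so apps can add NLP features without "
              "managing model servers themselves.",
    "parameters": {"max_length": 50},  # task-specific generation settings
}
response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())  # typically a list like [{"summary_text": "..."}]
```

Task-specific options such as these are covered in the “Detailed parameters” section linked below.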
Getting Started
- Overview
- Detailed parameters
- Which task is used by this model?
- Zero-shot classification task
- Translation task
- Summarization task
- Conversational task
- Table question answering task
- Question answering task
- Text-classification task
- Named Entity Recognition (NER) task
- Token-classification task
- Text-generation task
- Text2text-generation task
- Fill mask task
- Parallelism and batch jobs
- More information about the API