🤗 Accelerated Inference API

Integrate over 5,000 pre-trained state-of-the-art models, or your own private models, into your apps via simple HTTP requests, with 2x to 10x faster inference than out-of-the-box deployment and built-in scalability.

Looking for the old documentation? It’s here. For an overview of how we optimize models to speed up inference, head over to our blog.

Hugging Face is trusted in production by over 5,000 companies


Main features:

  • Leverage 5,000+ Transformer models (T5, Blenderbot, Bart, GPT-2, Pegasus…)

  • Upload, manage and serve your own models privately

  • Run Classification, NER, Conversational, Summarization, Translation, Question-Answering, and Embeddings Extraction tasks

  • Get up to 10x inference speedup to reduce user latency

  • Accelerated inference on CPU and GPU (GPU requires a Startup or Enterprise plan)

  • Run large models that are challenging to deploy in production

  • Scale to 1,000 requests per second with built-in automatic scaling

  • Ship new NLP features faster as new models become available

  • Build your business on a platform powered by the reference open-source project in NLP
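
To make "simple HTTP requests" concrete, here is a minimal Python sketch of what a call might look like. This page does not specify the endpoint, payload, or authentication scheme, so the base URL, the `{"inputs": ...}` JSON body, the Bearer-token header, and the helper names `build_request` and `query` are all assumptions for illustration; check the detailed API documentation for the exact format.

```python
import json
import urllib.request

# Assumed endpoint pattern; this page does not state the URL.
API_BASE = "https://api-inference.huggingface.co/models"


def build_request(model_id: str, api_token: str, inputs: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for one model.

    The JSON body shape and the Bearer-token header are assumptions.
    """
    payload = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/{model_id}",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def query(model_id: str, api_token: str, inputs: str, timeout: float = 30.0):
    """Send the request and decode the JSON response."""
    request = build_request(model_id, api_token, inputs)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a valid token, a call such as `query("gpt2", "<your API token>", "Hello, my name is")` would send a single POST request and return the decoded JSON. Separating `build_request` from `query` keeps the request construction easy to inspect and test without network access.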