🤗 Accelerated Inference API

Integrate over 20,000 pre-trained, state-of-the-art models, or your own private models, into your apps via simple HTTP requests, with 2x to 10x faster inference than an out-of-the-box deployment and built-in scalability.

Hugging Face is trusted in production by over 5,000 companies.


Main features:

  • Leverage 20,000+ Transformer models (T5, Blenderbot, Bart, GPT-2, Pegasus…)

  • Upload, manage and serve your own models privately

  • Run Classification, NER, Conversational, Summarization, Translation, Question Answering, and Embeddings Extraction tasks

  • Get up to 10x inference speedup to reduce user latency

  • Accelerated inference on CPU and GPU (GPU requires a Startup or Enterprise plan)

  • Run large models that are challenging to deploy in production

  • Scale to 1,000 requests per second with automatic scaling built-in

  • Ship new NLP features faster as new models become available

  • Build your business on a platform powered by the reference open source project in NLP
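The "simple HTTP requests" interface above can be sketched as follows. The endpoint shape (`POST https://api-inference.huggingface.co/models/{model_id}` with a `Bearer` token and a JSON `inputs` payload) follows the public Inference API documentation; the model id and token shown here are placeholders, not real credentials.

```python
import json

# Base endpoint of the Accelerated Inference API (per the public docs).
API_BASE = "https://api-inference.huggingface.co/models/"

def build_request(model_id: str, text: str, token: str) -> dict:
    """Assemble the URL, headers, and JSON body for one inference call.

    Any HTTP client (e.g. requests.post) can send the result; the API
    answers with task-specific JSON, such as a list of label/score pairs
    for a text-classification model.
    """
    return {
        "url": API_BASE + model_id,
        "headers": {"Authorization": f"Bearer {token}"},
        "body": json.dumps({"inputs": text}),
    }

# Placeholder model id and token, for illustration only.
req = build_request(
    "distilbert-base-uncased-finetuned-sst-2-english",
    "I love this new API!",
    "YOUR_API_TOKEN",
)
```

The same request format works across tasks; only the model id and the shape of the returned JSON change.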

If you are looking for custom support from the Hugging Face team