More information about the API

Rate limits

The current API does not enforce strict rate limits. Instead, we balance the load evenly across all our available resources and favor steady flows of requests. If your account suddenly sends 10k requests, you are likely to receive 503 errors saying models are loading. To prevent that, ramp your query volume smoothly from 0 to 10k over the course of a few minutes.
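One way to sketch such a ramp-up: schedule each request so the send rate grows linearly from zero to its peak over a warm-up window, instead of firing everything at once. This is an illustrative helper, not part of the API; the scheduling formula is our own assumption about what "smoothly" means here.

```python
def ramp_offsets(total_requests: int, ramp_seconds: float):
    """Yield the wall-clock offset (seconds from start) at which each
    request should be sent so the request *rate* grows linearly from
    zero to its peak over ramp_seconds.

    With a linear rate ramp, the cumulative request count grows
    quadratically: n(t) = total * (t / ramp_seconds) ** 2, so request
    i fires at t_i = ramp_seconds * sqrt(i / total).
    """
    for i in range(1, total_requests + 1):
        yield ramp_seconds * (i / total_requests) ** 0.5
```

A client would sleep until each offset before sending (e.g. `time.sleep(max(0, offset - elapsed))`); early gaps between requests are long and they shrink as the service warms up.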

Running private models

You can run private models by default! If you don't see them on your Hugging Face page, please make sure you are logged in. Within the API, make sure you include your token; otherwise your model will be reported as non-existent.
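A minimal sketch of an authenticated call using only the standard library. The model id and the `hf_xxx` token are placeholders; the point is the `Authorization: Bearer <token>` header, without which a private model resolves as if it did not exist.

```python
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/{model_id}"

def build_request(model_id: str, payload: dict, token: str) -> urllib.request.Request:
    """Build an authenticated POST request to the hosted Inference API."""
    return urllib.request.Request(
        API_URL.format(model_id=model_id),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # Without this header, a private model is reported as
            # non-existent instead of returning a permission error.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

def query(model_id: str, payload: dict, token: str) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(model_id, payload, token)) as resp:
        return json.loads(resp.read())
```

Usage would look like `query("my-org/my-private-model", {"inputs": "Hello"}, token)`, where `my-org/my-private-model` is a hypothetical repository name.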

Running a public model that I do not own

You can. Please check the model card for any licensing issues that might arise; most public models are released by researchers and are usable within commercial products, but please double-check.

Fine-tuning a public model

We don't currently provide automatic fine-tuning of any model on your data, but we have announced a product along those lines: https://twitter.com/huggingface/status/1341435640458702849

Tracking metrics

This is an area of active improvement. Stay tuned as we release more features to track your API usage!

Running the inference on my infrastructure

Currently we do not provide on-premises inference out of the box. If you want to run our Accelerated Inference on your own infrastructure, please contact us at api-inference@huggingface.co.