.. _detailed_parameters:

Detailed parameters
====================================================================================================

Which task is used by this model?
----------------------------------------------------------------------------------------------------

In general, the 🤗 Hosted API Inference accepts a simple string as an input. However, more advanced
usage depends on the "task" that the model solves. The "task" of a model is defined on its model
page:

.. image:: _static/images/task.png
    :width: 300

Zero-shot classification task
----------------------------------------------------------------------------------------------------

This task is super useful for trying out classification with zero code: you simply pass a
sentence/paragraph and the possible labels for that sentence, and you get a result.

.. seealso::

    **Recommended model**: `facebook/bart-large-mnli <https://huggingface.co/facebook/bart-large-mnli>`_.

Request:

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START zero_shot_inference
        :end-before: END zero_shot_inference
        :dedent: 8

.. only:: curl

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: bash
        :start-after: START curl_zero_shot_inference
        :end-before: END curl_zero_shot_inference
        :dedent: 8

When sending your request, you should send a JSON encoded payload. Here are all the options:

.. list-table:: All parameters
    :widths: 10 20

    * - **inputs** :red:`(required)`
      - a string or list of strings
    * - **parameters** :red:`(required)`
      - a dict containing the following keys:
    * - \- *candidate_labels* :red:`(required)`
      - a list of strings that are potential classes for :obj:`inputs`.
    * - \- *multi_class*
      - (Default: :obj:`false`) Boolean that is set to :obj:`true` if classes can overlap.
    * - **options**
      - a dict containing the following keys:
    * - \- *use_gpu*
      - (Default: :obj:`false`). Boolean to use GPU instead of CPU for inference (requires at least the Startup plan).
    * - \- *use_cache*
      - (Default: :obj:`true`). Boolean. There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as is, as models are deterministic (meaning the results will be the same anyway). However, if you use a non-deterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query.
    * - \- *wait_for_model*
      - (Default: :obj:`false`) Boolean. If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to :obj:`true` after receiving a 503 error, as it will limit hanging in your application to known places.

The return value is either a dict, or a list of dicts if you sent a list of inputs.

Response:

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START zero_shot_inference_answer
        :end-before: END zero_shot_inference_answer
        :dedent: 8

.. list-table:: Returned values
    :widths: 10 20

    * - **sequence**
      - The string sent as an input
    * - **labels**
      - The list of strings for labels that you sent (in order)
    * - **scores**
      - a list of floats that corresponds to the probability of each label, in the same order as :obj:`labels`.
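As an illustrative sketch (not taken from the test file included above), the payload described in the
table can be assembled with the ``requests`` library. The
``https://api-inference.huggingface.co/models/<model_id>`` endpoint and the placeholder
``API_TOKEN`` are assumptions you should adapt to your own setup:

.. code-block:: python

    import json

    import requests

    API_TOKEN = "api_XXXXXXXX"  # placeholder: use your own API token
    API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-mnli"
    headers = {"Authorization": f"Bearer {API_TOKEN}"}

    payload = {
        "inputs": "Hi, I recently bought a device from your company but it is not working as advertised.",
        "parameters": {
            "candidate_labels": ["refund", "legal", "faq"],
            "multi_class": False,  # set to True if several labels can apply at once
        },
        "options": {"wait_for_model": True},
    }
    response = requests.post(API_URL, headers=headers, data=json.dumps(payload))
    data = response.json()
    # data contains "sequence", "labels" and "scores" as described in the table above
    for label, score in zip(data["labels"], data["scores"]):
        print(f"{label}: {score:.3f}")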
Translation task
----------------------------------------------------------------------------------------------------

This task is well known: it translates text from one language to another.

.. seealso::

    **Recommended model**: `Helsinki-NLP/opus-mt-ru-en <https://huggingface.co/Helsinki-NLP/opus-mt-ru-en>`_.
    Helsinki-NLP uploaded many models with many language pairs.

    **Recommended model**: `t5-base <https://huggingface.co/t5-base>`_.

Example:

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START translation_inference
        :end-before: END translation_inference
        :dedent: 8

.. only:: curl

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: bash
        :start-after: START curl_translation_inference
        :end-before: END curl_translation_inference
        :dedent: 8

When sending your request, you should send a JSON encoded payload. Here are all the options:

.. list-table:: All parameters
    :widths: 10 20

    * - **inputs** :red:`(required)`
      - a string to be translated, in the original language
    * - **options**
      - a dict containing the following keys:
    * - \- *use_gpu*
      - (Default: :obj:`false`). Boolean to use GPU instead of CPU for inference (requires at least the Startup plan).
    * - \- *use_cache*
      - (Default: :obj:`true`). Boolean. There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as is, as models are deterministic (meaning the results will be the same anyway). However, if you use a non-deterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query.
    * - \- *wait_for_model*
      - (Default: :obj:`false`) Boolean. If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to :obj:`true` after receiving a 503 error, as it will limit hanging in your application to known places.

The return value is either a dict, or a list of dicts if you sent a list of inputs.

.. list-table:: Returned values
    :widths: 10 20

    * - **translation_text**
      - The string after translation

Summarization task
----------------------------------------------------------------------------------------------------

This task is well known: it summarizes a longer text into a shorter one. Be careful, some models have
a maximum input length, which means they cannot handle full books, for instance. Be careful when
choosing your model. If you want to discuss your summarization needs, please get in touch:
api-inference@huggingface.co

.. seealso::

    **Recommended model**: `facebook/bart-large-cnn <https://huggingface.co/facebook/bart-large-cnn>`_.

Example:

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START summarization_inference
        :end-before: END summarization_inference
        :dedent: 8

.. only:: curl

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: bash
        :start-after: START curl_summarization_inference
        :end-before: END curl_summarization_inference
        :dedent: 8

When sending your request, you should send a JSON encoded payload. Here are all the options:

.. list-table:: All parameters
    :widths: 10 20

    * - **inputs** :red:`(required)`
      - a string to be summarized
    * - **options**
      - a dict containing the following keys:
    * - \- *use_gpu*
      - (Default: :obj:`false`). Boolean to use GPU instead of CPU for inference (requires at least the Startup plan).
    * - \- *use_cache*
      - (Default: :obj:`true`). Boolean. There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as is, as models are deterministic (meaning the results will be the same anyway). However, if you use a non-deterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query.
    * - \- *wait_for_model*
      - (Default: :obj:`false`) Boolean. If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to :obj:`true` after receiving a 503 error, as it will limit hanging in your application to known places.

The return value is either a dict, or a list of dicts if you sent a list of inputs.

.. list-table:: Returned values
    :widths: 10 20

    * - **summarization_text**
      - The string after summarization
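As an illustrative sketch (same assumed ``requests`` setup, endpoint and placeholder ``API_TOKEN`` as
in the zero-shot example above), a summarization payload only needs the text to summarize plus,
optionally, the ``options`` dict:

.. code-block:: python

    import json

    import requests

    API_TOKEN = "api_XXXXXXXX"  # placeholder: use your own API token
    API_URL = "https://api-inference.huggingface.co/models/facebook/bart-large-cnn"
    headers = {"Authorization": f"Bearer {API_TOKEN}"}

    long_text = (
        "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, "
        "and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on "
        "each side."
    )
    payload = {
        "inputs": long_text,
        # wait for the model to load instead of getting a 503 on the first call
        "options": {"wait_for_model": True},
    }
    response = requests.post(API_URL, headers=headers, data=json.dumps(payload))
    # the result exposes "summarization_text" as described in the table above
    print(response.json())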
Conversational task
----------------------------------------------------------------------------------------------------

This task corresponds to any chatbot-like structure. Models tend to have a shorter max_length, so
please check carefully whether a given model fits your needs when you require long-range
dependencies.

.. seealso::

    **Recommended model**: `microsoft/DialoGPT-large <https://huggingface.co/microsoft/DialoGPT-large>`_.

Example:

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START conversational_inference
        :end-before: END conversational_inference
        :dedent: 8

.. only:: curl

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: bash
        :start-after: START curl_conversational_inference
        :end-before: END curl_conversational_inference
        :dedent: 8

When sending your request, you should send a JSON encoded payload. Here are all the options:

.. list-table:: All parameters
    :widths: 10 20

    * - **inputs** :red:`(required)`
      - a dict containing the following keys:
    * - \- *text* :red:`(required)`
      - The last input from the user in the conversation.
    * - \- *generated_responses*
      - A list of strings corresponding to the earlier replies from the model.
    * - \- *past_user_inputs*
      - A list of strings corresponding to the earlier replies from the user. Should be the same length as :obj:`generated_responses`.
    * - **options**
      - a dict containing the following keys:
    * - \- *use_gpu*
      - (Default: :obj:`false`). Boolean to use GPU instead of CPU for inference (requires at least the Startup plan).
    * - \- *use_cache*
      - (Default: :obj:`true`). Boolean. There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as is, as models are deterministic (meaning the results will be the same anyway). However, if you use a non-deterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query.
    * - \- *wait_for_model*
      - (Default: :obj:`false`) Boolean. If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to :obj:`true` after receiving a 503 error, as it will limit hanging in your application to known places.

The return value is either a dict, or a list of dicts if you sent a list of inputs.

.. list-table:: Returned values
    :widths: 10 20

    * - **generated_text**
      - The answer of the bot
    * - **conversation**
      - A facility dictionary to send back for the next input (with the new user input added).
    * - \- *past_user_inputs*
      - List of strings. The last inputs from the user in the conversation, *after* the model has run.
    * - \- *generated_responses*
      - List of strings. The last outputs from the model in the conversation, *after* the model has run.
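The sketch below is illustrative only (same assumed ``requests`` setup, endpoint and placeholder
``API_TOKEN`` as above); it shows how the history is carried across turns by feeding the returned
``conversation`` dict back into the next request:

.. code-block:: python

    import json

    import requests

    API_TOKEN = "api_XXXXXXXX"  # placeholder: use your own API token
    API_URL = "https://api-inference.huggingface.co/models/microsoft/DialoGPT-large"
    headers = {"Authorization": f"Bearer {API_TOKEN}"}

    # a turn that already has some earlier history
    payload = {
        "inputs": {
            "past_user_inputs": ["Which movie is the best?"],
            "generated_responses": ["It's Die Hard for sure."],
            "text": "Can you explain why?",
        },
        "options": {"wait_for_model": True},
    }
    response = requests.post(API_URL, headers=headers, data=json.dumps(payload))
    data = response.json()
    print(data["generated_text"])   # the bot's reply
    history = data["conversation"]  # updated past_user_inputs / generated_responses

    # next turn: reuse the returned history and add the new user input
    next_payload = {
        "inputs": {
            "past_user_inputs": history["past_user_inputs"],
            "generated_responses": history["generated_responses"],
            "text": "Is it better than Terminator?",
        }
    }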
Table question answering task
----------------------------------------------------------------------------------------------------

Don't know SQL? Don't want to dive into a large spreadsheet? Ask it questions in plain English!

.. seealso::

    **Recommended model**: `google/tapas-base-finetuned-wtq <https://huggingface.co/google/tapas-base-finetuned-wtq>`_.

Example:

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START table_question_answering_inference
        :end-before: END table_question_answering_inference
        :dedent: 8

.. only:: curl

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: bash
        :start-after: START curl_table_question_answering_inference
        :end-before: END curl_table_question_answering_inference
        :dedent: 8

When sending your request, you should send a JSON encoded payload. Here are all the options:

.. list-table:: All parameters
    :widths: 10 20

    * - **inputs** :red:`(required)`
      - a dict containing the following keys:
    * - \- *query* :red:`(required)`
      - The query, in plain text, that you want to ask the table.
    * - \- *table*
      - A table of data represented as a dict of lists, where the keys are the column headers and the lists are the column values; all lists must have the same size.
    * - **options**
      - a dict containing the following keys:
    * - \- *use_gpu*
      - (Default: :obj:`false`). Boolean to use GPU instead of CPU for inference (requires at least the Startup plan).
    * - \- *use_cache*
      - (Default: :obj:`true`). Boolean. There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as is, as models are deterministic (meaning the results will be the same anyway). However, if you use a non-deterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query.
    * - \- *wait_for_model*
      - (Default: :obj:`false`) Boolean. If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to :obj:`true` after receiving a 503 error, as it will limit hanging in your application to known places.

The return value is either a dict, or a list of dicts if you sent a list of inputs.

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START table_question_answering_inference_answer
        :end-before: END table_question_answering_inference_answer
        :dedent: 8

.. list-table:: Returned values
    :widths: 10 20

    * - **answer**
      - The plaintext answer
    * - **coordinates**
      - a list of coordinates of the cells referenced in the answer
    * - **cells**
      - a list of the contents of the cells referenced in the answer
    * - **aggregator**
      - The aggregator used to get the answer
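As an illustrative sketch (same assumed ``requests`` setup, endpoint and placeholder ``API_TOKEN`` as
above), the ``table`` is passed as a dict of lists, one key per column header:

.. code-block:: python

    import json

    import requests

    API_TOKEN = "api_XXXXXXXX"  # placeholder: use your own API token
    API_URL = "https://api-inference.huggingface.co/models/google/tapas-base-finetuned-wtq"
    headers = {"Authorization": f"Bearer {API_TOKEN}"}

    payload = {
        "inputs": {
            "query": "How many stars does the transformers repository have?",
            # keys are column headers, lists are the column values (all the same length)
            "table": {
                "Repository": ["Transformers", "Datasets", "Tokenizers"],
                "Stars": ["36542", "4512", "3934"],
            },
        },
        "options": {"wait_for_model": True},
    }
    response = requests.post(API_URL, headers=headers, data=json.dumps(payload))
    # the result exposes "answer", "coordinates", "cells" and "aggregator" as described above
    print(response.json())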
Question answering task
----------------------------------------------------------------------------------------------------

Want to have a nice know-it-all bot that can answer any question?

.. seealso::

    **Recommended model**: `deepset/roberta-base-squad2 <https://huggingface.co/deepset/roberta-base-squad2>`_.

Example:

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START question_answering_inference
        :end-before: END question_answering_inference
        :dedent: 8

.. only:: curl

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: bash
        :start-after: START curl_question_answering_inference
        :end-before: END curl_question_answering_inference
        :dedent: 8

When sending your request, you should send a JSON encoded payload. Here are all the options:

.. list-table:: All parameters
    :widths: 10 20

    * - **inputs** :red:`(required)`
      - a dict containing the following keys:
    * - \- *question* :red:`(required)`
      - The question, as a string, that has an answer within :obj:`context`.
    * - \- *context* :red:`(required)`
      - A string that contains the answer to the question.
    * - **options**
      - a dict containing the following keys:
    * - \- *use_gpu*
      - (Default: :obj:`false`). Boolean to use GPU instead of CPU for inference (requires at least the Startup plan).
    * - \- *use_cache*
      - (Default: :obj:`true`). Boolean. There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as is, as models are deterministic (meaning the results will be the same anyway). However, if you use a non-deterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query.
    * - \- *wait_for_model*
      - (Default: :obj:`false`) Boolean. If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to :obj:`true` after receiving a 503 error, as it will limit hanging in your application to known places.

The return value is either a dict, or a list of dicts if you sent a list of inputs.

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START question_answering_inference_answer
        :end-before: END question_answering_inference_answer
        :dedent: 8

.. list-table:: Returned values
    :widths: 10 20

    * - **answer**
      - A string that is the answer within the text.
    * - **score**
      - A float that represents how likely it is that the answer is correct.
    * - **start**
      - The index (string-wise) of the start of the answer within :obj:`context`.
    * - **stop**
      - The index (string-wise) of the stop of the answer within :obj:`context`.
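As an illustrative sketch (same assumed ``requests`` setup, endpoint and placeholder ``API_TOKEN`` as
above), both :obj:`question` and :obj:`context` go inside the ``inputs`` dict:

.. code-block:: python

    import json

    import requests

    API_TOKEN = "api_XXXXXXXX"  # placeholder: use your own API token
    API_URL = "https://api-inference.huggingface.co/models/deepset/roberta-base-squad2"
    headers = {"Authorization": f"Bearer {API_TOKEN}"}

    payload = {
        "inputs": {
            "question": "Where do I live?",
            "context": "My name is Clara and I live in Berkeley.",
        },
        "options": {"wait_for_model": True},
    }
    response = requests.post(API_URL, headers=headers, data=json.dumps(payload))
    result = response.json()
    # the result contains the fields described in the table above
    print(result["answer"], result["score"])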
Text-classification task
----------------------------------------------------------------------------------------------------

Usually used for sentiment analysis, this will output the likelihood of classes for an input.

.. seealso::

    **Recommended model**: `distilbert-base-uncased-finetuned-sst-2-english <https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english>`_

Example:

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START text_classification_inference
        :end-before: END text_classification_inference
        :dedent: 8

.. only:: curl

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: bash
        :start-after: START curl_text_classification_inference
        :end-before: END curl_text_classification_inference
        :dedent: 8

When sending your request, you should send a JSON encoded payload. Here are all the options:

.. list-table:: All parameters
    :widths: 10 20

    * - **inputs** :red:`(required)`
      - a string to be classified
    * - **options**
      - a dict containing the following keys:
    * - \- *use_gpu*
      - (Default: :obj:`false`). Boolean to use GPU instead of CPU for inference (requires at least the Startup plan).
    * - \- *use_cache*
      - (Default: :obj:`true`). Boolean. There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as is, as models are deterministic (meaning the results will be the same anyway). However, if you use a non-deterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query.
    * - \- *wait_for_model*
      - (Default: :obj:`false`) Boolean. If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to :obj:`true` after receiving a 503 error, as it will limit hanging in your application to known places.

The return value is either a dict, or a list of dicts if you sent a list of inputs.

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START text_classification_inference_answer
        :end-before: END text_classification_inference_answer
        :dedent: 8

.. list-table:: Returned values
    :widths: 10 20

    * - **label**
      - The label for the class (model specific)
    * - **score**
      - A float that represents how likely it is that the text belongs to this class.

Named Entity Recognition (NER) task
----------------------------------------------------------------------------------------------------

See `Token-classification task`_.

Token-classification task
----------------------------------------------------------------------------------------------------

Usually used for sentence parsing, either grammatical, or Named Entity Recognition (NER) to
understand keywords contained within text.

.. seealso::

    **Recommended model**: `dbmdz/bert-large-cased-finetuned-conll03-english <https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english>`_

Example:

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START token_classification_inference
        :end-before: END token_classification_inference
        :dedent: 8

.. only:: curl

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: bash
        :start-after: START curl_token_classification_inference
        :end-before: END curl_token_classification_inference
        :dedent: 8

When sending your request, you should send a JSON encoded payload. Here are all the options:

.. list-table:: All parameters
    :widths: 10 20

    * - **inputs** :red:`(required)`
      - a string to be classified
    * - **options**
      - a dict containing the following keys:
    * - \- *use_gpu*
      - (Default: :obj:`false`). Boolean to use GPU instead of CPU for inference (requires at least the Startup plan).
    * - \- *use_cache*
      - (Default: :obj:`true`). Boolean. There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as is, as models are deterministic (meaning the results will be the same anyway). However, if you use a non-deterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query.
    * - \- *wait_for_model*
      - (Default: :obj:`false`) Boolean. If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to :obj:`true` after receiving a 503 error, as it will limit hanging in your application to known places.

The return value is either a dict, or a list of dicts if you sent a list of inputs.

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START token_classification_inference_answer
        :end-before: END token_classification_inference_answer
        :dedent: 8

.. list-table:: Returned values
    :widths: 10 20

    * - **entity_group**
      - The type for the entity being recognized (model specific).
    * - **score**
      - How likely the entity was recognized.
    * - **word**
      - The string that was captured
    * - **start**
      - The offset (string-wise) where the entity starts. Useful to disambiguate if :obj:`word` occurs multiple times.
    * - **end**
      - The offset (string-wise) where the entity ends. Useful to disambiguate if :obj:`word` occurs multiple times.
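The sketch below is illustrative only (same assumed ``requests`` setup, endpoint and placeholder
``API_TOKEN`` as above); it sends a plain string and iterates over the recognized entities:

.. code-block:: python

    import json

    import requests

    API_TOKEN = "api_XXXXXXXX"  # placeholder: use your own API token
    API_URL = "https://api-inference.huggingface.co/models/dbmdz/bert-large-cased-finetuned-conll03-english"
    headers = {"Authorization": f"Bearer {API_TOKEN}"}

    payload = {
        "inputs": "My name is Sarah Jessica Parker but you can call me Jessica",
        "options": {"wait_for_model": True},
    }
    response = requests.post(API_URL, headers=headers, data=json.dumps(payload))
    # each recognized entity comes back with the fields described in the table above
    for entity in response.json():
        print(entity["entity_group"], entity["word"], round(entity["score"], 3))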
Text-generation task
----------------------------------------------------------------------------------------------------

Used to continue text from a prompt. This is a very generic task.

.. seealso::

    **Recommended model**: `gpt2 <https://huggingface.co/gpt2>`_ (it's a simple model, but fun to play with).

Example:

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START text_generation_inference
        :end-before: END text_generation_inference
        :dedent: 8

.. only:: curl

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: bash
        :start-after: START curl_text_generation_inference
        :end-before: END curl_text_generation_inference
        :dedent: 8

When sending your request, you should send a JSON encoded payload. Here are all the options:

.. list-table:: All parameters
    :widths: 10 20

    * - **inputs** :red:`(required)`
      - a string to be generated from
    * - **parameters**
      - a dict containing the following keys:
    * - \- *top_k*
      - (Default: :obj:`None`). Integer to define the top tokens considered within the :obj:`sample` operation to create new text.
    * - \- *top_p*
      - (Default: :obj:`None`). Float to define the tokens that are within the :obj:`sample` operation of text generation. Tokens are added to the sample, from most probable to least probable, until the sum of their probabilities is greater than :obj:`top_p`.
    * - \- *temperature*
      - (Default: :obj:`1.0`). Float (0.0-100.0). The temperature of the sampling operation. 1 means regular sampling, 0 means :obj:`top_k=1` (greedy), and 100.0 gets closer to uniform probability.
    * - \- *repetition_penalty*
      - (Default: :obj:`None`). Float (0.0-100.0). The more a token is used within generation, the more it is penalized, so that it is less likely to be picked in successive generation passes.
    * - \- *num_return_sequences*
      - (Default: :obj:`1`). Integer. The number of propositions you want to be returned.
    * - **options**
      - a dict containing the following keys:
    * - \- *use_gpu*
      - (Default: :obj:`false`). Boolean to use GPU instead of CPU for inference (requires at least the Startup plan).
    * - \- *use_cache*
      - (Default: :obj:`true`). Boolean. There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as is, as models are deterministic (meaning the results will be the same anyway). However, if you use a non-deterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query.
    * - \- *wait_for_model*
      - (Default: :obj:`false`) Boolean. If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to :obj:`true` after receiving a 503 error, as it will limit hanging in your application to known places.

The return value is either a dict, or a list of dicts if you sent a list of inputs.

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START text_generation_inference_answer
        :end-before: END text_generation_inference_answer
        :dedent: 8

.. list-table:: Returned values
    :widths: 10 20

    * - **generated_text**
      - The continuation of the input string.
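As an illustrative sketch (same assumed ``requests`` setup, endpoint and placeholder ``API_TOKEN`` as
above), the sampling parameters from the table go inside ``parameters``; since generation is
non-deterministic, ``use_cache`` can be disabled to force a fresh query each time:

.. code-block:: python

    import json

    import requests

    API_TOKEN = "api_XXXXXXXX"  # placeholder: use your own API token
    API_URL = "https://api-inference.huggingface.co/models/gpt2"
    headers = {"Authorization": f"Bearer {API_TOKEN}"}

    payload = {
        "inputs": "The answer to the universe is",
        "parameters": {
            "top_k": 50,
            "top_p": 0.95,
            "temperature": 1.0,
            "num_return_sequences": 2,
        },
        # sampling is non-deterministic, so skip the cache and wait for the model to load
        "options": {"use_cache": False, "wait_for_model": True},
    }
    response = requests.post(API_URL, headers=headers, data=json.dumps(payload))
    # each returned sequence exposes a "generated_text" field as described above
    for sequence in response.json():
        print(sequence["generated_text"])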
Text2text-generation task
----------------------------------------------------------------------------------------------------

Essentially the same as the `Text-generation task`_, but it uses an encoder-decoder architecture, so
it might change in the future to expose more options.

Fill mask task
----------------------------------------------------------------------------------------------------

Tries to fill in a hole with a missing word (a token, to be precise). That's the base task for BERT
models.

.. seealso::

    **Recommended model**: `bert-base-uncased <https://huggingface.co/bert-base-uncased>`_ (it's a simple model, but fun to play with).

Example:

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START fill_mask_inference
        :end-before: END fill_mask_inference
        :dedent: 8

.. only:: curl

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: bash
        :start-after: START curl_fill_mask_inference
        :end-before: END curl_fill_mask_inference
        :dedent: 8

When sending your request, you should send a JSON encoded payload. Here are all the options:

.. list-table:: All parameters
    :widths: 10 20

    * - **inputs** :red:`(required)`
      - a string to be filled in; it must contain the [MASK] token (check the model card for the exact name of the mask token)
    * - **options**
      - a dict containing the following keys:
    * - \- *use_gpu*
      - (Default: :obj:`false`). Boolean to use GPU instead of CPU for inference (requires at least the Startup plan).
    * - \- *use_cache*
      - (Default: :obj:`true`). Boolean. There is a cache layer on the inference API to speed up requests we have already seen. Most models can use those results as is, as models are deterministic (meaning the results will be the same anyway). However, if you use a non-deterministic model, you can set this parameter to prevent the caching mechanism from being used, resulting in a real new query.
    * - \- *wait_for_model*
      - (Default: :obj:`false`) Boolean. If the model is not ready, wait for it instead of receiving 503. It limits the number of requests required to get your inference done. It is advised to only set this flag to :obj:`true` after receiving a 503 error, as it will limit hanging in your application to known places.

The return value is either a dict, or a list of dicts if you sent a list of inputs.

.. only:: python

    .. literalinclude:: ../../../tests/documentation/test_inference.py
        :language: python
        :start-after: START fill_mask_inference_answer
        :end-before: END fill_mask_inference_answer
        :dedent: 8

.. list-table:: Returned values
    :widths: 10 20

    * - **sequence**
      - The actual sequence of tokens that ran against the model (may contain special tokens)
    * - **score**
      - The probability for this token.
    * - **token**
      - The id of the token
    * - **token_str**
      - The string representation of the token
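As an illustrative sketch (same assumed ``requests`` setup, endpoint and placeholder ``API_TOKEN`` as
above), the input contains the model's mask token and each candidate completion can be inspected:

.. code-block:: python

    import json

    import requests

    API_TOKEN = "api_XXXXXXXX"  # placeholder: use your own API token
    API_URL = "https://api-inference.huggingface.co/models/bert-base-uncased"
    headers = {"Authorization": f"Bearer {API_TOKEN}"}

    payload = {
        # [MASK] is the mask token for bert-base-uncased; check the model card for other models
        "inputs": "The answer to the universe is [MASK].",
        "options": {"wait_for_model": True},
    }
    response = requests.post(API_URL, headers=headers, data=json.dumps(payload))
    # one candidate dict per proposed token, with the fields described in the table above
    for candidate in response.json():
        print(candidate["token_str"], round(candidate["score"], 4))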