diff --git a/docs/configure-rails/yaml-schema/guardrails-configuration/built-in-guardrails.md b/docs/configure-rails/yaml-schema/guardrails-configuration/built-in-guardrails.md index ca0ac534c..3ef53378c 100644 --- a/docs/configure-rails/yaml-schema/guardrails-configuration/built-in-guardrails.md +++ b/docs/configure-rails/yaml-schema/guardrails-configuration/built-in-guardrails.md @@ -529,6 +529,128 @@ The above is an example prompt that you can use with the *content safety check i The `content safety check input` and `content safety check output` rails executes the [`content_safety_check_input`](../../nemoguardrails/library/content_safety/actions.py) and [`content_safety_check_output`](../../nemoguardrails/library/content_safety/actions.py) actions respectively. +#### Multilingual Refusal Messages + + +When content safety rails block unsafe content, you can configure NeMo Guardrails to automatically detect the user's input language and return refusal messages in that same language. This provides a better user experience for multilingual applications. + +##### Supported Languages + +The multilingual feature supports 9 languages: + +| Language | Code | Default Refusal Message | +|----------|------|-------------------------| +| English | `en` | I'm sorry, I can't respond to that. | +| Spanish | `es` | Lo siento, no puedo responder a eso. | +| Chinese | `zh` | 抱歉,我无法回应。 | +| German | `de` | Es tut mir leid, darauf kann ich nicht antworten. | +| French | `fr` | Je suis désolé, je ne peux pas répondre à cela. | +| Hindi | `hi` | मुझे खेद है, मैं इसका जवाब नहीं दे सकता। | +| Japanese | `ja` | 申し訳ありませんが、それには回答できません。 | +| Arabic | `ar` | عذراً، لا أستطيع الرد على ذلك. | +| Thai | `th` | ขออภัย ฉันไม่สามารถตอบได้ | + +If the detected language is not in this list, English is used as the fallback. + +##### Installation + +To use multilingual refusal messages, install NeMo Guardrails with the `multilingual` extra: + +```bash +pip install nemoguardrails[multilingual] +``` + +##### Usage + +To enable multilingual refusal messages, add the `multilingual` configuration to your `config.yml`: + +```yaml +models: + - type: main + engine: nim + model: meta/llama-3.3-70b-instruct + + - type: content_safety + engine: nim + model: nvidia/llama-3.1-nemotron-safety-guard-8b-v3 + +rails: + config: + content_safety: + multilingual: + enabled: true + + input: + flows: + - content safety check input $model=content_safety + + output: + flows: + - content safety check output $model=content_safety +``` + +##### Custom Refusal Messages + +You can customize the refusal messages for each language: + +```yaml +rails: + config: + content_safety: + multilingual: + enabled: true + refusal_messages: + en: "Sorry, I cannot help with that request." + es: "Lo siento, no puedo ayudar con esa solicitud." + zh: "抱歉,我无法处理该请求。" + # Add other languages as needed +``` + +If a custom message is not provided for a detected language, the built-in default message for that language is used. + + +##### How It Works + +When `multilingual.enabled` is set to `true`: + +1. The `detect_language` action uses the [fast-langdetect](https://github.com/LlmKira/fast-langdetect) library to detect the language of the user's input +2. If the content safety check blocks the input, the refusal message is returned in the detected language +3. Language detection adds minimal latency (~12μs per request) + +##### Cold Start Behavior + +The fast-langdetect library downloads a language detection model on first use: + +| Model | Download Size | [Memory Usage](https://github.com/LlmKira/fast-langdetect?tab=readme-ov-file#memory-note) | First Call Behavior | +|-------|---------------|--------------|---------------------| +| `auto` (default) | 125 MB | ~170-210 MB | Downloads model on first call if not cached | +| `lite` | ~0.9 MB (bundled) | ~45-60 MB | No download, works offline immediately | + +**Default cache location:** + +fast-langdetect stores its downloaded FastText model in a temporary, OS-specific cache directory at `{system_temp_dir}/fasttext-langdetect/`, where `system_temp_dir` is whatever directory your operating system uses for temporary files: + +- **macOS**: A sandboxed temp path such as `/var/folders//T/fasttext-langdetect/` +- **Linux**: The global temp directory `/tmp/fasttext-langdetect/` +- **Windows**: The user's temporary directory, e.g., `C:\Users\\AppData\Local\Temp\fasttext-langdetect\` + +You can override this location via the `FTLANG_CACHE` environment variable. + +**Production considerations:** + +- First API call may take ~10-20 seconds to download and load the full model (network-dependent) +- Subsequent calls use the cached model with ~9-12μs latency +- For container/serverless environments, consider pre-warming during startup or persisting the model cache in your container image + +##### Accuracy + +Language detection accuracy was benchmarked on two datasets: + +| Dataset | Samples | Accuracy | +|---------|---------|----------| +| [papluca/language-identification](https://huggingface.co/datasets/papluca/language-identification) | 40,500 | 99.71% | +| [nvidia/Nemotron-Safety-Guard-Dataset-v3](https://huggingface.co/datasets/nvidia/Nemotron-Safety-Guard-Dataset-v3) | 336,283 | 99.35% | + ### Topic Safety The topic safety feature allows you to define and enforce specific conversation rules and boundaries using NVIDIA's Topic Control model. This model helps ensure that conversations stay within predefined topics and follow specified guidelines. @@ -1168,7 +1290,7 @@ Before you begin, install the `yara-python` package or you can install the NeMo 1. Set your NVIDIA API key as an environment variable: ```console - $ export NVIDIA_API_KEY= + export NVIDIA_API_KEY= ``` 1. Create a configuration directory, such as `config`, and add a `config.yml` file with contents like the following: