base: develop
feat: Modify endpoints for OpenAPI compatibility #1340
Conversation
Codecov Report ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #1340 +/- ##
===========================================
+ Coverage 71.66% 71.70% +0.04%
===========================================
Files 171 171
Lines 17015 17071 +56
===========================================
+ Hits 12193 12240 +47
- Misses 4822 4831 +9
please don't merge - we want to refactor this a little more
default=None,
description="A state object that should be used to continue the interaction.",
)
# Standard OpenAI completion parameters
Let's bring the OpenAI schema into a separate file - perhaps server/schemas/openai
+1
@christinaex could you please apply the same approach you've already used for the response schemas to the request schemas as well?
default=None,
description="Top-p sampling parameter.",
)
stop: Optional[str] = Field(
stop needs to be Optional[Union[str, List[str]]]
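A minimal sketch of what the corrected field could look like, assuming the OpenAIRequestFields class used elsewhere in this PR (the description text is illustrative, not the PR's exact wording):

# Sketch only: accept either a single stop string or a list of stop strings,
# mirroring OpenAI's chat completions API.
from typing import List, Optional, Union

from pydantic import BaseModel, Field


class OpenAIRequestFields(BaseModel):
    stop: Optional[Union[str, List[str]]] = Field(
        default=None,
        description="Up to 4 sequences where the model should stop generating further tokens.",
    )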
nemoguardrails/server/api.py
Outdated
index: Optional[int] = Field(
    default=None, description="The index of the choice in the list of choices."
)
messages: Optional[dict] = Field(
typo: messages needs to be message (no s)
Greptile Overview
Greptile Summary
This PR adds OpenAI API compatibility to NeMo Guardrails by introducing a new /v1/models endpoint and modifying /v1/chat/completions to return OpenAI-compatible response structures.
Key Changes:
- Created nemoguardrails/server/schemas/openai.py with Pydantic models for OpenAI-compatible requests and responses
- Modified RequestBody to inherit from OpenAIRequestFields, mapping the model parameter to config_id for backward compatibility
- Updated ResponseBody to include OpenAI standard fields (id, object, created, choices) while maintaining NeMo-specific extensions (state, llm_output, log)
- Added /v1/models endpoint that lists available guardrails configurations in OpenAI model format
- Implemented Colang 2.x state serialization support in runtime.py
- Updated all tests to verify new response structure
Issues Found:
- Critical: OpenAI parameters (max_tokens, temperature, etc.) are only applied when thread_id is present due to incorrect indentation (nemoguardrails/server/api.py:460-472)
- Syntax Error: Test assertion uses wrong response path in tests/test_threads.py:143
Confidence Score: 2/5
- This PR should NOT be merged until the critical indentation bug is fixed - OpenAI parameters won't work without thread_id
- Score reflects a critical logic error that breaks OpenAI parameter functionality for non-threaded requests, plus a test syntax error that will cause test failures
- Pay immediate attention to nemoguardrails/server/api.py (lines 460-472 need unindenting) and tests/test_threads.py (line 143 needs correction)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/server/schemas/openai.py | 5/5 | New file defining OpenAI-compatible schemas. Well-structured with proper Pydantic models for requests and responses. |
| nemoguardrails/server/api.py | 2/5 | Adds /v1/models endpoint and modifies /v1/chat/completions for OpenAI compatibility. Critical bug: OpenAI parameters only applied when using thread_id due to incorrect indentation. |
| tests/test_threads.py | 3/5 | Updates thread tests for new response format. Contains syntax error on line 143 with incorrect response path. |
Sequence Diagram
sequenceDiagram
participant Client
participant API as /v1/chat/completions
participant RequestBody
participant DataStore
participant LLMRails
participant ResponseBody
Client->>API: POST with model/config_id + messages
API->>RequestBody: Validate request
RequestBody->>RequestBody: Map model → config_id
RequestBody->>RequestBody: Apply OpenAI parameters
alt thread_id provided
API->>DataStore: Fetch thread messages
DataStore-->>API: Previous messages
API->>API: Prepend thread history
end
alt streaming enabled
API->>LLMRails: generate_async (streaming)
LLMRails-->>Client: StreamingResponse
else non-streaming
API->>LLMRails: generate_async
LLMRails-->>API: GenerationResponse
alt thread_id provided
API->>DataStore: Save updated thread
end
API->>ResponseBody: Build OpenAI response
ResponseBody->>ResponseBody: Add choices, message, id, created
ResponseBody->>ResponseBody: Include NeMo extensions (state, log)
ResponseBody-->>Client: JSON response
end
6 files reviewed, 2 comments
nemoguardrails/server/api.py
Outdated
generation_options = body.options
if body.max_tokens:
    generation_options.max_tokens = body.max_tokens
if body.temperature is not None:
    generation_options.temperature = body.temperature
if body.top_p is not None:
    generation_options.top_p = body.top_p
if body.stop:
    generation_options.stop = body.stop
if body.presence_penalty is not None:
    generation_options.presence_penalty = body.presence_penalty
if body.frequency_penalty is not None:
    generation_options.frequency_penalty = body.frequency_penalty
logic: OpenAI parameters only applied when using thread_id - these lines are incorrectly indented inside the if body.thread_id: block (starts at line 428). Move this block outside and before the streaming check to ensure parameters work for all requests.
Suggested change (the same block, moved outside the thread_id branch so it runs before the streaming check):
generation_options = body.options
if body.max_tokens:
    generation_options.max_tokens = body.max_tokens
if body.temperature is not None:
    generation_options.temperature = body.temperature
if body.top_p is not None:
    generation_options.top_p = body.top_p
if body.stop:
    generation_options.stop = body.stop
if body.presence_penalty is not None:
    generation_options.presence_penalty = body.presence_penalty
if body.frequency_penalty is not None:
    generation_options.frequency_penalty = body.frequency_penalty
if (
tests/test_threads.py
Outdated
)
res = response.json()
assert res["messages"][0]["content"] == "Hello again!"
assert res["choices"]["messages"][0]["content"] == "Hello again!"
syntax: Incorrect response path - should be res["choices"][0]["message"]["content"] to match OpenAI format
Suggested change:
assert res["choices"][0]["message"]["content"] == "Hello again!"
Greptile Overview
Greptile Summary
This PR adds OpenAI API compatibility by introducing a new /v1/models endpoint and modifying /v1/chat/completions to return OpenAI-compatible response structures. The changes include new schema definitions, test updates, and enhanced state handling for Colang 2.x.
Critical Issues Found:
- Import/syntax errors: Old ResponseBody class not removed, causing conflicts with the new OpenAI-compatible version and referencing missing imports (BaseModel, GenerationLog)
- Logic bugs: OpenAI parameters (max_tokens, temperature, etc.) are set but never used - both streaming and non-streaming paths pass body.options instead of the modified generation_options
- Test error: Incorrect array indexing in test_threads.py line 143
Major Changes:
- Added /v1/models endpoint that lists available guardrails configurations in OpenAI format
- Restructured response format to include choices array with OpenAI-compatible structure while maintaining backward compatibility with NeMo-Guardrails fields (state, llm_output, etc.)
- Enhanced Colang 2.x runtime to deserialize state from API calls
- Updated all tests to use new response structure
Confidence Score: 1/5
- This PR has critical syntax errors that will prevent the module from loading and logic bugs that break core functionality
- Score reflects three critical issues: (1) syntax errors from conflicting ResponseBody classes and missing imports will cause immediate module load failures, (2) logic bug where OpenAI parameters are set but never used means the API won't work as intended, (3) test syntax error. These must be fixed before merge.
- Pay immediate attention to nemoguardrails/server/api.py (has syntax and logic errors) and tests/test_threads.py (has syntax error)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/server/api.py | 1/5 | Critical syntax errors (old ResponseBody class conflicts with import) and logic bugs (OpenAI parameters not applied to generation calls) |
| nemoguardrails/server/schemas/openai.py | 5/5 | New OpenAI schema definitions - well structured with proper field descriptions and backward compatibility fields |
| tests/test_threads.py | 2/5 | Updated to new response format but has incorrect array indexing on line 143 |
Sequence Diagram
sequenceDiagram
participant Client
participant API as FastAPI Server
participant Rails as LLMRails
participant Datastore
participant LLM
Client->>API: POST /v1/chat/completions
Note over API: RequestBody with model/config_id mapping
API->>API: Validate thread_id (if provided)
alt Thread ID provided
API->>Datastore: get("thread-{thread_id}")
Datastore-->>API: Previous messages
API->>API: Prepend thread messages
end
API->>API: Set generation_options from OpenAI params
Note over API: max_tokens, temperature, top_p, etc.
alt Streaming mode
API->>Rails: generate_async(messages, streaming_handler, options)
Rails->>LLM: Generate with streaming
LLM-->>API: Stream tokens
API-->>Client: StreamingResponse
else Non-streaming mode
API->>Rails: generate_async(messages, options, state)
Rails->>LLM: Generate completion
LLM-->>Rails: Response
Rails-->>API: GenerationResponse
alt Thread ID provided
API->>Datastore: set("thread-{thread_id}", updated messages)
end
API->>API: Build OpenAI-compatible ResponseBody
Note over API: Add choices array with message
API-->>Client: ResponseBody (OpenAI format + extensions)
end
Client->>API: GET /v1/models
API->>API: List config directories
API->>API: Convert to OpenAI Model format
API-->>Client: ModelsResponse
Additional Comments (3)
- nemoguardrails/server/api.py, lines 263-281: syntax: Old ResponseBody class should be removed - it conflicts with the new OpenAI-compatible ResponseBody imported from openai.py (line 43). This class also references BaseModel and GenerationLog, which are no longer imported, causing NameErrors when the module loads.
- nemoguardrails/server/api.py, line 538: logic: OpenAI parameters not applied in streaming mode - should use generation_options instead of body.options
- nemoguardrails/server/api.py, line 546: logic: OpenAI parameters not applied - should use generation_options instead of body.options
6 files reviewed, 4 comments
tests/test_threads.py
Outdated
)
res = response.json()
assert res["messages"][0]["content"] == "Hello again!"
assert res["choices"]["message"][0]["content"] == "Hello again!"
syntax: Wrong array index placement - choices is an array, not a dict
Suggested change:
assert res["choices"][0]["message"]["content"] == "Hello again!"
Greptile Overview
Greptile Summary
This PR successfully refactors the NeMo Guardrails server to support OpenAI-compatible API endpoints.
Key Changes:
- Adds new /v1/models endpoint that lists available guardrails configurations in OpenAI model format
- Refactors /v1/chat/completions response to use OpenAI's choices array structure with Choice objects
- Maps OpenAI request fields (model, max_tokens, temperature, top_p, stop, presence_penalty, frequency_penalty) to internal llm_params
- Maintains backward compatibility by including NeMo-specific fields (state, llm_output, output_data, log) in responses
- Enables Colang 2.x state deserialization from dict format for stateful conversations
- Updates all tests to validate new response structure
Architecture:
The PR introduces a clean separation of concerns by creating nemoguardrails/server/schemas/openai.py with OpenAI-specific models, while extending RequestBody to inherit from OpenAIRequestFields. The model field is automatically mapped to config_id via a validator, and OpenAI parameters are converted to llm_params before passing to the LLM generation.
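A minimal sketch of that flow, assuming a Pydantic pre-validation root_validator (the field set is trimmed, and the to_llm_params helper is illustrative rather than the PR's actual code):

# Sketch only: model -> config_id mapping plus conversion of OpenAI sampling
# parameters into an llm_params dict. Names other than RequestBody,
# OpenAIRequestFields and config_id are assumptions.
from typing import Any, Optional

from pydantic import BaseModel, Field, root_validator


class OpenAIRequestFields(BaseModel):
    model: Optional[str] = Field(default=None, description="Maps to config_id.")
    max_tokens: Optional[int] = None
    temperature: Optional[float] = None
    top_p: Optional[float] = None


class RequestBody(OpenAIRequestFields):
    config_id: Optional[str] = None

    @root_validator(pre=True)
    def ensure_config_id(cls, data: Any) -> Any:
        # If only `model` is provided, reuse it as the guardrails config id.
        if isinstance(data, dict) and data.get("model") and not data.get("config_id"):
            data["config_id"] = data["model"]
        return data


def to_llm_params(body: RequestBody) -> dict:
    # Keep only the OpenAI parameters that were actually set.
    params = {
        "max_tokens": body.max_tokens,
        "temperature": body.temperature,
        "top_p": body.top_p,
    }
    return {k: v for k, v in params.items() if v is not None}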
Confidence Score: 5/5
- Safe to merge - well-structured refactoring with comprehensive test coverage and backward compatibility
- The implementation is clean and follows OpenAI's API specification correctly. Previous indentation issues have been resolved. All tests are updated appropriately, and the changes maintain backward compatibility by keeping NeMo-specific fields in responses.
- No files require special attention - all changes are well-implemented
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/server/schemas/openai.py | 5/5 | New file defining OpenAI-compatible schema classes (OpenAIRequestFields, Choice, ResponseBody, Model, ModelsResponse) - well-structured with clear field descriptions |
| nemoguardrails/server/api.py | 4/5 | Adds /v1/models endpoint and refactors /v1/chat/completions to use OpenAI-compatible response format, maps OpenAI parameters to llm_params |
| nemoguardrails/colang/v2_x/runtime/runtime.py | 5/5 | Adds state deserialization support for Colang 2.x from dict format using json_to_state - enables state continuity in API calls |
Sequence Diagram
sequenceDiagram
participant Client
participant API as FastAPI Server
participant RequestBody
participant RailsConfig
participant LLMRails
participant DataStore
participant LLM as Language Model
Client->>API: POST /v1/chat/completions
API->>RequestBody: Validate request (OpenAI params)
RequestBody->>RequestBody: Map model → config_id
RequestBody->>RequestBody: Map OpenAI params → llm_params
alt thread_id provided
API->>DataStore: Fetch thread messages
DataStore-->>API: Return thread history
API->>API: Prepend thread messages
end
API->>RailsConfig: Load config by config_id
RailsConfig-->>API: Return LLMRails instance
alt streaming enabled
API->>LLMRails: generate_async (streaming)
LLMRails->>LLM: Stream tokens
LLM-->>Client: Server-sent events
else non-streaming
API->>LLMRails: generate_async (options with llm_params)
LLMRails->>LLM: Generate with parameters
LLM-->>LLMRails: Response
LLMRails-->>API: GenerationResponse
alt thread_id provided
API->>DataStore: Save updated thread messages
end
API->>API: Build ResponseBody (OpenAI format)
API-->>Client: Return choices with message
end
Client->>API: GET /v1/models
API->>RailsConfig: List available configs
RailsConfig-->>API: config_ids
API->>API: Convert to Model objects
API-->>Client: ModelsResponse with data array
6 files reviewed, no comments
Greptile Overview
Greptile Summary
Modified the /v1/chat/completions endpoint and added /v1/models to be OpenAI-compatible while maintaining backward compatibility with NeMo-Guardrails specific features.
Major changes:
- Introduced new OpenAPI schema definitions (openai.py) with OpenAIRequestFields, ResponseBody, Choice, Model, and ModelsResponse classes
- Updated RequestBody to inherit from OpenAIRequestFields, mapping the model field to config_id for backward compatibility
- Modified response format to match OpenAI spec with id, object, created, model, and choices array structure
- Added /v1/models endpoint that lists available guardrails configurations as OpenAI-compatible models
- OpenAI parameters (max_tokens, temperature, top_p, stop, presence_penalty, frequency_penalty) are now mapped to llm_params
- Enhanced Colang 2.x runtime to deserialize state dicts from API calls
- Updated all tests to work with the new response format
- Error responses now return OpenAI-compatible structure with finish_reason="error"
Backward compatibility:
- NeMo-Guardrails specific fields (state, llm_output, output_data, log) are preserved in responses
- Existing config_id parameter continues to work alongside the new model parameter
- The options parameter and all existing functionality remain unchanged
Confidence Score: 5/5
- This PR is safe to merge with excellent test coverage and proper backward compatibility
- The implementation is well-designed with comprehensive test coverage across all endpoints and response formats. The code properly maintains backward compatibility by preserving all existing NeMo-Guardrails fields while adding OpenAI compatibility. The OpenAI parameter mapping is correctly implemented outside the thread_id block (as noted in previous comments). State deserialization is properly handled with appropriate error checking. All tests have been updated to reflect the new response structure.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/server/schemas/openai.py | 5/5 | New file introducing OpenAI-compatible schema definitions for request/response formats, well-structured with proper field descriptions |
| nemoguardrails/server/api.py | 4/5 | Modified to support OpenAI compatibility: added /v1/models endpoint, updated response format to match OpenAI spec, and added parameter mapping |
| nemoguardrails/colang/v2_x/runtime/runtime.py | 5/5 | Added state deserialization from dict format to support API state continuation, properly handles serialized state from API calls |
Sequence Diagram
sequenceDiagram
participant Client
participant API as FastAPI Server
participant RequestBody
participant Rails as LLMRails
participant DataStore
participant Runtime as RuntimeV2_x
Client->>API: POST /v1/chat/completions
Note over Client,API: OpenAI-compatible request<br/>{model, messages, temperature, etc}
API->>RequestBody: Validate & parse request
Note over RequestBody: Map `model` → `config_id`<br/>Map OpenAI params → llm_params
RequestBody->>API: body.options.llm_params populated
alt thread_id provided
API->>DataStore: get(thread-{thread_id})
DataStore-->>API: Previous messages
Note over API: Prepend thread messages
end
alt state provided
API->>Runtime: Pass state dict
Runtime->>Runtime: json_to_state(state["state"])
Note over Runtime: Deserialize state for Colang 2.x
end
alt streaming enabled
API->>Rails: generate_async(messages, options, state)
Rails-->>API: StreamingResponse
API-->>Client: Server-sent events stream
else non-streaming
API->>Rails: generate_async(messages, options, state)
Rails-->>API: GenerationResponse
Note over API: Build OpenAI-compatible response<br/>{id, object, created, model, choices}
alt thread_id provided
API->>DataStore: set(thread-{thread_id}, updated_messages)
end
API-->>Client: ResponseBody with choices array
end
Client->>API: GET /v1/models
API->>API: List config directories
Note over API: Convert configs → Model objects
API-->>Client: ModelsResponse {data: [models]}
6 files reviewed, no comments
thank you @christinaexyou. I like the direction of making the api feel openai-compatible, but before we can call it fully compatible, a few foundational pieces would be helpful to add:
- version compatibility and testing
- schema alignment
- streaming format
- documentation and scope
once we have explicit versioning, schema alignment, confirmed streaming format, integration tests, and a clear scope matrix, we’ll be in a solid position to confidently say “openai compatible.” happy to help with any of these pieces.
Greptile Overview
Greptile Summary
This PR successfully transforms the NeMo Guardrails API to be OpenAI-compatible by modifying /v1/chat/completions and adding /v1/models.
Key Changes:
Issues Already Addressed in Previous Comments:
Confidence Score: 4/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant Client as OpenAI Client
participant API as FastAPI Server
participant Rails as LLMRails
participant LLM as LLM Provider
Note over Client,LLM: Non-Streaming Flow
Client->>API: POST /v1/chat/completions<br/>{model, messages, temperature, ...}
API->>API: Map model → config_id
API->>API: Set OpenAI params in llm_params<br/>(max_tokens, temperature, etc.)
API->>Rails: _get_rails(config_ids)
Rails-->>API: LLMRails instance
API->>Rails: generate_async(messages, options)
Rails->>LLM: Call LLM with parameters
LLM-->>Rails: Response
Rails-->>API: GenerationResponse
API->>API: Build OpenAI-compatible response<br/>{id, object, created, model, choices[]}
API-->>Client: ResponseBody with choices array
Note over Client,LLM: Streaming Flow
Client->>API: POST /v1/chat/completions<br/>{stream: true, ...}
API->>API: Map model → config_id
API->>API: Set OpenAI params in llm_params
API->>Rails: _get_rails(config_ids)
Rails-->>API: LLMRails instance
API->>API: Create StreamingHandler
API->>Rails: generate_async(messages, streaming_handler)
API-->>Client: StreamingResponse (SSE)
loop For each token
Rails->>LLM: Stream token
LLM-->>Rails: Token chunk
Rails->>API: push_chunk(chunk)
API->>API: _format_streaming_response<br/>Wrap in OpenAI SSE format
API-->>Client: data: {delta, index: 0, ...}\n\n
end
Rails->>API: push_chunk(END_OF_STREAM)
API-->>Client: data: [DONE]\n\n
Note over Client,LLM: Models Endpoint
Client->>API: GET /v1/models
API->>API: List config directories
API->>API: Convert to Model objects<br/>{id, object, created, owned_by}
API-->>Client: ModelsResponse with data array
8 files reviewed, 1 comment
Additional Comments (1)
- nemoguardrails/server/api.py, line 628: logic: OpenAI parameters not applied in streaming mode - should pass generation_options instead of body.options to include the OpenAI parameters set on lines 602-613
8 files reviewed, 1 comment
10 files reviewed, no comments
10 files reviewed, no comments
Thanks for the PR, just wanted to add on to @Pouyanpi 's comments above. Using the guardrails configs as model fields in the /chat/completions request makes it less convenient to use Guardrails as a drop-in replacement for the Main LLM. If we keep the Main LLM in the /chat/completions POST request model field, the same request can be issued to the Main LLM directly as the guardrails fields are ignored. Or if we set a default configuration on the server (--default-config-id option in poetry run nemoguardrails server) then only the URL of the endpoint needs to change when moving from Main LLM to Guardrails.
I ran local integration tests and couldn't get streaming to work. I used the nemoguards config and added streaming support to the config.yml. I included the config and steps-to-reproduce below:
# config.yml
models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct
  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
  - type: topic_control
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-topic-control

streaming: True

rails:
  input:
    flows:
      - content safety check input $model=content_safety
      - topic safety check input $model=topic_control
      - jailbreak detection model
  output:
    flows:
      - content safety check output $model=content_safety
    streaming:
      enabled: True
      chunk_size: 200
      context_size: 50
  config:
    jailbreak_detection:
      nim_base_url: "https://ai.api.nvidia.com"
      nim_server_endpoint: "/v1/security/nvidia/nemoguard-jailbreak-detect"
      api_key_env_var: NVIDIA_API_KEY
# Server command
$ poetry run nemoguardrails server --config examples/configs
INFO: Started server process [92469]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
I used the curl command below. It looks like the Main LLM generation isn't streaming from the Guardrails logs, and the client only receives a data: [DONE] response:
$ curl -X POST http://0.0.0.0:8000/v1/chat/completions \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "nemoguards",
"messages": [
{
"role": "user",
"content": "What can you do for me?"
}
],
"max_tokens": 256,
"stream": false,
"temperature": 1,
"top_p": 1,
"guardrails": {
"config_id": "nemoguards"
},
"stream": true
}'
data: [DONE]
nemoguardrails/server/api.py
Outdated
# Standard OpenAI completion parameters
model: Optional[str] = Field(
    default=None,
    description="The model to use for chat completion. Maps to config_id for backward compatibility.",
I think it would be better to have the model field to match the main model from the Guardrails configuration, rather than a Guardrails config ID. Then the same /chat/completions request can be POSTed to either the main LLM directly or Guardrails without any changes. On build.nvidia.com the guardrails field is ignored, and Guardrails can set a default config ID if the guardrails field isn't included
Updated the model field to match the main model in the config. If there is no main model defined, it defaults to config_id[0]
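A sketch of how that fallback could be implemented; the helper name and arguments are illustrative, not the PR's actual code:

# Sketch only: resolve the OpenAI `model`/`id` value for a guardrails config.
# Uses the main model name when one is defined, otherwise the first config id.
from typing import List, Optional


def resolve_model_id(config_ids: List[str], main_model_name: Optional[str]) -> str:
    # `main_model_name` would come from the `type: main` entry in config.yml.
    if main_model_name:
        return main_model_name
    return config_ids[0]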
description="A state object that should be used to continue the interaction in the future.",
)
@app.get(
    "/v1/models",
This looks good. To the comment above, I recommend including the main model from each Guardrails configuration and adding an extra field to the list of Models. So, for example, the nemoguards configuration currently looks like:
{
  "object": "list",
  "data": [
    {
      "id": "nemoguards",
      "created": 1764625695,
      "object": "model",
      "owned_by": "nemo-guardrails"
    },
    ...
  ]
}
You'd have:
{
  "object": "list",
  "data": [
    {
      "id": "meta/llama-3.3-70b-instruct",
      "guardrails_config_id": "nemoguards",
      "created": 1764625695,
      "object": "model",
      "owned_by": "nemo-guardrails"
    },
    ...
  ]
}
This can be used in a /chat/completions request, where the id is used in the model field, and guardrails_config_id is used in guardrails.config_id
Done - updated id to match the model field and extended the existing OpenAI Model API to have a field named guardrail_config_id. If there are no models defined in the config, the id field defaults to config_id[0].
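A sketch of what the extended model entry could look like, assuming a Pydantic model mirroring OpenAI's Model object (the defaults are illustrative):

# Sketch only: OpenAI-style Model object extended with the guardrails config id.
import time
from typing import Optional

from pydantic import BaseModel, Field


class Model(BaseModel):
    id: str = Field(description="Main model name, e.g. 'meta/llama-3.3-70b-instruct'.")
    object: str = "model"
    created: int = Field(default_factory=lambda: int(time.time()))
    owned_by: str = "nemo-guardrails"
    guardrail_config_id: Optional[str] = Field(
        default=None,
        description="The guardrails configuration this model entry belongs to.",
    )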
Can you help me understand the intent here? I’m not sure I follow, doesn’t this effectively bind a model to a config?
As a side note: config_id must always be set, potentially with a configurable default.
res = response.json()
assert len(res["messages"]) == 1
assert res["messages"][0]["content"]
# Check OpenAI-compatible response structure
Could you update the e2e tests to use the LIVE_TEST_MODE approach, to be consistent with test_llm_params_e2e.py?
tests/test_openai_integration.py
Outdated
models = openai_client.models.list()

# Verify the response structure matches OpenAI's ModelList
assert models is not None
nit: Would it be simpler to check against an expected dict rather than each individual field?
Done - we check that the returned Choice response matches an expected dict and check the id and created timestamp separately, since they are liable to change.
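A sketch of that assertion style (the expected field values are illustrative; only the OpenAI choice structure itself is taken from the PR):

# Sketch only: compare the stable parts of the response against an expected dict
# and check the variable fields (id, created) separately.
def check_chat_completion(res: dict) -> None:
    expected_choice = {
        "index": 0,
        "message": {"role": "assistant", "content": "Hello again!"},
        "finish_reason": "stop",
    }
    assert res["choices"][0] == expected_choice
    assert res["id"].startswith("chatcmpl-")
    assert isinstance(res["created"], int)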
tests/test_openai_integration.py
Outdated
)

# Verify response structure matches OpenAI's ChatCompletion object
assert response is not None
nit: Would it be clearer to check against an expected dict rather than iterating over each of the attributes?
@Pouyanpi @tgasser-nv thanks so much for your feedback! By default, the
There are quite a lot of commits in this PR that don't necessarily relate to the original feature, so should they be kept as separate commits, or do you still want me to squash and rebase?
Thank you @christinaexyou!
Unfortunately we had a bug, which is fixed now. I added some detailed comments.
from pydantic import BaseModel, Field

class ResponseBody(ChatCompletion):
Suggested change:
class GuardrailsChatCompletion(ChatCompletion):
)

class ModelsResponse(BaseModel):
not sure about this, but let's be consistent:
Suggested change:
class GuardrailsModels(BaseModel):
description="A state object that should be used to continue the interaction.",
)
# Standard OpenAI completion parameters
model: Optional[str] = Field(
model MUST be provided. It is a required field. It should be the model name that we currently use in the guardrails configuration. Using the request body, we set the main LLM; other types of models come from the config (config.yml).
Guardrails and OpenAI compatibility with respect to the model field brings some challenges:
- One is that Guardrails requires an engine in addition to the model.
  - There are various possible solutions, like posting model metadata (including engine and base_url) to the models endpoint, where this becomes the model ID we use for lookup. This is possible but tricky.
  - Another option is to set a default engine and make it configurable via an environment variable. Cons: a user cannot experiment with multiple models from multiple providers while controlling for one config_id.
  - We could also settle on a global models file (I recommend avoiding this).
- Another challenge is that such models may be hosted in different places, thus requiring different base_urls. For example, I can have 3 different NIMs with 3 different base_urls. How are we going to address that?
TL;DR: We need a mechanism to obtain <engine, base_url> from a given model name (ID).
What do you think?
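One possible shape for such a lookup, purely as a discussion aid - nothing below exists in the codebase, and the registry contents are invented for illustration:

# Sketch only: a registry mapping a model id to the <engine, base_url> pair needed
# to instantiate it. How the registry is populated (config files, models-endpoint
# metadata, environment variables) is exactly the open question above.
from typing import Dict, NamedTuple


class ModelTarget(NamedTuple):
    engine: str
    base_url: str


MODEL_REGISTRY: Dict[str, ModelTarget] = {
    "meta/llama-3.3-70b-instruct": ModelTarget("nim", "https://integrate.api.nvidia.com/v1"),
}


def lookup_model(model_id: str) -> ModelTarget:
    try:
        return MODEL_REGISTRY[model_id]
    except KeyError:
        raise ValueError(f"No engine/base_url known for model '{model_id}'")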
@root_validator(pre=True)
def ensure_config_id(cls, data: Any) -> Any:
    if isinstance(data, dict):
        if data.get("model") is not None and data.get("config_id") is None:
I'm not sure what is happening here.
model should never be None and we always override the main model defined in config.yml using the model provided via request body.
Thank you 🥇
Let's do similar for Response schemas.
state: Optional[dict] = Field(default=None, description="State object for continuing the conversation.")
llm_output: Optional[dict] = Field(default=None, description="Additional LLM output data.")
output_data: Optional[dict] = Field(default=None, description="Additional output data.")
log: Optional[dict] = Field(default=None, description="Generation log data.")
guardrails_config_ids: Optional[List[str]] = Field(default=None, description="The list of configuration ids that were used.")
state: Optional[dict] = Field(default=None, description="State object for continuing the conversation.")
llm_output: Optional[dict] = Field(default=None, description="Contains any additional output coming from the LLM.")
output_data: Optional[dict] = Field(
    default=None,
    description="The output data, i.e. a dict with the values corresponding to the `output_vars`.",
)
log: Optional[GenerationLog] = Field(default=None, description="Additional logging information.")

nit: re config_ids vs guardrails_config_ids - is there any reason that we should explicitly mention guardrails? If not, better to rename it to be consistent with the request schema. But I don't mind either name.
| f"An internal error has occurred.", | ||
| } | ||
| ] | ||
| id=f"chatcmpl-{uuid.uuid4()}", |
I think these sorts of transformations/conversions on the request and response fit nicely into a utils module in schemas.
a simple function with
input: response: GenerationResponse, config_ids=None
output: `GuardrailCompletionResponse`
same for streaming.
Also you can include this logic there.
merely a suggestion, what do you think?
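A sketch of the suggested helper, assuming GenerationResponse exposes the generated messages on a response attribute and returning a plain dict instead of the proposed GuardrailsChatCompletion class to keep the example self-contained:

# Sketch only: convert a GenerationResponse into an OpenAI-style completion body.
import time
import uuid
from typing import List, Optional


def to_chat_completion(response, config_ids: Optional[List[str]] = None) -> dict:
    # `response.response` is assumed to hold the list of generated messages.
    messages = getattr(response, "response", None) or []
    message = messages[-1] if messages else {"role": "assistant", "content": ""}
    return {
        "id": f"chatcmpl-{uuid.uuid4()}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": (config_ids or ["unknown"])[0],
        "choices": [{"index": 0, "message": message, "finish_reason": "stop"}],
        "guardrails_config_ids": config_ids,
    }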
return llm_rails

async def _format_streaming_response(
for streaming it is better to use the stream_async method of LLMRails. You can still iterate on it.
This allows you to support another feature: you can run output rails while streaming. To do so you can use/adapt the following pieces:
class ErrorDetails(BaseModel):
    message: str
    type: str
    param: str
    code: str

class ErrorData(BaseModel):
    error: ErrorDetails

For now, define them here; later I will include them in llmrails.py and will update this code.
def process_chunk(chunk: str) -> Union[str, ErrorData]:
    """
    Processes a single chunk from the stream.

    Args:
        chunk (str): A single chunk from the stream.

    Returns:
        Union[str, ErrorData]: ErrorData instance for errors or the original chunk.
    """
    try:
        validated_data = ErrorData.model_validate_json(chunk)
        return validated_data  # Return the ErrorData instance directly
    except ValidationError:
        # Not an error, just a normal token
        pass
    except json.JSONDecodeError:
        # Invalid JSON format, treat as normal token
        pass
    except Exception as e:
        log.warning(
            f"Unexpected error processing stream chunk: {type(e).__name__}: {str(e)}",
            extra={"chunk": chunk},
        )
    return chunk  # Fall through: treat the chunk as a normal token

Then, when you are iterating over the stream, at the beginning of your try/except block:
async for chunk in stream_iter:
    processed_chunk = process_chunk(chunk)
    if isinstance(processed_chunk, ErrorData):
        # Yield the error and stop streaming
        yield f"data: {processed_chunk.model_dump_json()}\n\n"
        return
Is it the changes you made re broken streaming in the NIM engine, or are there other reasons?
I think it won't be necessary if you use stream_async. You can revert it locally when you make the changes to the streaming and let me know.
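A sketch of how the endpoint could iterate stream_async and wrap each chunk in OpenAI-style SSE events; the exact chunk payload shape here is an assumption, not the PR's implementation:

# Sketch only: wrap LLMRails.stream_async output in OpenAI-style SSE chunks.
import json
import time
import uuid


async def openai_sse_stream(llm_rails, messages, model_id):
    completion_id = f"chatcmpl-{uuid.uuid4()}"
    async for chunk in llm_rails.stream_async(messages=messages):
        event = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": model_id,
            "choices": [{"index": 0, "delta": {"content": chunk}, "finish_reason": None}],
        }
        yield f"data: {json.dumps(event)}\n\n"
    yield "data: [DONE]\n\n"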
we can do this at the end.
we should add an extra dependency called server with these two (similar to tracing, nvidia, etc.)
Description
This PR modifies the existing /v1/chat/completions and adds a /v1/models endpoint to be OpenAI compatible.
Changes:
- nemoguardrails/server/api.py - Introduces new classes that correspond to OpenAI's standard chat completion and models API
- tests/test_api.py, tests/test_server_calls_with_state.py & tests/test_threads - Updates the message content path according to the new ResponseBody class and adds new tests for the /v1/models API
Related Issue(s)
Checklist