base: develop
feat: Modify endpoints for OpenAPI compatibility #1340
Conversation
Codecov Report ❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #1340 +/- ##
===========================================
+ Coverage 71.66% 71.70% +0.04%
===========================================
Files 171 171
Lines 17015 17071 +56
===========================================
+ Hits 12193 12240 +47
- Misses 4822 4831 +9
please don't merge - we want to refactor this a little more
default=None,
description="A state object that should be used to continue the interaction.",
)
# Standard OpenAI completion parameters
Let's bring the OpenAI schema into a separate file - perhaps server/schemas/openai
+1
@christinaex could you please apply the same approach you've already used for the response schemas to the request schemas as well?
default=None,
description="Top-p sampling parameter.",
)
stop: Optional[str] = Field(
stop needs to be Optional[Union[str, List[str]]]
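A minimal sketch of what the corrected field could look like, assuming the OpenAIRequestFields class used elsewhere in this PR (the description text is illustrative, not the PR's exact wording):

# Sketch only: accept either a single stop string or a list of stop strings,
# mirroring OpenAI's chat completions API.
from typing import List, Optional, Union

from pydantic import BaseModel, Field


class OpenAIRequestFields(BaseModel):
    stop: Optional[Union[str, List[str]]] = Field(
        default=None,
        description="Up to 4 sequences where the model should stop generating further tokens.",
    )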
nemoguardrails/server/api.py
Outdated
index: Optional[int] = Field(
    default=None, description="The index of the choice in the list of choices."
)
messages: Optional[dict] = Field(
typo: messages needs to be message (no s)
Greptile Overview
Greptile Summary
This PR adds OpenAI API compatibility to NeMo Guardrails by introducing a new /v1/models endpoint and modifying /v1/chat/completions to return OpenAI-compatible response structures.
Key Changes:
- Created nemoguardrails/server/schemas/openai.py with Pydantic models for OpenAI-compatible requests and responses
- Modified RequestBody to inherit from OpenAIRequestFields, mapping the model parameter to config_id for backward compatibility
- Updated ResponseBody to include OpenAI standard fields (id, object, created, choices) while maintaining NeMo-specific extensions (state, llm_output, log)
- Added /v1/models endpoint that lists available guardrails configurations in OpenAI model format
- Implemented Colang 2.x state serialization support in runtime.py
- Updated all tests to verify new response structure
Issues Found:
- Critical: OpenAI parameters (max_tokens, temperature, etc.) are only applied when thread_id is present due to incorrect indentation (nemoguardrails/server/api.py:460-472)
- Syntax Error: Test assertion uses wrong response path in tests/test_threads.py:143
Confidence Score: 2/5
- This PR should NOT be merged until the critical indentation bug is fixed - OpenAI parameters won't work without thread_id
- Score reflects a critical logic error that breaks OpenAI parameter functionality for non-threaded requests, plus a test syntax error that will cause test failures
- Pay immediate attention to nemoguardrails/server/api.py (lines 460-472 need unindenting) and tests/test_threads.py (line 143 needs correction)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/server/schemas/openai.py | 5/5 | New file defining OpenAI-compatible schemas. Well-structured with proper Pydantic models for requests and responses. |
| nemoguardrails/server/api.py | 2/5 | Adds /v1/models endpoint and modifies /v1/chat/completions for OpenAI compatibility. Critical bug: OpenAI parameters only applied when using thread_id due to incorrect indentation. |
| tests/test_threads.py | 3/5 | Updates thread tests for new response format. Contains syntax error on line 143 with incorrect response path. |
Sequence Diagram
sequenceDiagram
participant Client
participant API as /v1/chat/completions
participant RequestBody
participant DataStore
participant LLMRails
participant ResponseBody
Client->>API: POST with model/config_id + messages
API->>RequestBody: Validate request
RequestBody->>RequestBody: Map model → config_id
RequestBody->>RequestBody: Apply OpenAI parameters
alt thread_id provided
API->>DataStore: Fetch thread messages
DataStore-->>API: Previous messages
API->>API: Prepend thread history
end
alt streaming enabled
API->>LLMRails: generate_async (streaming)
LLMRails-->>Client: StreamingResponse
else non-streaming
API->>LLMRails: generate_async
LLMRails-->>API: GenerationResponse
alt thread_id provided
API->>DataStore: Save updated thread
end
API->>ResponseBody: Build OpenAI response
ResponseBody->>ResponseBody: Add choices, message, id, created
ResponseBody->>ResponseBody: Include NeMo extensions (state, log)
ResponseBody-->>Client: JSON response
end
6 files reviewed, 2 comments
nemoguardrails/server/api.py
Outdated
generation_options = body.options
if body.max_tokens:
    generation_options.max_tokens = body.max_tokens
if body.temperature is not None:
    generation_options.temperature = body.temperature
if body.top_p is not None:
    generation_options.top_p = body.top_p
if body.stop:
    generation_options.stop = body.stop
if body.presence_penalty is not None:
    generation_options.presence_penalty = body.presence_penalty
if body.frequency_penalty is not None:
    generation_options.frequency_penalty = body.frequency_penalty
logic: OpenAI parameters only applied when using thread_id - these lines are incorrectly indented inside the if body.thread_id: block (starts at line 428). Move this block outside and before the streaming check to ensure parameters work for all requests.
Suggested change (the same block, moved outside the thread_id branch so it runs before the streaming check):
generation_options = body.options
if body.max_tokens:
    generation_options.max_tokens = body.max_tokens
if body.temperature is not None:
    generation_options.temperature = body.temperature
if body.top_p is not None:
    generation_options.top_p = body.top_p
if body.stop:
    generation_options.stop = body.stop
if body.presence_penalty is not None:
    generation_options.presence_penalty = body.presence_penalty
if body.frequency_penalty is not None:
    generation_options.frequency_penalty = body.frequency_penalty
if (
tests/test_threads.py
Outdated
)
res = response.json()
assert res["messages"][0]["content"] == "Hello again!"
assert res["choices"]["messages"][0]["content"] == "Hello again!"
syntax: Incorrect response path - should be res["choices"][0]["message"]["content"] to match OpenAI format
Suggested change:
assert res["choices"][0]["message"]["content"] == "Hello again!"
Greptile Overview
Greptile Summary
This PR adds OpenAI API compatibility by introducing a new /v1/models endpoint and modifying /v1/chat/completions to return OpenAI-compatible response structures. The changes include new schema definitions, test updates, and enhanced state handling for Colang 2.x.
Critical Issues Found:
- Import/syntax errors: Old ResponseBody class not removed, causing conflicts with the new OpenAI-compatible version and referencing missing imports (BaseModel, GenerationLog)
- Logic bugs: OpenAI parameters (max_tokens, temperature, etc.) are set but never used - both streaming and non-streaming paths pass body.options instead of the modified generation_options
- Test error: Incorrect array indexing in test_threads.py line 143
Major Changes:
- Added /v1/models endpoint that lists available guardrails configurations in OpenAI format
- Restructured response format to include choices array with OpenAI-compatible structure while maintaining backward compatibility with NeMo-Guardrails fields (state, llm_output, etc.)
- Enhanced Colang 2.x runtime to deserialize state from API calls
- Updated all tests to use new response structure
Confidence Score: 1/5
- This PR has critical syntax errors that will prevent the module from loading and logic bugs that break core functionality
- Score reflects three critical issues: (1) syntax errors from conflicting ResponseBody classes and missing imports will cause immediate module load failures, (2) logic bug where OpenAI parameters are set but never used means the API won't work as intended, (3) test syntax error. These must be fixed before merge.
- Pay immediate attention to nemoguardrails/server/api.py (has syntax and logic errors) and tests/test_threads.py (has syntax error)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/server/api.py | 1/5 | Critical syntax errors (old ResponseBody class conflicts with import) and logic bugs (OpenAI parameters not applied to generation calls) |
| nemoguardrails/server/schemas/openai.py | 5/5 | New OpenAI schema definitions - well structured with proper field descriptions and backward compatibility fields |
| tests/test_threads.py | 2/5 | Updated to new response format but has incorrect array indexing on line 143 |
Sequence Diagram
sequenceDiagram
participant Client
participant API as FastAPI Server
participant Rails as LLMRails
participant Datastore
participant LLM
Client->>API: POST /v1/chat/completions
Note over API: RequestBody with model/config_id mapping
API->>API: Validate thread_id (if provided)
alt Thread ID provided
API->>Datastore: get("thread-{thread_id}")
Datastore-->>API: Previous messages
API->>API: Prepend thread messages
end
API->>API: Set generation_options from OpenAI params
Note over API: max_tokens, temperature, top_p, etc.
alt Streaming mode
API->>Rails: generate_async(messages, streaming_handler, options)
Rails->>LLM: Generate with streaming
LLM-->>API: Stream tokens
API-->>Client: StreamingResponse
else Non-streaming mode
API->>Rails: generate_async(messages, options, state)
Rails->>LLM: Generate completion
LLM-->>Rails: Response
Rails-->>API: GenerationResponse
alt Thread ID provided
API->>Datastore: set("thread-{thread_id}", updated messages)
end
API->>API: Build OpenAI-compatible ResponseBody
Note over API: Add choices array with message
API-->>Client: ResponseBody (OpenAI format + extensions)
end
Client->>API: GET /v1/models
API->>API: List config directories
API->>API: Convert to OpenAI Model format
API-->>Client: ModelsResponse
Additional Comments (3)
- nemoguardrails/server/api.py, lines 263-281: syntax: Old ResponseBody class should be removed - it conflicts with the new OpenAI-compatible ResponseBody imported from openai.py (line 43). This class also references BaseModel and GenerationLog, which are no longer imported, causing NameErrors when the module loads.
- nemoguardrails/server/api.py, line 538: logic: OpenAI parameters not applied in streaming mode - should use generation_options instead of body.options
- nemoguardrails/server/api.py, line 546: logic: OpenAI parameters not applied - should use generation_options instead of body.options
6 files reviewed, 4 comments
tests/test_threads.py
Outdated
)
res = response.json()
assert res["messages"][0]["content"] == "Hello again!"
assert res["choices"]["message"][0]["content"] == "Hello again!"
syntax: Wrong array index placement - choices is an array, not a dict
Suggested change:
assert res["choices"][0]["message"]["content"] == "Hello again!"
Greptile Overview
Greptile Summary
This PR successfully refactors the NeMo Guardrails server to support OpenAI-compatible API endpoints.
Key Changes:
- Adds new /v1/models endpoint that lists available guardrails configurations in OpenAI model format
- Refactors /v1/chat/completions response to use OpenAI's choices array structure with Choice objects
- Maps OpenAI request fields (model, max_tokens, temperature, top_p, stop, presence_penalty, frequency_penalty) to internal llm_params
- Maintains backward compatibility by including NeMo-specific fields (state, llm_output, output_data, log) in responses
- Enables Colang 2.x state deserialization from dict format for stateful conversations
- Updates all tests to validate new response structure
Architecture:
The PR introduces a clean separation of concerns by creating nemoguardrails/server/schemas/openai.py with OpenAI-specific models, while extending RequestBody to inherit from OpenAIRequestFields. The model field is automatically mapped to config_id via a validator, and OpenAI parameters are converted to llm_params before passing to the LLM generation.
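A minimal sketch of that flow, assuming a Pydantic pre-validation root_validator (the field set is trimmed, and the to_llm_params helper is illustrative rather than the PR's actual code):

# Sketch only: model -> config_id mapping plus conversion of OpenAI sampling
# parameters into an llm_params dict. Names other than RequestBody,
# OpenAIRequestFields and config_id are assumptions.
from typing import Any, Optional

from pydantic import BaseModel, Field, root_validator


class OpenAIRequestFields(BaseModel):
    model: Optional[str] = Field(default=None, description="Maps to config_id.")
    max_tokens: Optional[int] = None
    temperature: Optional[float] = None
    top_p: Optional[float] = None


class RequestBody(OpenAIRequestFields):
    config_id: Optional[str] = None

    @root_validator(pre=True)
    def ensure_config_id(cls, data: Any) -> Any:
        # If only `model` is provided, reuse it as the guardrails config id.
        if isinstance(data, dict) and data.get("model") and not data.get("config_id"):
            data["config_id"] = data["model"]
        return data


def to_llm_params(body: RequestBody) -> dict:
    # Keep only the OpenAI parameters that were actually set.
    params = {
        "max_tokens": body.max_tokens,
        "temperature": body.temperature,
        "top_p": body.top_p,
    }
    return {k: v for k, v in params.items() if v is not None}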
Confidence Score: 5/5
- Safe to merge - well-structured refactoring with comprehensive test coverage and backward compatibility
- The implementation is clean and follows OpenAI's API specification correctly. Previous indentation issues have been resolved. All tests are updated appropriately, and the changes maintain backward compatibility by keeping NeMo-specific fields in responses.
- No files require special attention - all changes are well-implemented
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/server/schemas/openai.py | 5/5 | New file defining OpenAI-compatible schema classes (OpenAIRequestFields, Choice, ResponseBody, Model, ModelsResponse) - well-structured with clear field descriptions |
| nemoguardrails/server/api.py | 4/5 | Adds /v1/models endpoint and refactors /v1/chat/completions to use OpenAI-compatible response format, maps OpenAI parameters to llm_params |
| nemoguardrails/colang/v2_x/runtime/runtime.py | 5/5 | Adds state deserialization support for Colang 2.x from dict format using json_to_state - enables state continuity in API calls |
Sequence Diagram
sequenceDiagram
participant Client
participant API as FastAPI Server
participant RequestBody
participant RailsConfig
participant LLMRails
participant DataStore
participant LLM as Language Model
Client->>API: POST /v1/chat/completions
API->>RequestBody: Validate request (OpenAI params)
RequestBody->>RequestBody: Map model → config_id
RequestBody->>RequestBody: Map OpenAI params → llm_params
alt thread_id provided
API->>DataStore: Fetch thread messages
DataStore-->>API: Return thread history
API->>API: Prepend thread messages
end
API->>RailsConfig: Load config by config_id
RailsConfig-->>API: Return LLMRails instance
alt streaming enabled
API->>LLMRails: generate_async (streaming)
LLMRails->>LLM: Stream tokens
LLM-->>Client: Server-sent events
else non-streaming
API->>LLMRails: generate_async (options with llm_params)
LLMRails->>LLM: Generate with parameters
LLM-->>LLMRails: Response
LLMRails-->>API: GenerationResponse
alt thread_id provided
API->>DataStore: Save updated thread messages
end
API->>API: Build ResponseBody (OpenAI format)
API-->>Client: Return choices with message
end
Client->>API: GET /v1/models
API->>RailsConfig: List available configs
RailsConfig-->>API: config_ids
API->>API: Convert to Model objects
API-->>Client: ModelsResponse with data array
6 files reviewed, no comments
Greptile Overview
Greptile Summary
Modified the /v1/chat/completions endpoint and added /v1/models to be OpenAI-compatible while maintaining backward compatibility with NeMo-Guardrails specific features.
Major changes:
- Introduced new OpenAPI schema definitions (openai.py) with OpenAIRequestFields, ResponseBody, Choice, Model, and ModelsResponse classes
- Updated RequestBody to inherit from OpenAIRequestFields, mapping the model field to config_id for backward compatibility
- Modified response format to match OpenAI spec with id, object, created, model, and choices array structure
- Added /v1/models endpoint that lists available guardrails configurations as OpenAI-compatible models
- OpenAI parameters (max_tokens, temperature, top_p, stop, presence_penalty, frequency_penalty) are now mapped to llm_params
- Enhanced Colang 2.x runtime to deserialize state dicts from API calls
- Updated all tests to work with the new response format
- Error responses now return OpenAI-compatible structure with finish_reason="error"
Backward compatibility:
- NeMo-Guardrails specific fields (state, llm_output, output_data, log) are preserved in responses
- Existing config_id parameter continues to work alongside the new model parameter
- The options parameter and all existing functionality remain unchanged
Confidence Score: 5/5
- This PR is safe to merge with excellent test coverage and proper backward compatibility
- The implementation is well-designed with comprehensive test coverage across all endpoints and response formats. The code properly maintains backward compatibility by preserving all existing NeMo-Guardrails fields while adding OpenAI compatibility. The OpenAI parameter mapping is correctly implemented outside the thread_id block (as noted in previous comments). State deserialization is properly handled with appropriate error checking. All tests have been updated to reflect the new response structure.
- No files require special attention
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/server/schemas/openai.py | 5/5 | New file introducing OpenAI-compatible schema definitions for request/response formats, well-structured with proper field descriptions |
| nemoguardrails/server/api.py | 4/5 | Modified to support OpenAI compatibility: added /v1/models endpoint, updated response format to match OpenAI spec, and added parameter mapping |
| nemoguardrails/colang/v2_x/runtime/runtime.py | 5/5 | Added state deserialization from dict format to support API state continuation, properly handles serialized state from API calls |
Sequence Diagram
sequenceDiagram
participant Client
participant API as FastAPI Server
participant RequestBody
participant Rails as LLMRails
participant DataStore
participant Runtime as RuntimeV2_x
Client->>API: POST /v1/chat/completions
Note over Client,API: OpenAI-compatible request<br/>{model, messages, temperature, etc}
API->>RequestBody: Validate & parse request
Note over RequestBody: Map `model` → `config_id`<br/>Map OpenAI params → llm_params
RequestBody->>API: body.options.llm_params populated
alt thread_id provided
API->>DataStore: get(thread-{thread_id})
DataStore-->>API: Previous messages
Note over API: Prepend thread messages
end
alt state provided
API->>Runtime: Pass state dict
Runtime->>Runtime: json_to_state(state["state"])
Note over Runtime: Deserialize state for Colang 2.x
end
alt streaming enabled
API->>Rails: generate_async(messages, options, state)
Rails-->>API: StreamingResponse
API-->>Client: Server-sent events stream
else non-streaming
API->>Rails: generate_async(messages, options, state)
Rails-->>API: GenerationResponse
Note over API: Build OpenAI-compatible response<br/>{id, object, created, model, choices}
alt thread_id provided
API->>DataStore: set(thread-{thread_id}, updated_messages)
end
API-->>Client: ResponseBody with choices array
end
Client->>API: GET /v1/models
API->>API: List config directories
Note over API: Convert configs → Model objects
API-->>Client: ModelsResponse {data: [models]}
6 files reviewed, no comments
thank you @christinaexyou. I like the direction of making the api feel openai-compatible, but before we can call it fully compatible, a few foundational pieces would be helpful to add:
- version compatibility and testing
- schema alignment
- streaming format
- documentation and scope
once we have explicit versioning, schema alignment, confirmed streaming format, integration tests, and a clear scope matrix, we’ll be in a solid position to confidently say “openai compatible.” happy to help with any of these pieces.
Greptile Overview
Greptile Summary
This PR successfully transforms the NeMo Guardrails API to be OpenAI-compatible by modifying /v1/chat/completions and adding /v1/models.
Key Changes:
Issues Already Addressed in Previous Comments:
Confidence Score: 4/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant Client as OpenAI Client
participant API as FastAPI Server
participant Rails as LLMRails
participant LLM as LLM Provider
Note over Client,LLM: Non-Streaming Flow
Client->>API: POST /v1/chat/completions<br/>{model, messages, temperature, ...}
API->>API: Map model → config_id
API->>API: Set OpenAI params in llm_params<br/>(max_tokens, temperature, etc.)
API->>Rails: _get_rails(config_ids)
Rails-->>API: LLMRails instance
API->>Rails: generate_async(messages, options)
Rails->>LLM: Call LLM with parameters
LLM-->>Rails: Response
Rails-->>API: GenerationResponse
API->>API: Build OpenAI-compatible response<br/>{id, object, created, model, choices[]}
API-->>Client: ResponseBody with choices array
Note over Client,LLM: Streaming Flow
Client->>API: POST /v1/chat/completions<br/>{stream: true, ...}
API->>API: Map model → config_id
API->>API: Set OpenAI params in llm_params
API->>Rails: _get_rails(config_ids)
Rails-->>API: LLMRails instance
API->>API: Create StreamingHandler
API->>Rails: generate_async(messages, streaming_handler)
API-->>Client: StreamingResponse (SSE)
loop For each token
Rails->>LLM: Stream token
LLM-->>Rails: Token chunk
Rails->>API: push_chunk(chunk)
API->>API: _format_streaming_response<br/>Wrap in OpenAI SSE format
API-->>Client: data: {delta, index: 0, ...}\n\n
end
Rails->>API: push_chunk(END_OF_STREAM)
API-->>Client: data: [DONE]\n\n
Note over Client,LLM: Models Endpoint
Client->>API: GET /v1/models
API->>API: List config directories
API->>API: Convert to Model objects<br/>{id, object, created, owned_by}
API-->>Client: ModelsResponse with data array
8 files reviewed, 1 comment
Additional Comments (1)
- nemoguardrails/server/api.py, line 628: logic: OpenAI parameters not applied in streaming mode - should pass generation_options instead of body.options to include the OpenAI parameters set on lines 602-613
8 files reviewed, 1 comment
10 files reviewed, no comments
10 files reviewed, no comments
Thanks for the PR, just wanted to add on to @Pouyanpi 's comments above. Using the guardrails configs as model fields in the /chat/completions request makes it less convenient to use Guardrails as a drop-in replacement for the Main LLM. If we keep the Main LLM in the /chat/completions POST request model field, the same request can be issued to the Main LLM directly as the guardrails fields are ignored. Or if we set a default configuration on the server (--default-config-id option in poetry run nemoguardrails server) then only the URL of the endpoint needs to change when moving from Main LLM to Guardrails.
I ran local integration tests and couldn't get streaming to work. I used the nemoguards config and added streaming support to the config.yml. I included the config and steps-to-reproduce below:
# config.yml
models:
  - type: main
    engine: nim
    model: meta/llama-3.3-70b-instruct
  - type: content_safety
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-content-safety
  - type: topic_control
    engine: nim
    model: nvidia/llama-3.1-nemoguard-8b-topic-control

streaming: True

rails:
  input:
    flows:
      - content safety check input $model=content_safety
      - topic safety check input $model=topic_control
      - jailbreak detection model
  output:
    flows:
      - content safety check output $model=content_safety
    streaming:
      enabled: True
      chunk_size: 200
      context_size: 50
  config:
    jailbreak_detection:
      nim_base_url: "https://ai.api.nvidia.com"
      nim_server_endpoint: "/v1/security/nvidia/nemoguard-jailbreak-detect"
      api_key_env_var: NVIDIA_API_KEY
# Server command
$ poetry run nemoguardrails server --config examples/configs
INFO: Started server process [92469]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
I used the curl command below. It looks like the Main LLM generation isn't streaming from the Guardrails logs, and the client only receives a data: [DONE] response:
$ curl -X POST http://0.0.0.0:8000/v1/chat/completions \
-H 'Accept: application/json' \
-H 'Content-Type: application/json' \
-d '{
"model": "nemoguards",
"messages": [
{
"role": "user",
"content": "What can you do for me?"
}
],
"max_tokens": 256,
"stream": false,
"temperature": 1,
"top_p": 1,
"guardrails": {
"config_id": "nemoguards"
},
"stream": true
}'
data: [DONE]
nemoguardrails/server/api.py
Outdated
# Standard OpenAI completion parameters
model: Optional[str] = Field(
    default=None,
    description="The model to use for chat completion. Maps to config_id for backward compatibility.",
I think it would be better to have the model field to match the main model from the Guardrails configuration, rather than a Guardrails config ID. Then the same /chat/completions request can be POSTed to either the main LLM directly or Guardrails without any changes. On build.nvidia.com the guardrails field is ignored, and Guardrails can set a default config ID if the guardrails field isn't included
Updated the model field to match the main model in the config. If there is no main model defined, it defaults to config_id[0]
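A sketch of how that fallback could be implemented; the helper name and arguments are illustrative, not the PR's actual code:

# Sketch only: resolve the OpenAI `model`/`id` value for a guardrails config.
# Uses the main model name when one is defined, otherwise the first config id.
from typing import List, Optional


def resolve_model_id(config_ids: List[str], main_model_name: Optional[str]) -> str:
    # `main_model_name` would come from the `type: main` entry in config.yml.
    if main_model_name:
        return main_model_name
    return config_ids[0]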
description="A state object that should be used to continue the interaction in the future.",
)
@app.get(
    "/v1/models",
This looks good. To the comment above, I recommend including the main model from each Guardrails configuration and adding an extra field to the list of Models. So, for example, the nemoguards configuration currently looks like:
{
  "object": "list",
  "data": [
    {
      "id": "nemoguards",
      "created": 1764625695,
      "object": "model",
      "owned_by": "nemo-guardrails"
    },
    ...
  ]
}
You'd have:
{
  "object": "list",
  "data": [
    {
      "id": "meta/llama-3.3-70b-instruct",
      "guardrails_config_id": "nemoguards",
      "created": 1764625695,
      "object": "model",
      "owned_by": "nemo-guardrails"
    },
    ...
  ]
}
This can be used in a /chat/completions request, where the id is used in the model field, and guardrails_config_id is used in guardrails.config_id
Done - updated id to match the model field and extended the existing OpenAI Model API to have a field named guardrail_config_id. If there are no models defined in the config, the id field defaults to config_id[0].
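A sketch of what the extended model entry could look like, assuming a Pydantic model mirroring OpenAI's Model object (the defaults are illustrative):

# Sketch only: OpenAI-style Model object extended with the guardrails config id.
import time
from typing import Optional

from pydantic import BaseModel, Field


class Model(BaseModel):
    id: str = Field(description="Main model name, e.g. 'meta/llama-3.3-70b-instruct'.")
    object: str = "model"
    created: int = Field(default_factory=lambda: int(time.time()))
    owned_by: str = "nemo-guardrails"
    guardrail_config_id: Optional[str] = Field(
        default=None,
        description="The guardrails configuration this model entry belongs to.",
    )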
Can you help me understand the intent here? I’m not sure I follow, doesn’t this effectively bind a model to a config?
As a side note: config_id must always be set, potentially with a configurable default.
res = response.json()
assert len(res["messages"]) == 1
assert res["messages"][0]["content"]
# Check OpenAI-compatible response structure
Could you update the e2e tests to use the LIVE_TEST_MODE approach, to be consistent with test_llm_params_e2e.py?
tests/test_openai_integration.py
Outdated
models = openai_client.models.list()

# Verify the response structure matches OpenAI's ModelList
assert models is not None
nit: Would it be simpler to check against an expected dict rather than each individual field?
Done - we check that the returned Choice response matches an expected dict and check the id and created timestamp separately, since they are liable to change.
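A sketch of that assertion style (the expected field values are illustrative; only the OpenAI choice structure itself is taken from the PR):

# Sketch only: compare the stable parts of the response against an expected dict
# and check the variable fields (id, created) separately.
def check_chat_completion(res: dict) -> None:
    expected_choice = {
        "index": 0,
        "message": {"role": "assistant", "content": "Hello again!"},
        "finish_reason": "stop",
    }
    assert res["choices"][0] == expected_choice
    assert res["id"].startswith("chatcmpl-")
    assert isinstance(res["created"], int)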
tests/test_openai_integration.py
Outdated
)

# Verify response structure matches OpenAI's ChatCompletion object
assert response is not None
nit: Would it be clearer to check against an expected dict rather than iterating over each of the attributes?
@Pouyanpi @tgasser-nv thanks so much for your feedback! By default, the
There are quite a lot of commits in this PR that don't necessarily relate to the original feature, so should they be kept as separate commits, or do you still want me to squash and rebase?
Thank you @christinaexyou!
Unfortunately we had a bug, which is fixed now. I added some detailed comments.
from pydantic import BaseModel, Field

class ResponseBody(ChatCompletion):
Suggested change:
class GuardrailsChatCompletion(ChatCompletion):
)

class ModelsResponse(BaseModel):
not sure about this, but let's be consistent:
Suggested change:
class GuardrailsModels(BaseModel):
description="A state object that should be used to continue the interaction.",
)
# Standard OpenAI completion parameters
model: Optional[str] = Field(
model MUST be provided. It is a required field. It should be the model name that we currently use in the guardrails configuration. Using the request body, we set the main LLM; other types of models come from the config (config.yml).
Guardrails and OpenAI compatibility with respect to the model field brings some challenges:
- One is that Guardrails requires an engine in addition to the model.
  - There are various possible solutions, like posting model metadata (including engine and base_url) to the models endpoint, where this becomes the model ID we use for lookup. This is possible but tricky.
  - Another option is to set a default engine and make it configurable via an environment variable. Cons: a user cannot experiment with multiple models from multiple providers while controlling for one config_id.
  - We could also settle on a global models file (I recommend avoiding this).
- Another challenge is that such models may be hosted in different places, thus requiring different base_urls. For example, I can have 3 different NIMs with 3 different base_urls. How are we going to address that?
TL;DR: We need a mechanism to obtain <engine, base_url> from a given model name (ID).
What do you think?
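One possible shape for such a lookup, purely as a discussion aid - nothing below exists in the codebase, and the registry contents are invented for illustration:

# Sketch only: a registry mapping a model id to the <engine, base_url> pair needed
# to instantiate it. How the registry is populated (config files, models-endpoint
# metadata, environment variables) is exactly the open question above.
from typing import Dict, NamedTuple


class ModelTarget(NamedTuple):
    engine: str
    base_url: str


MODEL_REGISTRY: Dict[str, ModelTarget] = {
    "meta/llama-3.3-70b-instruct": ModelTarget("nim", "https://integrate.api.nvidia.com/v1"),
}


def lookup_model(model_id: str) -> ModelTarget:
    try:
        return MODEL_REGISTRY[model_id]
    except KeyError:
        raise ValueError(f"No engine/base_url known for model '{model_id}'")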
@root_validator(pre=True)
def ensure_config_id(cls, data: Any) -> Any:
    if isinstance(data, dict):
        if data.get("model") is not None and data.get("config_id") is None:
I'm not sure what is happening here.
model should never be None and we always override the main model defined in config.yml using the model provided via request body.
Thank you 🥇
Let's do similar for Response schemas.
state: Optional[dict] = Field(default=None, description="State object for continuing the conversation.")
llm_output: Optional[dict] = Field(default=None, description="Additional LLM output data.")
output_data: Optional[dict] = Field(default=None, description="Additional output data.")
log: Optional[dict] = Field(default=None, description="Generation log data.")
guardrails_config_ids: Optional[List[str]] = Field(default=None, description="The list of configuration ids that were used.")
state: Optional[dict] = Field(default=None, description="State object for continuing the conversation.")
llm_output: Optional[dict] = Field(default=None, description="Contains any additional output coming from the LLM.")
output_data: Optional[dict] = Field(
    default=None,
    description="The output data, i.e. a dict with the values corresponding to the `output_vars`.",
)
log: Optional[GenerationLog] = Field(default=None, description="Additional logging information.")

nit: re config_ids vs guardrails_config_ids - is there any reason that we should explicitly mention guardrails? If not, better to rename it to be consistent with the request schema. But I don't mind either name.
| f"An internal error has occurred.", | ||
| } | ||
| ] | ||
| id=f"chatcmpl-{uuid.uuid4()}", |
I think these sorts of transformations/conversions on the request and response fit nicely into a utils module in schemas.
a simple function with
input: response: GenerationResponse, config_ids=None
output: `GuardrailCompletionResponse`
same for streaming.
Also you can include this logic there.
merely a suggestion, what do you think?
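A sketch of the suggested helper, assuming GenerationResponse exposes the generated messages on a response attribute and returning a plain dict instead of the proposed GuardrailsChatCompletion class to keep the example self-contained:

# Sketch only: convert a GenerationResponse into an OpenAI-style completion body.
import time
import uuid
from typing import List, Optional


def to_chat_completion(response, config_ids: Optional[List[str]] = None) -> dict:
    # `response.response` is assumed to hold the list of generated messages.
    messages = getattr(response, "response", None) or []
    message = messages[-1] if messages else {"role": "assistant", "content": ""}
    return {
        "id": f"chatcmpl-{uuid.uuid4()}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": (config_ids or ["unknown"])[0],
        "choices": [{"index": 0, "message": message, "finish_reason": "stop"}],
        "guardrails_config_ids": config_ids,
    }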
return llm_rails

async def _format_streaming_response(
for streaming it is better to use the stream_async method of LLMRails. You can still iterate on it.
This allows you to support another feature: you can run output rails while streaming. To do so you can use/adapt the following pieces:
class ErrorDetails(BaseModel):
    message: str
    type: str
    param: str
    code: str

class ErrorData(BaseModel):
    error: ErrorDetails

For now, define them here; later I will include them in llmrails.py and will update this code.
def process_chunk(chunk: str) -> Union[str, ErrorData]:
    """
    Processes a single chunk from the stream.

    Args:
        chunk (str): A single chunk from the stream.

    Returns:
        Union[str, ErrorData]: ErrorData instance for errors or the original chunk.
    """
    try:
        validated_data = ErrorData.model_validate_json(chunk)
        return validated_data  # Return the ErrorData instance directly
    except ValidationError:
        # Not an error, just a normal token
        pass
    except json.JSONDecodeError:
        # Invalid JSON format, treat as normal token
        pass
    except Exception as e:
        log.warning(
            f"Unexpected error processing stream chunk: {type(e).__name__}: {str(e)}",
            extra={"chunk": chunk},
        )
    return chunk  # Fall through: treat the chunk as a normal token

Then, when you are iterating over the stream, at the beginning of your try/except block:
async for chunk in stream_iter:
    processed_chunk = process_chunk(chunk)
    if isinstance(processed_chunk, ErrorData):
        # Yield the error and stop streaming
        yield f"data: {processed_chunk.model_dump_json()}\n\n"
        return
Is it the changes you made re broken streaming in the NIM engine, or are there other reasons?
I think it won't be necessary if you use stream_async. You can revert it locally when you make the changes to the streaming and let me know.
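A sketch of how the endpoint could iterate stream_async and wrap each chunk in OpenAI-style SSE events; the exact chunk payload shape here is an assumption, not the PR's implementation:

# Sketch only: wrap LLMRails.stream_async output in OpenAI-style SSE chunks.
import json
import time
import uuid


async def openai_sse_stream(llm_rails, messages, model_id):
    completion_id = f"chatcmpl-{uuid.uuid4()}"
    async for chunk in llm_rails.stream_async(messages=messages):
        event = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": int(time.time()),
            "model": model_id,
            "choices": [{"index": 0, "delta": {"content": chunk}, "finish_reason": None}],
        }
        yield f"data: {json.dumps(event)}\n\n"
    yield "data: [DONE]\n\n"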
we can do this at the end.
we should add an extra dependency called server with these two (similar to tracing, nvidia, etc.)
Description
This PR modifies the existing /v1/chat/completions and adds a /v1/models endpoint to be OpenAI compatible.
Changes:
- nemoguardrails/server/api.py - Introduces new classes that correspond to OpenAI's standard chat completion and models API
- tests/test_api.py, tests/test_server_calls_with_state.py & tests/test_threads - Updates the message content path according to the new ResponseBody class and adds new tests for the /v1/models API
Related Issue(s)
Checklist