[issue-4270] [P SDK] Improve GEval compatibility with DashScope Qwen judge model #4271

Susan9001 · 2025-11-28T20:37:15Z

Details

This PR is a follow-up to #4229 and makes DashScope Qwen more robust as a GEval judge model when used via LiteLLMChatModel.

Currently, when a model advertises logprobs and top_logprobs support, GEval enables the logprobs-aware scoring path. For DashScope Qwen this can occasionally lead to MetricComputationError("Failed to calculate g-eval score") because the returned logprobs do not always match the OpenAI-style format expected by the parser.
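To make the failure mode concrete, here is a minimal sketch of logprobs-weighted scoring in the spirit of G-Eval. It is not Opik's actual parser: the function name `weighted_score`, the 0-10 score range, and the input shape (a list of dicts with `token` and `logprob` keys, as in OpenAI-style `top_logprobs` entries) are all assumptions made for illustration.

```python
import math
from typing import Mapping, Sequence


def weighted_score(
    top_logprobs: Sequence[Mapping[str, object]],
    min_score: int = 0,   # assumed score range, for illustration only
    max_score: int = 10,
) -> float:
    """Return the probability-weighted average of candidate score tokens.

    Hypothetical helper: entries that do not look like OpenAI-style
    {"token": str, "logprob": float} items are skipped.
    """
    total_prob = 0.0
    weighted_sum = 0.0
    for entry in top_logprobs:
        token = entry.get("token")
        logprob = entry.get("logprob")
        # Skip entries that do not match the expected shape.
        if not isinstance(token, str) or not isinstance(logprob, (int, float)):
            continue
        token = token.strip()
        if not token.isdecimal():
            continue
        value = int(token)
        if not min_score <= value <= max_score:
            continue
        prob = math.exp(logprob)  # convert log-probability to probability
        total_prob += prob
        weighted_sum += prob * value
    if total_prob == 0.0:
        # Nothing usable was found; a caller could surface this as a
        # metric computation failure.
        raise ValueError("no usable score tokens in logprobs")
    return weighted_sum / total_prob
```

If a provider returns entries in a different shape, every entry is skipped and the function raises, which is the kind of failure this PR avoids by not taking the logprobs path for DashScope Qwen.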

This PR treats DashScope Qwen as not logprobs-supported in this context, so GEval falls back to the standard text/JSON-based parsing path instead of relying on logprobs.
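The idea can be sketched as a small capability filter. This is only an illustration under stated assumptions, not the code merged in this PR: the helper names (`is_dashscope_qwen`, `effective_params`, `use_logprobs_path`) and the name-matching rule (a `dashscope/` prefix plus `qwen` in the model name) are hypothetical.

```python
from typing import Iterable, Set

# Parameters that, when both are advertised, enable the logprobs-aware path.
LOGPROB_PARAMS = frozenset({"logprobs", "top_logprobs"})


def is_dashscope_qwen(model_name: str) -> bool:
    """Hypothetical matcher: assumes names like 'dashscope/qwen-flash'."""
    name = model_name.lower()
    return name.startswith("dashscope/") and "qwen" in name


def effective_params(model_name: str, advertised: Iterable[str]) -> Set[str]:
    """Drop logprobs-related parameters for DashScope Qwen models."""
    params = set(advertised)
    if is_dashscope_qwen(model_name):
        params -= LOGPROB_PARAMS
    return params


def use_logprobs_path(model_name: str, advertised: Iterable[str]) -> bool:
    """True only if both logprobs parameters survive the filter."""
    return LOGPROB_PARAMS <= effective_params(model_name, advertised)
```

With this filter, a caller that checks `use_logprobs_path(...)` before choosing a scoring strategy would take the text/JSON parsing path for `dashscope/qwen-flash` even though the model advertises logprobs support.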

Change checklist

User facing
Documentation update

Issues

Testing

Locally:

pytest tests/unit/evaluation/models/test_litellm_chat_model.py

Also ran additional examples with dashscope/qwen-flash as the judge model, configured as follows:

      self.judge_model = models.LiteLLMChatModel(
          model_name=judge_model_name,
          api_base="https://dashscope.aliyuncs.com/compatible-mode/v1",
          api_key=os.getenv("DASHSCOPE_API_KEY"),
      )

All samples now score successfully, without raising MetricComputationError("Failed to calculate g-eval score").

Documentation


yaricom · 2025-11-30T12:01:43Z

Hi @Susan9001! Thank you for the contribution! Please fix the merge conflicts with the current branch.

Cheers,
Iaroslav

Susan9001 · 2025-12-11T16:24:13Z

Hi @yaricom,
I have just resolved the merge conflicts. Sorry it took me a while to get back to this. Please let me know if there is anything else I need to adjust.

yaricom · 2025-12-12T12:06:34Z

Hi @Susan9001 ! Thank you for the contribution!

Happy coding!
Iaroslav

Susan9001 and others added 7 commits November 27, 2025 21:02

improve dashscope qwen support in LiteLLMChatModel

904fbcf

refactor model specific filters into per model handlers

b63c6cd

Merge branch 'main' into feat-dashscope-qwen-litellm

4c4d7b9

Merge branch 'main' into feat-dashscope-qwen-litellm

32621f9

Merge branch 'main' into feat-dashscope-qwen-litellm

214e4ed

Refine LiteLLM model filters and tests for GPT-5 and DashScope Qwen

bb9197a

Merge branch 'feat-dashscope-qwen-litellm' of github.com:Susan9001/op…

2e144a6

…ik into feat-dashscope-qwen-litellm

Susan9001 requested a review from a team as a code owner November 28, 2025 20:37

Merge branch 'main' into feat-dashscope-qwen-litellm

f7409b9

Merge branch 'main' into feat-dashscope-qwen-litellm

99ad2de

yaricom approved these changes Dec 12, 2025

View reviewed changes

yaricom merged commit 59a883c into comet-ml:main Dec 12, 2025
35 of 38 checks passed


2 participants
