Update judge command to support custom judge models #7

devin-ai-integration · 2025-03-20T06:55:03Z

Update Judge Command to Support Custom Judge Models

This PR updates the judge command to support using any model provider for judgment instead of just OpenAI. The following changes were made:

Added new parameters to replace --openai-api-key:
- --judge-model-name
- --judge-model-url
- --judge-model-api-key

Updated the following files:
- bin/api/run_docker_eval.sh
- bin/api/run_openai_judge.sh
- bin/api/entrypoint.sh
- llm_judge/gen_judgment.py
- llm_judge/common.py
- .github/workflows/run-eval.yaml

Maintained backward compatibility with the existing --openai-api-key parameter.

Updated the GitHub workflow to use the new parameters.
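
For reviewers, here is a minimal sketch of how the three new flags and the legacy --openai-api-key fallback described above could be parsed. It is not the code in llm_judge/gen_judgment.py: the function names `build_parser` and `resolve_judge_config`, the returned dictionary shape, and the `gpt-4` default are assumptions made for illustration. A later commit in this PR removes --openai-api-key entirely, so the fallback only reflects the initial description.

```python
import argparse
from typing import Dict, Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the judge flags named in the PR description."""
    parser = argparse.ArgumentParser(description="Judge CLI (illustrative sketch)")
    # Default model name is an assumption, not taken from the repository.
    parser.add_argument("--judge-model-name", default="gpt-4",
                        help="Name of the model used as judge")
    parser.add_argument("--judge-model-url", default=None,
                        help="Base URL of the judge model's API endpoint")
    parser.add_argument("--judge-model-api-key", default=None,
                        help="API key for the judge model endpoint")
    # Legacy flag, kept here only to illustrate backward compatibility.
    parser.add_argument("--openai-api-key", default=None,
                        help="Deprecated: use --judge-model-api-key")
    return parser


def resolve_judge_config(argv: Optional[Sequence[str]] = None) -> Dict[str, Optional[str]]:
    """Parse argv and return the judge settings (hypothetical helper)."""
    args = build_parser().parse_args(argv)
    # Prefer the new flag; fall back to the legacy one if only it was given.
    api_key = args.judge_model_api_key or args.openai_api_key
    return {
        "model": args.judge_model_name,
        "base_url": args.judge_model_url,
        "api_key": api_key,
    }
```

With this sketch, passing only --openai-api-key still yields a usable key, while --judge-model-api-key takes precedence when both are supplied.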

Resolve https://linear.app/liquid-ai/issue/GEN-370/update-mt-bench-judge-script-to-use-any-llm-provider
Link to Devin run: https://app.devin.ai/sessions/c81627749854419ead0b59881ff887b0
Requested by: liren@liquid.ai

Co-Authored-By: liren@liquid.ai <liren@liquid.ai>

devin-ai-integration · 2025-03-20T06:55:06Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add "(aside)" to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

Co-Authored-By: liren@liquid.ai <liren@liquid.ai>

Update judge command to support custom judge models

7f7fb25

Co-Authored-By: liren@liquid.ai <liren@liquid.ai>

devin-ai-integration bot and others added 12 commits March 20, 2025 07:11

Remove OpenAI API key parameter completely

030e054

Co-Authored-By: liren@liquid.ai <liren@liquid.ai>

Update README and GitHub workflow to use gpt-4o as judge model

f8ab3ba

Co-Authored-By: liren@liquid.ai <liren@liquid.ai>

Fix CI: Revert to gpt-4 as judge model name

7523768

Co-Authored-By: liren@liquid.ai <liren@liquid.ai>

Run judge on gpt-4o-mini

cfc8845

Use gpt reference answer

ae180a8

Fix more key errors

6c81de7

Use openai chat completions

df73628

Judge the model by itself

ce03360

Log api base and key

fb491d7

Use lfm-7b as judge

8fe1734

Add more typing

338b54b

Fix arguments and update logging

e833cd3

tuliren merged commit 3967e52 into main Apr 10, 2025
1 check passed

tuliren deleted the devin/1742453336-update-judge-command branch April 10, 2025 07:59

devin-ai-integration bot commented Mar 20, 2025 • edited by tuliren

2 participants
