@devin-ai-integration
Contributor

@devin-ai-integration devin-ai-integration bot commented Mar 20, 2025

Update Judge Command to Support Custom Judge Models

This PR updates the judge command to support using any model provider for judgment instead of just OpenAI. The following changes were made:

  1. Added new parameters to replace --openai-api-key:

    • --judge-model-name
    • --judge-model-url
    • --judge-model-api-key

  2. Updated the following files:

    • bin/api/run_docker_eval.sh
    • bin/api/run_openai_judge.sh
    • bin/api/entrypoint.sh
    • llm_judge/gen_judgment.py
    • llm_judge/common.py
    • .github/workflows/run-eval.yaml

  3. Maintained backward compatibility with the existing --openai-api-key parameter.

  4. Updated the GitHub workflow to use the new parameters.
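
For readers who want a concrete picture of how the new flags and the legacy flag could coexist, here is a minimal sketch. It is not the code from this PR: the helper names `build_parser` and `resolve_judge_config`, the returned dictionary shape, and the rule that `--judge-model-api-key` takes precedence over `--openai-api-key` are all assumptions made for illustration. Only the four flag names come from the description above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the three new flags plus the legacy one."""
    parser = argparse.ArgumentParser(description="Judge flag sketch (illustrative only)")
    parser.add_argument("--judge-model-name", default=None)
    parser.add_argument("--judge-model-url", default=None)
    parser.add_argument("--judge-model-api-key", default=None)
    # Legacy flag, still accepted so existing invocations keep working.
    parser.add_argument("--openai-api-key", default=None)
    return parser


def resolve_judge_config(args: argparse.Namespace) -> dict:
    """Merge new and legacy flags into one config (assumed precedence: new flag wins)."""
    api_key = args.judge_model_api_key or args.openai_api_key
    if not api_key:
        raise ValueError("Pass --judge-model-api-key (or the legacy --openai-api-key).")
    return {
        # None means "not specified"; the caller decides what default applies.
        "model": args.judge_model_name,
        "base_url": args.judge_model_url,
        "api_key": api_key,
    }
```

Under these assumptions, an old invocation that passes only `--openai-api-key` still resolves to a usable key, while a new invocation can point the judge at any provider by setting all three `--judge-model-*` flags.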

Resolve https://linear.app/liquid-ai/issue/GEN-370/update-mt-bench-judge-script-to-use-any-llm-provider
Link to Devin run: https://app.devin.ai/sessions/c81627749854419ead0b59881ff887b0
Requested by: liren@liquid.ai

Co-Authored-By: liren@liquid.ai <liren@liquid.ai>
@devin-ai-integration
Contributor Author

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add "(aside)" to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

@tuliren merged commit 3967e52 into main on Apr 10, 2025 (1 check passed)
@tuliren deleted the devin/1742453336-update-judge-command branch on April 10, 2025 07:59