HUD

The HUD SDK is an open-source Python toolkit for building, evaluating, and training AI agents. Use a unified API for any model provider, wrap your code as MCP environments, run A/B evals at scale, and train with reinforcement learning.

To learn more, check out our Documentation and API Reference.


Install

pip install hud-python

Get your API key at hud.ai and set it:

export HUD_API_KEY=your-key-here

For CLI tools (hud init, hud dev, etc.): uv tool install hud-python --python 3.12

[Demo: agent running on SheetBench]

Usage

Unified Model API

Use Claude, GPT, Gemini, or Grok through one OpenAI-compatible endpoint:

from openai import AsyncOpenAI
import os

client = AsyncOpenAI(
    base_url="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"]
)

response = await client.chat.completions.create(
    model="claude-sonnet-4-5",  # or gpt-4o, gemini-2.5-pro (https://hud.ai/models)
    messages=[{"role": "user", "content": "Hello!"}]
)

Every call is traced at hud.ai. → Docs
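
Because the endpoint is OpenAI-compatible, switching providers is just a change of the model string. A minimal sketch reusing the client defined above; the model names mirror the comment in the example, and hud.ai/models lists what is currently available:

# Compare providers by swapping only the model name.
# Assumes the client defined above; model list from https://hud.ai/models.
for model in ["claude-sonnet-4-5", "gpt-4o", "gemini-2.5-pro"]:
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Summarize MCP in one sentence."}]
    )
    print(model, "->", response.choices[0].message.content)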

Environments

Turn your code into tools agents can call. Define how to evaluate them:

from hud import Environment

env = Environment("my-env")

@env.tool()
def add(a: int, b: int) -> int:
    """Add two numbers."""
    return a + b

@env.scenario("solve-math")
async def solve_math(problem: str, answer: int):
    response = yield problem                    # Prompt
    yield 1.0 if str(answer) in response else 0.0  # Reward

async with env("solve-math", problem="What is 2+2?", answer=4) as ctx:
    # Your agent logic here - call tools, get response
    result = await ctx.call_tool("add", a=2, b=2)
    await ctx.submit(f"The answer is {result}")

print(ctx.reward)  # 1.0

The agent runs between the yields. First yield sends the prompt, second yield scores the result. → Docs · Templates
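
To drive the scenario with a model instead of hard-coded calls, the context also exposes the prompt and tool schemas (ctx.prompt and ctx.tools, as used in the A/B example below). A minimal single-turn sketch, assuming the client and env defined above and that ctx.tools is in OpenAI tool-schema format:

import json

# Single-turn agent loop: send the scenario prompt, execute any tool calls
# through the environment, then submit an answer for scoring.
async with env("solve-math", problem="What is 2+2?", answer=4) as ctx:
    response = await client.chat.completions.create(
        model="claude-sonnet-4-5",
        messages=[{"role": "user", "content": ctx.prompt}],
        tools=ctx.tools
    )
    msg = response.choices[0].message
    answer_text = msg.content or ""
    for tc in msg.tool_calls or []:
        result = await ctx.call_tool(tc.function.name, **json.loads(tc.function.arguments))
        answer_text = f"The answer is {result}"
    await ctx.submit(answer_text)

print(ctx.reward)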

A/B Evals

Test different models. Repeat runs to see the distribution:

from openai import AsyncOpenAI
import os

client = AsyncOpenAI(
    base_url="https://inference.hud.ai",
    api_key=os.environ["HUD_API_KEY"]
)

# Using the env from above
async with env("solve-math", problem="What is 2+2?", answer=4, variants={"model": ["gpt-4o", "claude-sonnet-4-5"]}, group=5) as ctx:
    response = await client.chat.completions.create(
        model=ctx.variants["model"],
        messages=[{"role": "user", "content": ctx.prompt}],
        tools=ctx.tools  # Environment tools available to the model
    )
    await ctx.submit(response.choices[0].message.content)

Variants test configurations. Groups repeat for distribution. Results stream to hud.ai. → Docs
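
Variants are not limited to the model name; the same pattern can sweep any configuration you care about. A hypothetical sketch, assuming variants accepts multiple keys (not confirmed here), with each combination repeated group times:

# Hypothetical sweep over model and sampling temperature.
# Assumes variants accepts multiple keys; each combination runs `group` times.
async with env(
    "solve-math",
    problem="What is 2+2?",
    answer=4,
    variants={
        "model": ["gpt-4o", "claude-sonnet-4-5"],
        "temperature": [0.0, 1.0],
    },
    group=5,
) as ctx:
    response = await client.chat.completions.create(
        model=ctx.variants["model"],
        temperature=ctx.variants["temperature"],
        messages=[{"role": "user", "content": ctx.prompt}],
        tools=ctx.tools
    )
    await ctx.submit(response.choices[0].message.content)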

Deploy & Train

Push to GitHub, connect on hud.ai, run at scale:

hud init                  # Scaffold environment
git push                  # Push to GitHub
# Connect on hud.ai → New → Environment
hud eval my-eval --model gpt-4o --group-size 100
# Or create and run tasks on the platform

Every run generates training data. Use it to fine-tune or run RL. → Docs


Enterprise

Building agents at scale? We work with teams on custom environments, benchmarks, and training.

📅 Book a call · 📧 founders@hud.ai

Contributing

We welcome contributions! See CONTRIBUTING.md.

Key areas: Agents · Tools · Environments

Citation

@software{hud2025agentevalplatform,
  author = {HUD and Jay Ram and Lorenss Martinsons and Parth Patel and Govind Pimpale and Dylan Bowman and Jaideep and Nguyen Nhat Minh},
  title  = {HUD: An Evaluation and RL Environments Platform for Agents},
  date   = {2025-04},
  url    = {https://github.com/hud-evals/hud-python},
  langid = {en}
}

MIT License · LICENSE