Update `bencher` CLI to handle remote jobs by epompeii · Pull Request #672 · bencherdev/bencher

epompeii · 2026-02-19T05:10:20Z

This changeset introduces the concept of a remote job to the bencher CLI.
The runner CLI has also been updated to better match the conventions of the bencher CLI.

github-actions · 2026-02-19T05:12:02Z

🤖 Claude Code Review

PR: #672
Base: devel
Head: u/ep/cli-image
Commit: 1e3739876376f93a927b21bb95ed343be523ba41

Now I have a thorough understanding of the changes. Here's the review:

PR Review: `bencher run --image` + Runner Overrides

Commits: 15 commits covering CLI --image support, runner daemon → up rename, entrypoint/cmd/env overrides, ImageReference validation type, bencher_parser crate, and collection-size validation.

Overall Assessment

This is a well-structured, incremental PR that lays the CLI and runner groundwork for remote benchmark execution. The code is generally high quality with good test coverage. Several issues noted below.

Issues

1. `num_args` used in runner CLI despite CLAUDE.md prohibition

services/runner/src/parser/mod.rs:75-80 uses num_args = 1..=MAX_ENTRYPOINT_LEN and num_args = 1..=MAX_CMD_LEN:

#[arg(long, num_args = 1..=bencher_json::MAX_ENTRYPOINT_LEN)]
pub entrypoint: Option<Vec<String>>,

#[arg(long, num_args = 1..=bencher_json::MAX_CMD_LEN)]
pub cmd: Option<Vec<String>>,

The CLAUDE.md rule (added in this same PR!) says: "Do NOT use num_args on flags in bencher run". The rule specifically references bencher run, and while the runner run subcommand is a different binary, the rationale about trailing_var_arg conflicts applies. Additionally, the spirit of the CLAUDE.md update is to validate at the type/deserialization layer, which is already done in JsonJobConfig. Consider removing num_args here for consistency, or clarify the CLAUDE.md rule scope.

2. `ImageReference` parsing: `registry.io:5000` without a path is treated as `docker.io/library/registry.io:5000`

The test parse_dotted_registry_with_port_no_path asserts that registry.io:5000 is parsed as docker.io/library/registry.io with tag 5000. This differs from Docker behavior, where a dotted name with a port would be treated as a registry. The ambiguity is inherent (no path component means we can't distinguish host:port from image:tag), but worth documenting as a known limitation in the type's doc comment since it could confuse users.

3. `build_time` validation has divergent `#[cfg]` paths that could drift

In services/cli/src/bencher/sub/run/mod.rs:103-112:

#[cfg(feature = "plus")]
if build_time && job.is_none() && cmd.command.is_none() {
    return Err(RunError::BuildTimeNoCommandOrImage.into());
}
#[cfg(not(feature = "plus"))]
if cmd.build_time && cmd.command.is_none() {
    return Err(RunError::BuildTimeNoCommandOrImage.into());
}

The plus path reads build_time from a local binding extracted before cmd is consumed, while the non-plus path reads cmd.build_time. This works, but the divergent cfg branches are a maintenance risk. Consider extracting the check into a helper or at minimum adding a comment explaining why they differ.

4. `PLAN.md` checked into the repo

services/runner/PLAN.md is a 122-line planning document describing implementation gaps. This is useful as internal documentation but reads as a WIP artifact. Consider whether this should be tracked as a GitHub issue instead, or if it's intentionally documentation.

5. `--build-time` constraint relaxed without `--image` guard on non-plus builds

The clap attribute on build_time was changed from requires = "command" to just conflicts_with = "file". On a non-plus build, bencher run --build-time with no command would now fail at a later stage with BuildTimeNoCommandOrImage rather than at the clap parsing layer. This works but gives a less helpful error (runtime vs. parse-time). The plus build has the --image alternative that justifies the relaxation, but non-plus builds gain nothing from it.

6. `generate_remote_report` sets both `start_time` and `end_time` to `DateTime::now()`

In services/cli/src/bencher/sub/run/mod.rs:286-287:

start_time: now.into(),
end_time: now.into(),

For a remote job that hasn't executed yet, setting end_time = now is misleading. The PLAN.md notes this as a known gap ("Set the report's start/end time appropriately for deferred execution"). Consider setting end_time to None or using a sentinel, or add a // TODO comment linking to the gap.

7. `JsonNewRun::job` is `Option<JsonNewRunJob>` but only behind `#[cfg(feature = "plus")]`

The job field on JsonNewRun is conditionally compiled:

#[cfg(feature = "plus")]
#[serde(default, skip_serializing_if = "Option::is_none")]
pub job: Option<JsonNewRunJob>,

The generate_local_report unconditionally sets job: None, which will fail to compile without plus. Checking the actual code at services/cli/src/bencher/sub/run/mod.rs:274 — the job: None is outside any cfg gate. This means the CLI crate must always be compiled with the plus feature, or else it won't compile. Verify this is the case.

8. `bencher_parser` crate has no dependencies

The new bencher_parser crate (lib/bencher_parser/Cargo.toml) has zero dependencies, not even std. This is fine for the current check_env/check_volume/parse_env utilities. But it could also just live in bencher_json or bencher_valid. A standalone crate for 43 lines of code adds workspace/Dockerfile overhead. Low priority.

Positive Observations

Good type discipline: ImageReference as a validated type in bencher_valid with proper Serialize/Deserialize/FromStr impls follows the project's strong typing conventions.
Collection size validation via #[serde(try_from = "...")] is the right pattern — validates at the deserialization boundary rather than in application code.
Comprehensive tests: The job config boundary tests (at-max, over-max) and the runner scenario tests (cli_entrypoint_override, cli_cmd_override, cli_env_*) are thorough.
Docker semantics for entrypoint/cmd override (overriding entrypoint clears CMD) are correctly implemented and tested.
daemon → up rename is clean, touches many files but is mechanical and consistent.
CliTuning extraction into a shared struct eliminates duplication between run and up subcommands.
Both Dockerfiles updated for the new bencher_parser crate as required by CLAUDE.md.
thiserror used consistently for new error types (JobConfigError, UpError, etc.).

_{Model: claude-opus-4-6}

github-actions · 2026-02-19T05:28:16Z

Bencher Report

Branch	u/ep/cli-image
Testbed	ubuntu-22.04

Click to view all benchmark results

Benchmark	Latency	Benchmark Result microseconds (µs) (Result Δ%)	Upper Boundary microseconds (µs) (Limit %)
Adapter::Json	📈 view plot 🚷 view threshold	3.83 µs (+9.32%) Baseline: 3.50 µs	4.68 µs (81.87%)
Adapter::Magic (JSON)	📈 view plot 🚷 view threshold	3.79 µs (+8.33%) Baseline: 3.50 µs	4.61 µs (82.33%)
Adapter::Magic (Rust)	📈 view plot 🚷 view threshold	26.58 µs (+3.25%) Baseline: 25.74 µs	29.52 µs (90.02%)
Adapter::Rust	📈 view plot 🚷 view threshold	3.07 µs (+9.72%) Baseline: 2.80 µs	3.18 µs (96.58%)
Adapter::RustBench	📈 view plot 🚷 view threshold	3.08 µs (+9.91%) Baseline: 2.81 µs	3.19 µs (96.66%)
head_version_insert/batch/10	📈 view plot 🚷 view threshold	95.57 µs (+5.07%) Baseline: 90.96 µs	112.42 µs (85.02%)
head_version_insert/batch/100	📈 view plot 🚷 view threshold	232.54 µs (+2.00%) Baseline: 227.98 µs	248.63 µs (93.53%)
head_version_insert/batch/255	📈 view plot 🚷 view threshold	458.95 µs (+1.48%) Baseline: 452.24 µs	485.73 µs (94.49%)
head_version_insert/batch/50	📈 view plot 🚷 view threshold	156.38 µs (+3.31%) Baseline: 151.38 µs	173.26 µs (90.26%)
threshold_query/join/10	📈 view plot 🚷 view threshold	136.60 µs (+1.09%) Baseline: 135.13 µs	157.16 µs (86.92%)
threshold_query/join/20	📈 view plot 🚷 view threshold	149.72 µs (-0.04%) Baseline: 149.77 µs	170.79 µs (87.66%)
threshold_query/join/5	📈 view plot 🚷 view threshold	130.77 µs (+2.12%) Baseline: 128.06 µs	150.04 µs (87.16%)
threshold_query/join/50	📈 view plot 🚷 view threshold	191.41 µs (+0.34%) Baseline: 190.75 µs	212.83 µs (89.94%)

🐰 View full continuous benchmarking report in Bencher

epompeii added 10 commits February 18, 2026 16:19

unchecked_job_config

2b30a46

new_run_job

7eb8270

cli_no_runner

a2b839a

runner_up

62500ff

runner_args

df8095d

run_plan

084af3a

self_review

3fd60cd

bencher_parser

35541c5

claude_1

08791f5

build_time

6ea2bd2

epompeii self-assigned this Feb 19, 2026

epompeii added the claude label Feb 19, 2026

epompeii added 2 commits February 19, 2026 05:32

override_scenario

c77f88e

claude_2

2e7c65e

epompeii force-pushed the u/ep/cli-image branch from a09d9ef to 2e7c65e Compare February 19, 2026 06:20

epompeii added 3 commits February 19, 2026 13:17

rm_num_args

183cac6

claude_3

e82ba9d

claude_4

1e37398

epompeii merged commit 05c3ecf into devel Feb 20, 2026
60 of 62 checks passed

epompeii deleted the u/ep/cli-image branch February 20, 2026 04:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Update `bencher` CLI to handle remote jobs#672

Update `bencher` CLI to handle remote jobs#672
epompeii merged 15 commits intodevelfrom
u/ep/cli-image

epompeii commented Feb 19, 2026

Uh oh!

github-actions bot commented Feb 19, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 19, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

epompeii commented Feb 19, 2026

Uh oh!

github-actions bot commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🤖 Claude Code Review

PR Review: bencher run --image + Runner Overrides

Overall Assessment

Issues

1. num_args used in runner CLI despite CLAUDE.md prohibition

2. ImageReference parsing: registry.io:5000 without a path is treated as docker.io/library/registry.io:5000

3. build_time validation has divergent #[cfg] paths that could drift

4. PLAN.md checked into the repo

5. --build-time constraint relaxed without --image guard on non-plus builds

6. generate_remote_report sets both start_time and end_time to DateTime::now()

7. JsonNewRun::job is Option<JsonNewRunJob> but only behind #[cfg(feature = "plus")]

8. bencher_parser crate has no dependencies

Positive Observations

Uh oh!

github-actions bot commented Feb 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bencher Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Feb 19, 2026 •

edited

Loading

PR Review: `bencher run --image` + Runner Overrides

1. `num_args` used in runner CLI despite CLAUDE.md prohibition

2. `ImageReference` parsing: `registry.io:5000` without a path is treated as `docker.io/library/registry.io:5000`

3. `build_time` validation has divergent `#[cfg]` paths that could drift

4. `PLAN.md` checked into the repo

5. `--build-time` constraint relaxed without `--image` guard on non-plus builds

6. `generate_remote_report` sets both `start_time` and `end_time` to `DateTime::now()`

7. `JsonNewRun::job` is `Option<JsonNewRunJob>` but only behind `#[cfg(feature = "plus")]`

8. `bencher_parser` crate has no dependencies

github-actions bot commented Feb 19, 2026 •

edited

Loading