Skip to content

Comments

Update bencher CLI to handle remote jobs#672

Merged
epompeii merged 15 commits intodevelfrom
u/ep/cli-image
Feb 20, 2026
Merged

Update bencher CLI to handle remote jobs#672
epompeii merged 15 commits intodevelfrom
u/ep/cli-image

Conversation

@epompeii
Copy link
Member

This changeset introduces the concept of a remote job to the bencher CLI.
The runner CLI has also been updated to better match the conventions of the bencher CLI.

@epompeii epompeii self-assigned this Feb 19, 2026
@github-actions
Copy link

github-actions bot commented Feb 19, 2026

🤖 Claude Code Review

PR: #672
Base: devel
Head: u/ep/cli-image
Commit: 1e3739876376f93a927b21bb95ed343be523ba41


Now I have a thorough understanding of the changes. Here's the review:


PR Review: bencher run --image + Runner Overrides

Commits: 15 commits covering CLI --image support, runner daemon → up rename, entrypoint/cmd/env overrides, ImageReference validation type, bencher_parser crate, and collection-size validation.

Overall Assessment

This is a well-structured, incremental PR that lays the CLI and runner groundwork for remote benchmark execution. The code is generally high quality with good test coverage. Several issues noted below.


Issues

1. num_args used in runner CLI despite CLAUDE.md prohibition

services/runner/src/parser/mod.rs:75-80 uses num_args = 1..=MAX_ENTRYPOINT_LEN and num_args = 1..=MAX_CMD_LEN:

#[arg(long, num_args = 1..=bencher_json::MAX_ENTRYPOINT_LEN)]
pub entrypoint: Option<Vec<String>>,

#[arg(long, num_args = 1..=bencher_json::MAX_CMD_LEN)]
pub cmd: Option<Vec<String>>,

The CLAUDE.md rule (added in this same PR!) says: "Do NOT use num_args on flags in bencher run". The rule specifically references bencher run, and while the runner run subcommand is a different binary, the rationale about trailing_var_arg conflicts applies. Additionally, the spirit of the CLAUDE.md update is to validate at the type/deserialization layer, which is already done in JsonJobConfig. Consider removing num_args here for consistency, or clarify the CLAUDE.md rule scope.

2. ImageReference parsing: registry.io:5000 without a path is treated as docker.io/library/registry.io:5000

The test parse_dotted_registry_with_port_no_path asserts that registry.io:5000 is parsed as docker.io/library/registry.io with tag 5000. This differs from Docker behavior, where a dotted name with a port would be treated as a registry. The ambiguity is inherent (no path component means we can't distinguish host:port from image:tag), but worth documenting as a known limitation in the type's doc comment since it could confuse users.

3. build_time validation has divergent #[cfg] paths that could drift

In services/cli/src/bencher/sub/run/mod.rs:103-112:

#[cfg(feature = "plus")]
if build_time && job.is_none() && cmd.command.is_none() {
    return Err(RunError::BuildTimeNoCommandOrImage.into());
}
#[cfg(not(feature = "plus"))]
if cmd.build_time && cmd.command.is_none() {
    return Err(RunError::BuildTimeNoCommandOrImage.into());
}

The plus path reads build_time from a local binding extracted before cmd is consumed, while the non-plus path reads cmd.build_time. This works, but the divergent cfg branches are a maintenance risk. Consider extracting the check into a helper or at minimum adding a comment explaining why they differ.

4. PLAN.md checked into the repo

services/runner/PLAN.md is a 122-line planning document describing implementation gaps. This is useful as internal documentation but reads as a WIP artifact. Consider whether this should be tracked as a GitHub issue instead, or if it's intentionally documentation.

5. --build-time constraint relaxed without --image guard on non-plus builds

The clap attribute on build_time was changed from requires = "command" to just conflicts_with = "file". On a non-plus build, bencher run --build-time with no command would now fail at a later stage with BuildTimeNoCommandOrImage rather than at the clap parsing layer. This works but gives a less helpful error (runtime vs. parse-time). The plus build has the --image alternative that justifies the relaxation, but non-plus builds gain nothing from it.

6. generate_remote_report sets both start_time and end_time to DateTime::now()

In services/cli/src/bencher/sub/run/mod.rs:286-287:

start_time: now.into(),
end_time: now.into(),

For a remote job that hasn't executed yet, setting end_time = now is misleading. The PLAN.md notes this as a known gap ("Set the report's start/end time appropriately for deferred execution"). Consider setting end_time to None or using a sentinel, or add a // TODO comment linking to the gap.

7. JsonNewRun::job is Option<JsonNewRunJob> but only behind #[cfg(feature = "plus")]

The job field on JsonNewRun is conditionally compiled:

#[cfg(feature = "plus")]
#[serde(default, skip_serializing_if = "Option::is_none")]
pub job: Option<JsonNewRunJob>,

The generate_local_report unconditionally sets job: None, which will fail to compile without plus. Checking the actual code at services/cli/src/bencher/sub/run/mod.rs:274 — the job: None is outside any cfg gate. This means the CLI crate must always be compiled with the plus feature, or else it won't compile. Verify this is the case.

8. bencher_parser crate has no dependencies

The new bencher_parser crate (lib/bencher_parser/Cargo.toml) has zero dependencies, not even std. This is fine for the current check_env/check_volume/parse_env utilities. But it could also just live in bencher_json or bencher_valid. A standalone crate for 43 lines of code adds workspace/Dockerfile overhead. Low priority.


Positive Observations

  • Good type discipline: ImageReference as a validated type in bencher_valid with proper Serialize/Deserialize/FromStr impls follows the project's strong typing conventions.
  • Collection size validation via #[serde(try_from = "...")] is the right pattern — validates at the deserialization boundary rather than in application code.
  • Comprehensive tests: The job config boundary tests (at-max, over-max) and the runner scenario tests (cli_entrypoint_override, cli_cmd_override, cli_env_*) are thorough.
  • Docker semantics for entrypoint/cmd override (overriding entrypoint clears CMD) are correctly implemented and tested.
  • daemon → up rename is clean, touches many files but is mechanical and consistent.
  • CliTuning extraction into a shared struct eliminates duplication between run and up subcommands.
  • Both Dockerfiles updated for the new bencher_parser crate as required by CLAUDE.md.
  • thiserror used consistently for new error types (JobConfigError, UpError, etc.).

Model: claude-opus-4-6

@github-actions
Copy link

github-actions bot commented Feb 19, 2026

🐰 Bencher Report

Branchu/ep/cli-image
Testbedubuntu-22.04
Click to view all benchmark results
BenchmarkLatencyBenchmark Result
microseconds (µs)
(Result Δ%)
Upper Boundary
microseconds (µs)
(Limit %)
Adapter::Json📈 view plot
🚷 view threshold
3.83 µs
(+9.32%)Baseline: 3.50 µs
4.68 µs
(81.87%)
Adapter::Magic (JSON)📈 view plot
🚷 view threshold
3.79 µs
(+8.33%)Baseline: 3.50 µs
4.61 µs
(82.33%)
Adapter::Magic (Rust)📈 view plot
🚷 view threshold
26.58 µs
(+3.25%)Baseline: 25.74 µs
29.52 µs
(90.02%)
Adapter::Rust📈 view plot
🚷 view threshold
3.07 µs
(+9.72%)Baseline: 2.80 µs
3.18 µs
(96.58%)
Adapter::RustBench📈 view plot
🚷 view threshold
3.08 µs
(+9.91%)Baseline: 2.81 µs
3.19 µs
(96.66%)
head_version_insert/batch/10📈 view plot
🚷 view threshold
95.57 µs
(+5.07%)Baseline: 90.96 µs
112.42 µs
(85.02%)
head_version_insert/batch/100📈 view plot
🚷 view threshold
232.54 µs
(+2.00%)Baseline: 227.98 µs
248.63 µs
(93.53%)
head_version_insert/batch/255📈 view plot
🚷 view threshold
458.95 µs
(+1.48%)Baseline: 452.24 µs
485.73 µs
(94.49%)
head_version_insert/batch/50📈 view plot
🚷 view threshold
156.38 µs
(+3.31%)Baseline: 151.38 µs
173.26 µs
(90.26%)
threshold_query/join/10📈 view plot
🚷 view threshold
136.60 µs
(+1.09%)Baseline: 135.13 µs
157.16 µs
(86.92%)
threshold_query/join/20📈 view plot
🚷 view threshold
149.72 µs
(-0.04%)Baseline: 149.77 µs
170.79 µs
(87.66%)
threshold_query/join/5📈 view plot
🚷 view threshold
130.77 µs
(+2.12%)Baseline: 128.06 µs
150.04 µs
(87.16%)
threshold_query/join/50📈 view plot
🚷 view threshold
191.41 µs
(+0.34%)Baseline: 190.75 µs
212.83 µs
(89.94%)
🐰 View full continuous benchmarking report in Bencher

@epompeii epompeii merged commit 05c3ecf into devel Feb 20, 2026
60 of 62 checks passed
@epompeii epompeii deleted the u/ep/cli-image branch February 20, 2026 04:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant