Add the ability for a Testbed to have a Spec by epompeii · Pull Request #673 · bencherdev/bencher

epompeii · 2026-02-21T06:58:21Z

This changeset adds the ability for a Testbed to have an associated Spec.

github-actions · 2026-02-21T07:00:27Z

🤖 Claude Code Review

PR: #673
Base: devel
Head: u/ep/testbed-spec
Commit: 5b3bfeda7500d760344d5eb35ffdae906d05338a

I now have a thorough understanding of the full diff. Let me write the review.

Pull Request Review: Testbed Spec Feature

Summary

This PR adds a "spec" (hardware specification) field to testbeds, allowing users to associate a hardware spec with a testbed and use specs to filter performance data. The changes span the full stack: database migration, Rust API/schema, CLI, and frontend console.

Issues and Concerns

1. Bug: `job.spec_id` is `NOT NULL` in schema, but queried as `Option<SpecId>`

In lib/bencher_schema/src/schema.rs:80, job.spec_id is -> Integer (not nullable). However, the queries in report/mod.rs:170-175, report/mod.rs:235-240, and metrics.rs:212-218 all do:

let spec_id: Option<SpecId> = schema::job::table
    .filter(schema::job::report_id.eq(...))
    .select(schema::job::spec_id)
    .first::<SpecId>(conn)
    .optional()  // <-- this makes the whole query optional (row not found)
    ...

The .optional() here only covers "no row found" (i.e., no job for this report), which is correct since not all reports come from jobs. But first::<SpecId> will deserialize the spec_id integer directly, not as Option<SpecId>. If a report does have a job, the spec ID will always be present since the column is non-nullable.

This means: for reports without a job, spec_id is None (correct). For reports with a job, spec_id is always Some(...) (correct). This is fine, but it's worth documenting the intent -- the Option wraps the row existence, not the column nullability.

2. Duplicated spec_id lookup pattern — consider extracting a helper

The same job → spec_id lookup query is copy-pasted in three places:

lib/bencher_schema/src/model/project/report/mod.rs:170-175 (report creation)
lib/bencher_schema/src/model/project/report/mod.rs:235-240 (report JSON serialization)
lib/api_projects/src/metrics.rs:212-218 (metric query)

This is a good candidate for a small helper function, e.g., QueryJob::spec_id_for_report(conn, report_id).

3. `metrics.rs` uses `inner_join` while `report/mod.rs` uses direct filter

In metrics.rs:213, the query uses .inner_join(schema::report::table) and filters by report::uuid. In report/mod.rs:171 and :236, the query filters by job::report_id directly. The inner_join variant in metrics.rs is correct but subtly different — if job.report_id references a report, both should work, but the inconsistency could mask bugs. Consider standardizing.

4. `into_json_for_project` signature change is a breaking API change within the crate

QueryTestbed::into_json_for_project changed from self → JsonTestbed to self → Result<JsonTestbed, HttpError>. This required updates in every caller. The migration is complete, but the #[expect(clippy::too_many_lines)] on QueryReport::create (report/mod.rs:75) suggests this function is getting unwieldy. The reason given ("spec_id binding extraction pushes over limit") is honest but worth watching.

5. Potential N+1 query in testbed listing

In lib/api_projects/src/testbeds.rs, the get_ls_inner function now iterates over testbeds and calls into_json_for_project for each, which resolves the spec via a separate query per testbed. For listings with many testbeds, this creates an N+1 query pattern. Consider joining the spec in the initial query.

6. `QueryTestbed::into_json_for_spec` — `#[cfg(not(feature = "plus"))] let _ = conn;`

At testbed.rs:175, when the plus feature is off, conn is explicitly silenced with let _ = conn. This works but is a code smell. An alternative is to use #[cfg_attr(not(feature = "plus"), expect(unused_variables))] on the parameter, though the #[cfg(...)] parameter attribute pattern used elsewhere in the PR is consistent.

7. `InsertTestbed::localhost` uses `expect` — justified but fragile

At testbed.rs:260-265:

#[expect(clippy::expect_used, reason = "localhost has no spec, so from_json cannot fail")]
pub fn localhost(conn: &mut DbConnection, project_id: ProjectId) -> Self {
    Self::from_json(conn, project_id, JsonNewTestbed::localhost())
        .expect("Failed to create localhost testbed")
}

The expect is justified by the comment — localhost() sets spec: None, so no DB lookup occurs. However, if from_json gains new fallible paths in the future, this will silently panic. The lint suppression reason is good documentation.

8. Custom `Deserialize` for `JsonUpdateTestbed` — correct but complex

The manual Deserialize impl in lib/bencher_json/src/project/testbed.rs (lines ~112-245) handles the null vs absent distinction for the spec field. This is a well-known pattern for PATCH semantics. The implementation looks correct:

spec absent → Patch with spec: None (don't change)
spec: "some-uuid" → Patch with spec: Some(resource_id) (set)
spec: null → Null variant (clear)

One concern: the #[cfg(not(feature = "plus"))] path always returns Patch and ignores spec, which means non-plus builds silently accept and discard spec in the request body. This is probably intentional for forward compatibility.

9. `unreachable!()` in `UpdateTestbed::from` for non-plus `Null` variant

At testbed.rs:303-307:

#[cfg(not(feature = "plus"))]
#[expect(clippy::unreachable, reason = "Null variant is only constructed with the plus feature")]
JsonUpdateTestbed::Null(_) => unreachable!(),

This is correct given the deserializer's behavior, but unreachable!() is a panic. Since the Null variant is never constructed without plus, this is truly unreachable. The #[expect] with reason is appropriate.

10. Frontend: `fmtBytes` uses base-2 units (GiB/MiB/KiB)

The fmtBytes function in ViewCard.tsx uses binary thresholds (1024) with binary suffixes (GiB, MiB, KiB). This is correct — the memory/disk values from the spec appear to be in bytes. Just confirm the API actually returns bytes (e.g., memory: 0x0002_0000_0000 = 8 GiB).

11. Frontend: `specs` not cleared on plot load

In PerfPanel.tsx:1059-1066, when loading a plot, HEADS_PARAM is set to null but SPECS_PARAM is also set to null. This is correct. Similarly, on clear all (PerfPanel.tsx:1109), SPECS_PARAM is nulled. The handling looks consistent.

12. CLI: `update.rs` parameter rename `create` → `update`

In services/cli/src/bencher/sub/project/branch/update.rs:26 and services/cli/src/bencher/sub/project/testbed/update.rs:30, the parameter name was corrected from create to update. Good cleanup.

13. CLI: `dimension.rs` adds `self.log_success(action)` calls

Three new self.log_success(action) calls were added in services/cli/src/bencher/sub/project/archive/dimension.rs. These appear to have been missing before — verify this doesn't duplicate logging if log_success was already called elsewhere in the call chain.

14. `size_specs_to_testbeds` could be simplified

The function at perf.rs:210-221 uses fold with push when a simpler map would suffice:

fn size_specs_to_testbeds(testbeds: &[TestbedUuid], specs: &[Option<SpecUuid>]) -> Vec<Option<SpecUuid>> {
    (0..testbeds.len())
        .map(|i| specs.get(i).copied().flatten())
        .collect()
}

15. Migration approach is solid

The migration uses the standard SQLite pattern: create new table, copy data, drop old, rename. ON DELETE SET NULL for spec_id is the right choice — deleting a spec shouldn't cascade-delete testbeds. The index recreation is correct.

16. Missing `cargo gen-types` verification

The openapi.json and bencher.ts changes look auto-generated. The ?spec query parameter appears in the OpenAPI spec for the testbed GET endpoint. Verify this was generated via cargo gen-types and not hand-edited.

Minor Nits

services/runner/PLAN.md has 161 lines changed — this is a plan doc, not production code, but the diff is large. Confirm it's intentional to include in this PR.
.claude/settings.json is modified — verify this shouldn't be in .gitignore.

Overall Assessment

The feature is well-implemented with good test coverage (9 unit tests for testbed-spec CRUD, extensive integration tests for perf). The spec-aware detector filtering in data.rs is a key correctness improvement — it ensures threshold detection only considers metrics from the same hardware spec. The custom deserializer for PATCH null semantics is correct and well-structured.

The main areas for improvement are the duplicated job → spec_id lookup pattern and the N+1 query in testbed listing. Neither is a blocker.

_{Model: claude-opus-4-6}

github-actions · 2026-02-21T07:31:12Z

Bencher Report

Branch	u/ep/testbed-spec
Testbed	ubuntu-22.04

Click to view all benchmark results

Benchmark	Latency	Benchmark Result microseconds (µs) (Result Δ%)	Upper Boundary microseconds (µs) (Limit %)
Adapter::Json	📈 view plot 🚷 view threshold	3.83 µs (+9.10%) Baseline: 3.51 µs	4.68 µs (81.85%)
Adapter::Magic (JSON)	📈 view plot 🚷 view threshold	3.78 µs (+7.83%) Baseline: 3.51 µs	4.61 µs (82.08%)
Adapter::Magic (Rust)	📈 view plot 🚷 view threshold	25.42 µs (-1.17%) Baseline: 25.72 µs	29.48 µs (86.22%)
Adapter::Rust	📈 view plot 🚷 view threshold	2.90 µs (+3.67%) Baseline: 2.80 µs	3.16 µs (91.84%)
Adapter::RustBench	📈 view plot 🚷 view threshold	2.91 µs (+3.80%) Baseline: 2.81 µs	3.17 µs (91.87%)
head_version_insert/batch/10	📈 view plot 🚷 view threshold	95.14 µs (+4.40%) Baseline: 91.13 µs	112.15 µs (84.83%)
head_version_insert/batch/100	📈 view plot 🚷 view threshold	231.60 µs (+1.58%) Baseline: 228.01 µs	247.86 µs (93.44%)
head_version_insert/batch/255	📈 view plot 🚷 view threshold	455.48 µs (+0.73%) Baseline: 452.16 µs	484.09 µs (94.09%)
head_version_insert/batch/50	📈 view plot 🚷 view threshold	154.32 µs (+1.89%) Baseline: 151.46 µs	172.57 µs (89.42%)
threshold_query/join/10	📈 view plot 🚷 view threshold	138.48 µs (+2.33%) Baseline: 135.33 µs	156.95 µs (88.23%)
threshold_query/join/20	📈 view plot 🚷 view threshold	151.65 µs (+1.11%) Baseline: 149.98 µs	170.52 µs (88.93%)
threshold_query/join/5	📈 view plot 🚷 view threshold	130.73 µs (+2.00%) Baseline: 128.17 µs	149.54 µs (87.42%)
threshold_query/join/50	📈 view plot 🚷 view threshold	194.64 µs (+1.86%) Baseline: 191.08 µs	213.01 µs (91.38%)

🐰 View full continuous benchmarking report in Bencher

epompeii added 8 commits February 21, 2026 04:23

spec_plan

8d5d3e7

step_0

3e53c18

testbed_spec

3799bbc

testbed_spec_cli

6fcd2bb

comment_query_params

0b34a76

testbed_spec_ui

7228bf5

testbed_spec_docs

e5cff47

testbed_spec_card

994876c

epompeii self-assigned this Feb 21, 2026

epompeii added claude deploy labels Feb 21, 2026

epompeii added 7 commits February 22, 2026 01:10

claude_1

42dd967

perf_query_spec

c6467d8

one_metric_spec

d281810

agent_teams

88708bb

detector_spec

ab99d57

claude_2

0faddbe

perf_tests

5b3bfed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Comments

Add the ability for a Testbed to have a Spec#673

Add the ability for a Testbed to have a Spec#673
epompeii wants to merge 15 commits intodevelfrom
u/ep/testbed-spec

epompeii commented Feb 21, 2026

Uh oh!

github-actions bot commented Feb 21, 2026 •

edited

Loading

Uh oh!

github-actions bot commented Feb 21, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Comments

Conversation

epompeii commented Feb 21, 2026

Uh oh!

github-actions bot commented Feb 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🤖 Claude Code Review

Pull Request Review: Testbed Spec Feature

Summary

Issues and Concerns

1. Bug: job.spec_id is NOT NULL in schema, but queried as Option<SpecId>

2. Duplicated spec_id lookup pattern — consider extracting a helper

3. metrics.rs uses inner_join while report/mod.rs uses direct filter

4. into_json_for_project signature change is a breaking API change within the crate

5. Potential N+1 query in testbed listing

6. QueryTestbed::into_json_for_spec — #[cfg(not(feature = "plus"))] let _ = conn;

7. InsertTestbed::localhost uses expect — justified but fragile

8. Custom Deserialize for JsonUpdateTestbed — correct but complex

9. unreachable!() in UpdateTestbed::from for non-plus Null variant

10. Frontend: fmtBytes uses base-2 units (GiB/MiB/KiB)

11. Frontend: specs not cleared on plot load

12. CLI: update.rs parameter rename create → update

13. CLI: dimension.rs adds self.log_success(action) calls

14. size_specs_to_testbeds could be simplified

15. Migration approach is solid

16. Missing cargo gen-types verification

Minor Nits

Overall Assessment

Uh oh!

github-actions bot commented Feb 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Bencher Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

github-actions bot commented Feb 21, 2026 •

edited

Loading

1. Bug: `job.spec_id` is `NOT NULL` in schema, but queried as `Option<SpecId>`

3. `metrics.rs` uses `inner_join` while `report/mod.rs` uses direct filter

4. `into_json_for_project` signature change is a breaking API change within the crate

6. `QueryTestbed::into_json_for_spec` — `#[cfg(not(feature = "plus"))] let _ = conn;`

7. `InsertTestbed::localhost` uses `expect` — justified but fragile

8. Custom `Deserialize` for `JsonUpdateTestbed` — correct but complex

9. `unreachable!()` in `UpdateTestbed::from` for non-plus `Null` variant

10. Frontend: `fmtBytes` uses base-2 units (GiB/MiB/KiB)

11. Frontend: `specs` not cleared on plot load

12. CLI: `update.rs` parameter rename `create` → `update`

13. CLI: `dimension.rs` adds `self.log_success(action)` calls

14. `size_specs_to_testbeds` could be simplified

16. Missing `cargo gen-types` verification

github-actions bot commented Feb 21, 2026 •

edited

Loading