Skip to content

Comments

Add the ability for a Testbed to have a Spec#673

Open
epompeii wants to merge 15 commits intodevelfrom
u/ep/testbed-spec
Open

Add the ability for a Testbed to have a Spec#673
epompeii wants to merge 15 commits intodevelfrom
u/ep/testbed-spec

Conversation

@epompeii
Copy link
Member

This changeset adds the ability for a Testbed to have an associated Spec.

@github-actions
Copy link

github-actions bot commented Feb 21, 2026

🤖 Claude Code Review

PR: #673
Base: devel
Head: u/ep/testbed-spec
Commit: 5b3bfeda7500d760344d5eb35ffdae906d05338a


I now have a thorough understanding of the full diff. Let me write the review.


Pull Request Review: Testbed Spec Feature

Summary

This PR adds a "spec" (hardware specification) field to testbeds, allowing users to associate a hardware spec with a testbed and use specs to filter performance data. The changes span the full stack: database migration, Rust API/schema, CLI, and frontend console.


Issues and Concerns

1. Bug: job.spec_id is NOT NULL in schema, but queried as Option<SpecId>

In lib/bencher_schema/src/schema.rs:80, job.spec_id is -> Integer (not nullable). However, the queries in report/mod.rs:170-175, report/mod.rs:235-240, and metrics.rs:212-218 all do:

let spec_id: Option<SpecId> = schema::job::table
    .filter(schema::job::report_id.eq(...))
    .select(schema::job::spec_id)
    .first::<SpecId>(conn)
    .optional()  // <-- this makes the whole query optional (row not found)
    ...

The .optional() here only covers "no row found" (i.e., no job for this report), which is correct since not all reports come from jobs. But first::<SpecId> will deserialize the spec_id integer directly, not as Option<SpecId>. If a report does have a job, the spec ID will always be present since the column is non-nullable.

This means: for reports without a job, spec_id is None (correct). For reports with a job, spec_id is always Some(...) (correct). This is fine, but it's worth documenting the intent -- the Option wraps the row existence, not the column nullability.

2. Duplicated spec_id lookup pattern — consider extracting a helper

The same job → spec_id lookup query is copy-pasted in three places:

  • lib/bencher_schema/src/model/project/report/mod.rs:170-175 (report creation)
  • lib/bencher_schema/src/model/project/report/mod.rs:235-240 (report JSON serialization)
  • lib/api_projects/src/metrics.rs:212-218 (metric query)

This is a good candidate for a small helper function, e.g., QueryJob::spec_id_for_report(conn, report_id).

3. metrics.rs uses inner_join while report/mod.rs uses direct filter

In metrics.rs:213, the query uses .inner_join(schema::report::table) and filters by report::uuid. In report/mod.rs:171 and :236, the query filters by job::report_id directly. The inner_join variant in metrics.rs is correct but subtly different — if job.report_id references a report, both should work, but the inconsistency could mask bugs. Consider standardizing.

4. into_json_for_project signature change is a breaking API change within the crate

QueryTestbed::into_json_for_project changed from self → JsonTestbed to self → Result<JsonTestbed, HttpError>. This required updates in every caller. The migration is complete, but the #[expect(clippy::too_many_lines)] on QueryReport::create (report/mod.rs:75) suggests this function is getting unwieldy. The reason given ("spec_id binding extraction pushes over limit") is honest but worth watching.

5. Potential N+1 query in testbed listing

In lib/api_projects/src/testbeds.rs, the get_ls_inner function now iterates over testbeds and calls into_json_for_project for each, which resolves the spec via a separate query per testbed. For listings with many testbeds, this creates an N+1 query pattern. Consider joining the spec in the initial query.

6. QueryTestbed::into_json_for_spec#[cfg(not(feature = "plus"))] let _ = conn;

At testbed.rs:175, when the plus feature is off, conn is explicitly silenced with let _ = conn. This works but is a code smell. An alternative is to use #[cfg_attr(not(feature = "plus"), expect(unused_variables))] on the parameter, though the #[cfg(...)] parameter attribute pattern used elsewhere in the PR is consistent.

7. InsertTestbed::localhost uses expect — justified but fragile

At testbed.rs:260-265:

#[expect(clippy::expect_used, reason = "localhost has no spec, so from_json cannot fail")]
pub fn localhost(conn: &mut DbConnection, project_id: ProjectId) -> Self {
    Self::from_json(conn, project_id, JsonNewTestbed::localhost())
        .expect("Failed to create localhost testbed")
}

The expect is justified by the comment — localhost() sets spec: None, so no DB lookup occurs. However, if from_json gains new fallible paths in the future, this will silently panic. The lint suppression reason is good documentation.

8. Custom Deserialize for JsonUpdateTestbed — correct but complex

The manual Deserialize impl in lib/bencher_json/src/project/testbed.rs (lines ~112-245) handles the null vs absent distinction for the spec field. This is a well-known pattern for PATCH semantics. The implementation looks correct:

  • spec absent → Patch with spec: None (don't change)
  • spec: "some-uuid"Patch with spec: Some(resource_id) (set)
  • spec: nullNull variant (clear)

One concern: the #[cfg(not(feature = "plus"))] path always returns Patch and ignores spec, which means non-plus builds silently accept and discard spec in the request body. This is probably intentional for forward compatibility.

9. unreachable!() in UpdateTestbed::from for non-plus Null variant

At testbed.rs:303-307:

#[cfg(not(feature = "plus"))]
#[expect(clippy::unreachable, reason = "Null variant is only constructed with the plus feature")]
JsonUpdateTestbed::Null(_) => unreachable!(),

This is correct given the deserializer's behavior, but unreachable!() is a panic. Since the Null variant is never constructed without plus, this is truly unreachable. The #[expect] with reason is appropriate.

10. Frontend: fmtBytes uses base-2 units (GiB/MiB/KiB)

The fmtBytes function in ViewCard.tsx uses binary thresholds (1024) with binary suffixes (GiB, MiB, KiB). This is correct — the memory/disk values from the spec appear to be in bytes. Just confirm the API actually returns bytes (e.g., memory: 0x0002_0000_0000 = 8 GiB).

11. Frontend: specs not cleared on plot load

In PerfPanel.tsx:1059-1066, when loading a plot, HEADS_PARAM is set to null but SPECS_PARAM is also set to null. This is correct. Similarly, on clear all (PerfPanel.tsx:1109), SPECS_PARAM is nulled. The handling looks consistent.

12. CLI: update.rs parameter rename createupdate

In services/cli/src/bencher/sub/project/branch/update.rs:26 and services/cli/src/bencher/sub/project/testbed/update.rs:30, the parameter name was corrected from create to update. Good cleanup.

13. CLI: dimension.rs adds self.log_success(action) calls

Three new self.log_success(action) calls were added in services/cli/src/bencher/sub/project/archive/dimension.rs. These appear to have been missing before — verify this doesn't duplicate logging if log_success was already called elsewhere in the call chain.

14. size_specs_to_testbeds could be simplified

The function at perf.rs:210-221 uses fold with push when a simpler map would suffice:

fn size_specs_to_testbeds(testbeds: &[TestbedUuid], specs: &[Option<SpecUuid>]) -> Vec<Option<SpecUuid>> {
    (0..testbeds.len())
        .map(|i| specs.get(i).copied().flatten())
        .collect()
}

15. Migration approach is solid

The migration uses the standard SQLite pattern: create new table, copy data, drop old, rename. ON DELETE SET NULL for spec_id is the right choice — deleting a spec shouldn't cascade-delete testbeds. The index recreation is correct.

16. Missing cargo gen-types verification

The openapi.json and bencher.ts changes look auto-generated. The ?spec query parameter appears in the OpenAPI spec for the testbed GET endpoint. Verify this was generated via cargo gen-types and not hand-edited.


Minor Nits

  • services/runner/PLAN.md has 161 lines changed — this is a plan doc, not production code, but the diff is large. Confirm it's intentional to include in this PR.
  • .claude/settings.json is modified — verify this shouldn't be in .gitignore.

Overall Assessment

The feature is well-implemented with good test coverage (9 unit tests for testbed-spec CRUD, extensive integration tests for perf). The spec-aware detector filtering in data.rs is a key correctness improvement — it ensures threshold detection only considers metrics from the same hardware spec. The custom deserializer for PATCH null semantics is correct and well-structured.

The main areas for improvement are the duplicated job → spec_id lookup pattern and the N+1 query in testbed listing. Neither is a blocker.


Model: claude-opus-4-6

@github-actions
Copy link

github-actions bot commented Feb 21, 2026

🐰 Bencher Report

Branchu/ep/testbed-spec
Testbedubuntu-22.04
Click to view all benchmark results
BenchmarkLatencyBenchmark Result
microseconds (µs)
(Result Δ%)
Upper Boundary
microseconds (µs)
(Limit %)
Adapter::Json📈 view plot
🚷 view threshold
3.83 µs
(+9.10%)Baseline: 3.51 µs
4.68 µs
(81.85%)
Adapter::Magic (JSON)📈 view plot
🚷 view threshold
3.78 µs
(+7.83%)Baseline: 3.51 µs
4.61 µs
(82.08%)
Adapter::Magic (Rust)📈 view plot
🚷 view threshold
25.42 µs
(-1.17%)Baseline: 25.72 µs
29.48 µs
(86.22%)
Adapter::Rust📈 view plot
🚷 view threshold
2.90 µs
(+3.67%)Baseline: 2.80 µs
3.16 µs
(91.84%)
Adapter::RustBench📈 view plot
🚷 view threshold
2.91 µs
(+3.80%)Baseline: 2.81 µs
3.17 µs
(91.87%)
head_version_insert/batch/10📈 view plot
🚷 view threshold
95.14 µs
(+4.40%)Baseline: 91.13 µs
112.15 µs
(84.83%)
head_version_insert/batch/100📈 view plot
🚷 view threshold
231.60 µs
(+1.58%)Baseline: 228.01 µs
247.86 µs
(93.44%)
head_version_insert/batch/255📈 view plot
🚷 view threshold
455.48 µs
(+0.73%)Baseline: 452.16 µs
484.09 µs
(94.09%)
head_version_insert/batch/50📈 view plot
🚷 view threshold
154.32 µs
(+1.89%)Baseline: 151.46 µs
172.57 µs
(89.42%)
threshold_query/join/10📈 view plot
🚷 view threshold
138.48 µs
(+2.33%)Baseline: 135.33 µs
156.95 µs
(88.23%)
threshold_query/join/20📈 view plot
🚷 view threshold
151.65 µs
(+1.11%)Baseline: 149.98 µs
170.52 µs
(88.93%)
threshold_query/join/5📈 view plot
🚷 view threshold
130.73 µs
(+2.00%)Baseline: 128.17 µs
149.54 µs
(87.42%)
threshold_query/join/50📈 view plot
🚷 view threshold
194.64 µs
(+1.86%)Baseline: 191.08 µs
213.01 µs
(91.38%)
🐰 View full continuous benchmarking report in Bencher

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant