Skip to content

Add scope-based session management for long-running processes and interactive commands#76

Open
hawkeyexl wants to merge 16 commits intomainfrom
scope
Open

Add scope-based session management for long-running processes and interactive commands#76
hawkeyexl wants to merge 16 commits intomainfrom
scope

Conversation

@hawkeyexl
Copy link
Contributor

@hawkeyexl hawkeyexl commented Jan 5, 2026

Summary by CodeRabbit

  • New Features
    • Named scopes for managing parallel and long‑running shell/code sessions across steps
    • waitUntil conditions to pause steps until stdout/stderr or DOM/network conditions are met
    • terminateScope command to gracefully stop named processes
    • Type/run actions can target named scopes and send input/output to scoped processes
  • Tests
    • New test suites and artifacts covering scope lifecycle, waitUntil, typing to scopes, and edge cases
  • Bug Fixes
    • Safer optional property access to avoid runtime errors in app detection

✏️ Tip: You can customize this high-level summary in your review settings.

hawkeyexl and others added 15 commits December 13, 2025 08:35
…modules

Co-authored-by: hawkeyexl <5209367+hawkeyexl@users.noreply.github.com>
… action

Co-authored-by: hawkeyexl <5209367+hawkeyexl@users.noreply.github.com>
…or scopes

Co-authored-by: hawkeyexl <5209367+hawkeyexl@users.noreply.github.com>
Co-authored-by: hawkeyexl <5209367+hawkeyexl@users.noreply.github.com>
Co-authored-by: hawkeyexl <5209367+hawkeyexl@users.noreply.github.com>
…quences, use consistent logging

Co-authored-by: hawkeyexl <5209367+hawkeyexl@users.noreply.github.com>
- Simplify the exitWithCleanup function in cleanup.js to improve error handling and ensure proper exit behavior.
- Enhance the waitForConditions function in waitUntil.js to check conditions against accumulated buffer data for stdout and stderr.
- Expand the specialKeyTerminalMap in typeKeys.js to include additional control, navigation, and function keys.
- Add multiple new test JSON files to cover various scenarios including timeout, failure handling, and regression testing.
- Create a regression output file to store expected output for comparison in tests.
- Replace hardcoded absolute paths with relative paths in test files
- Update regex flags documentation to include 'd' flag in waitUntil.js
- Document potential race condition in waitForConditions()
…sion-management

Add scope-based session management for long-running processes and interactive commands
@coderabbitai
Copy link

coderabbitai bot commented Jan 5, 2026

Walkthrough

Adds a named scope system for terminal/code execution with schema support for scope and waitUntil, PTY-backed terminal scopes, an in-memory registry, wait utilities, cleanup handlers, test integrations (type/runShell/runCode), a terminateScope step, and tests; also adds node-pty dependency.

Changes

Cohort / File(s) Summary
Schema changes (outputs & sources)
common/src/schemas/output_schemas/*.schema.json, common/src/schemas/src_schemas/*.schema.json, common/src/schemas/dereferenceSchemas.js
Added optional scope (string, pattern ^[a-zA-Z0-9_-]{1,64}$) across many step/action schemas and waitUntil objects (stdio/stdout/stderr), added terminateScope_v3.schema.json, and updated dereference list and local ref variable scoping.
New output/source terminate schema
common/src/schemas/output_schemas/terminateScope_v3.schema.json, common/src/schemas/src_schemas/terminateScope_v3.schema.json
New schema files defining terminateScope as a validated scope identifier (1–64 chars, alnum/underscore/hyphen) with examples and description.
Scope registry & utilities
core/src/scopes/registry.js, core/src/scopes/waitUntil.js
New in-memory scope registry (register/get/has/remove/getAll/getScopesByTestId/validate), stdout/stderr buffering (max 1000 lines), and wait-for-conditions utilities (string/regex parsing, matchesCondition, waitForConditions).
Terminal scope implementation
core/src/scopes/terminal.js
PTY-backed terminal scopes: createTerminalScope (spawn shell via node-pty), terminateTerminalScope (SIGTERM→wait→SIGKILL), writeToTerminal (escape processing).
Code scope scaffold
core/src/scopes/code.js
Placeholder APIs for code scopes: createCodeScope, terminateCodeScope, writeToCode (currently throw "not implemented").
Cleanup handlers
core/src/scopes/cleanup.js
cleanupTestScopes, cleanupAllScopes, and setupCleanupHandlers to teardown scopes LIFO, install process handlers, and do best-effort cleanup on signals/exits.
Scope aggregator export
core/src/scopes/index.js
Central export re-exporting registry, terminal, code, waitUntil, and cleanup APIs.
Test-run integration
core/src/tests.js, core/src/tests/runShell.js, core/src/tests/runCode.js
Added ScopeRegistry class and wiring: initialize per-run, pass scopeRegistry through steps, call setupCleanupHandlers, call cleanupTestScopes after tests; runShell signature extended (test, scopeRegistry params); runCode propagates scope/waitUntil/workingDirectory/timeout to shellStep.
Step handlers
core/src/tests/typeKeys.js, core/src/tests/terminateScope.js
typeKeys gained optional scopeRegistry param and a scope-aware branch to write keystrokes into scopes (terminal key token mapping). New terminateScope step validates and terminates named scopes, removes them from registry and returns standardized result.
Tests & artifacts
core/test/*.json, core/test/artifacts/*.spec.json, core/test/core.test.js
New test artifacts and fixtures validating scope lifecycle, waitUntil (stdout string and regex), type-to-scope input, termination of non-existent scopes, and broad scope-related test coverage; multiple temporary test fixtures added.
Dependency & minor config
core/package.json, core/src/config.js
Added node-pty dependency; minor optional chaining for browser version reads.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Executor
    participant Reg as Scope Registry
    participant PTY as PTY/Shell
    participant Wait as WaitUntil
    participant Cleanup as Cleanup Handler

    Test->>Reg: registerScope("dev-server", meta)
    Reg-->>Test: scope object

    Test->>PTY: createTerminalScope("dev-server","npm run dev")
    PTY-->>Reg: stdout/stderr events buffered
    Reg-->>Reg: store buffers (max 1000 lines)

    Test->>Wait: waitForConditions({scope:"dev-server", stdout:"Server listening"})
    Wait->>Reg: check buffer
    Reg-->>Wait: matched? (initial)
    Wait->>PTY: attach listener for new data
    PTY->>Wait: "Server listening on port 3000"
    Wait-->>Test: resolved (match found)

    Test->>Cleanup: terminateTerminalScope("dev-server")
    Cleanup->>PTY: send SIGTERM (wait 5s → SIGKILL if needed)
    PTY-->>Reg: process exit event
    Cleanup->>Reg: removeScope("dev-server")
    Cleanup-->>Test: scope terminated
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

🐰 In burrows of bytes the scopes take flight,
PTYs hum softly through day and night,
Waits that listen, registries that keep,
Terminate whispers tuck processes to sleep.
A rabbit applauds the tests running light 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title accurately summarizes the main change: adding scope-based session management for long-running processes and interactive commands, which is reflected throughout the changeset in new schema definitions, registry implementations, and integration into test execution.
Docstring Coverage ✅ Passed Docstring coverage is 89.66% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@hawkeyexl hawkeyexl changed the title Scope Add scope-based session management for long-running processes and interactive commands Jan 5, 2026
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

Fix all issues with AI Agents 🤖
In @common/src/schemas/output_schemas/report_v3.schema.json:
- Around line 4968-4992: The schema-added `scope` and `waitUntil` fields must be
implemented in the runShell handler: update the `runShell` function in
core/src/tests/runShell.js to accept and forward `scope` to the terminal
management API (use the existing terminal/terminal.js scope creation/selection
functions) so commands run in the named terminal scope, and integrate the
existing waitUntil module (core/src/scopes/waitUntil.js) to evaluate
`waitUntil.stdio.stdout` and `waitUntil.stdio.stderr` using full
`/pattern/flags` parsing with a 30s timeout; ensure waiting does not terminate
the running process when conditions are met (process continues), and reuse
terminal.js termination behavior (SIGTERM, 5s grace, then SIGKILL) for
bail/timeout cleanup. If you prefer not to support these features, remove
`scope` and `waitUntil` from the runShell schema instead.

In @core/package.json:
- Line 45: Update the "node-pty" dependency in core/package.json from "0.10.1"
to "1.1.0" (change the "node-pty" version string), then run the package manager
to refresh lockfiles (e.g., npm install or yarn install) and run tests/build to
catch any compatibility changes; ensure any CI or docker build steps that rely
on lockfiles are also updated so the new version is used.

In @core/src/scopes/index.js:
- Around line 1-17: The index exports currently re-exports a code-scopes module
that only exposes non-functional stubs (createCodeScope, terminateCodeScope,
writeToCode) which can mislead consumers; either remove the spread of './code'
from the public exports, or wrap it behind a clear feature flag/conditional so
those functions are not exported unless the feature is enabled, and update any
docs accordingly; specifically edit the module.exports block to stop
re-exporting './code' (or gate the require('./code') and the spread operator
with a boolean like ENABLE_CODE_SCOPES) so createCodeScope, terminateCodeScope
and writeToCode are not available by default.

In @core/src/scopes/registry.js:
- Line 68: The code splits incoming process data with
data.toString().split('\n') which leaves CR characters on Windows; update both
the stdout and stderr listeners in core/src/scopes/registry.js to split on a
cross-platform regex like data.toString().split(/\r?\n/) (or split on
/\r\n|\n|\r/), and optionally trim or filter out empty trailing entries so
buffered lines don't contain '\r' characters.
- Around line 132-134: getAllScopes currently returns the internal mutable Map
"scopes" directly which allows external mutation; change getAllScopes to return
a defensive copy (e.g., construct and return a new Map from "scopes" or return
an immutable snapshot like Array.from(scopes.entries())) so callers cannot
modify the registry, and update any JSDoc/comments to reflect the returned copy.
- Around line 54-58: registerScope currently allows silent overwrites; before
inserting the new scope, check the existing scopes map/registry for name (the
same map used by registerScope) and if an entry exists either throw a
descriptive Error (e.g., "Scope already exists") or require an explicit config
flag (e.g., config.force or config.overwrite) to permit replacement; if allowing
overwrite, first invoke the existing scope's cleanup/shutdown routines (e.g.,
stop/close/cleanup methods on the stored scope) to release processes/streams,
then replace with scopeData—ensure the check references the same registry key
used by registerScope to avoid races.

In @core/src/tests/runCode.js:
- Around line 112-122: The code redundantly reassigns scope and attempts to
delete a non-existent property: remove the conditional block that sets
shellStep.runShell.scope when step.runCode.scope is already assigned (the lines
referencing shellStep.runShell.scope and the surrounding if (step.runCode.scope)
check) and drop the no-op delete shellStep.runCode; statement, leaving the
object literal that sets scope, waitUntil, workingDirectory, and timeout under
shellStep.runShell intact.

In @core/test/core.test.js:
- Around line 1169-1197: The test "should handle empty scope name gracefully" is
using an empty string which fails the schema pattern `^[a-zA-Z0-9_-]{1,64}$`;
update the test to assert the expected validation failure instead of success:
call runTests(config) and expect it to throw or return a validation error/result
indicating schema rejection, and assert that the error message or validation
result references the scope field; alternatively, if the intent is to allow
empty scopes, change the fixture to use a valid non-empty scope and keep the
existing success assertions for runTests.
- Around line 1199-1228: The test "should handle special characters in scope
name" uses an invalid scope string "my-scope_123.test" that violates the schema
pattern; update the test data in specialCharScopeTest so the scope uses only
allowed characters (e.g., replace the '.' with '-' or remove it like
"my-scope_123-test" or "my-scope_123test") so runTests can exercise
special-character handling rather than fail schema validation, or alternatively
change assertions to expect a validation error from runTests if you intend to
test validation behavior.
🧹 Nitpick comments (18)
core/test/temp-unsafe-test.json (1)

4-9: Consider adding scope features to align with PR objectives.

This test doesn't utilize any of the scope management features introduced in this PR (e.g., scope, waitUntil, terminateScope). While the test appears valid for testing the unsafe flag, consider whether it would be beneficial to include scope-related test coverage here or create a separate test file that demonstrates the interaction between scope features and unsafe steps.

core/src/config.js (1)

351-351: Optional chaining is redundant inside this conditional block.

The optional chaining on firefox?.buildId is unnecessary here since the containing if statement at line 348 already ensures firefox is truthy. Using firefox.buildId would be sufficient.

🔎 Suggested simplification
-      version: firefox?.buildId,
+      version: firefox.buildId,
common/src/schemas/output_schemas/runCode_v3.schema.json (1)

31-50: Consider making the waitUntil timeout configurable.

The description mentions a hardcoded 30-second timeout for waitUntil conditions. While this may be reasonable for many use cases, long-running processes or slow environments might need more flexibility. Consider adding a configurable timeout parameter to the waitUntil object.

🔎 Proposed enhancement to add configurable timeout
 "waitUntil": {
   "type": "object",
-  "description": "Conditions to wait for in the code's output before considering the step complete. When specified, the process continues running after conditions are met. Times out after 30 seconds.",
+  "description": "Conditions to wait for in the code's output before considering the step complete. When specified, the process continues running after conditions are met.",
   "properties": {
+    "timeout": {
+      "type": "integer",
+      "description": "Max time in milliseconds to wait for conditions to be met.",
+      "default": 30000,
+      "minimum": 1000
+    },
     "stdio": {
       "type": "object",
       "description": "Standard I/O conditions to wait for.",

Also applies to: 147-166

core/src/scopes/code.js (1)

6-8: Consider moving exports after function definitions for consistency.

While placing exports before function definitions works in Node.js, the conventional pattern (as seen in terminal.js and other scope modules) is to export at the end of the file. This improves code readability and aligns with common JavaScript conventions.

🔎 Suggested placement

Move these lines to the end of the file, after all function definitions.

-exports.createCodeScope = createCodeScope;
-exports.terminateCodeScope = terminateCodeScope;
-exports.writeToCode = writeToCode;
-
 /**
  * Creates a new code scope (REPL).
  ...
 async function writeToCode(name, code, config) {
   throw new Error('Code scopes are not yet implemented.');
 }
+
+exports.createCodeScope = createCodeScope;
+exports.terminateCodeScope = terminateCodeScope;
+exports.writeToCode = writeToCode;
core/test/temp-initial-screenshot-test.json (1)

1-18: Consider .gitignore for temporary test files.

Files with the temp- prefix are typically temporary test artifacts. Consider:

  • Adding these to .gitignore if they're generated during development
  • Removing the temp- prefix if they're permanent test fixtures
  • Documenting why temporary files are committed if that's intentional
core/test/temp-all-contexts-skipped.json (1)

1-24: Test structure looks appropriate for context-skipping validation.

The test correctly defines a specific context (Firefox on Windows) with a command that shouldn't execute, which aligns with testing context-based skipping behavior.

However, consider whether the temp- prefix is appropriate. If this is a permanent test fixture, remove the prefix; if temporary, add to .gitignore.

common/src/schemas/src_schemas/terminateScope_v3.schema.json (1)

1-12: Well-defined schema with clear validation rules.

The schema properly defines the terminateScope string format with appropriate pattern validation and helpful examples.

Minor consideration: The description hardcodes "waits up to 5 seconds" before sending SIGKILL. If this timeout is configurable or might change in the implementation, consider making the description more generic (e.g., "waits briefly before sending SIGKILL") to avoid documentation drift.

Additionally, the pattern ^[a-zA-Z0-9_-]{1,64}$ is quite restrictive. Consider whether scope names might benefit from supporting additional characters like dots (.) for namespace-style scopes (e.g., test.unit.server) if that aligns with your scope naming conventions.

common/src/schemas/output_schemas/runShell_v3.schema.json (1)

30-49: Consider exposing waitUntil timeout as a schema property.

The description mentions a hardcoded 30-second timeout, which matches the default in waitUntil.js:90. However, this timeout is not configurable via the schema, while the command itself has a configurable timeout property (line 110-114).

Consider adding a timeout property to the waitUntil object to allow users to customize wait durations for different scenarios.

core/src/tests/typeKeys.js (2)

6-7: Remove unused import hasScope.

hasScope is imported but never used in this file. The code uses scopeRegistry.has() instead.

🔎 Proposed fix
-const { hasScope, getScope } = require("../scopes/registry");
+const { getScope } = require("../scopes/registry");

129-137: Simplify redundant key processing branches.

Lines 132-136 both append key unchanged to inputData. The conditional can be simplified.

🔎 Proposed fix
     let inputData = "";
     for (const key of step.type.keys) {
       if (terminalSpecialKeyMap[key]) {
         inputData += terminalSpecialKeyMap[key];
-      } else if (key.startsWith("$") && key.endsWith("$")) {
-        inputData += key;
       } else {
         inputData += key;
       }
     }
core/src/tests.js (1)

33-104: LGTM with minor suggestion.

The ScopeRegistry class is well-structured for managing process-based scopes. The cleanup method correctly iterates and terminates processes.

Consider logging silently swallowed errors in the cleanup method (line 98-99) at debug level for troubleshooting:

🔎 Proposed fix
       if (scope.process && !scope.closed) {
         try {
           scope.process.kill("SIGTERM");
         } catch (err) {
+          // Ignore - process may already be terminated
         }
       }
core/src/scopes/terminal.js (1)

69-73: Consider updating scope state on process exit.

The onExit handler logs the exit but doesn't update the scope's state (e.g., marking it as closed). This could cause subsequent operations to attempt writing to a terminated PTY.

🔎 Proposed fix
     // Handle process exit
     ptyProcess.onExit(({ exitCode, signal }) => {
       log(config, "debug", `Terminal scope '${name}' exited with code ${exitCode}, signal ${signal}`);
+      const existingScope = getScope(name);
+      if (existingScope) {
+        existingScope.exitCode = exitCode;
+        existingScope.closed = true;
+      }
     });
common/src/schemas/output_schemas/config_v3.schema.json (1)

4540-4559: Consider adding validation constraints to waitUntil.stdio.

The stdio object accepts both stdout and stderr as optional, which means an empty object { stdio: {} } would be valid but meaningless. Additionally, without additionalProperties: false, typos like stout instead of stdout would be silently accepted.

🔎 Proposed enhancement
                                                      "waitUntil": {
                                                        "type": "object",
                                                        "description": "Conditions to wait for in the command's output before considering the step complete. When specified, the process continues running after conditions are met. Times out after 30 seconds.",
                                                        "properties": {
                                                          "stdio": {
                                                            "type": "object",
                                                            "description": "Standard I/O conditions to wait for.",
+                                                           "additionalProperties": false,
+                                                           "anyOf": [
+                                                             { "required": ["stdout"] },
+                                                             { "required": ["stderr"] }
+                                                           ],
                                                            "properties": {
                                                              "stdout": {
                                                                "type": "string",
                                                                "description": "String or regex pattern to wait for in stdout. Use `/pattern/flags` format for regex."
                                                              },
                                                              "stderr": {
                                                                "type": "string",
                                                                "description": "String or regex pattern to wait for in stderr. Use `/pattern/flags` format for regex."
                                                              }
                                                            }
                                                          }
-                                                        }
+                                                        },
+                                                        "additionalProperties": false
                                                      },
common/src/schemas/output_schemas/spec_v3.schema.json (1)

4955-4974: Consider making the waitUntil timeout configurable.

The timeout is currently hardcoded to 30 seconds in the description. Different processes have varying startup times, and allowing users to configure this timeout would improve flexibility for both quick and slow-starting services.

💡 Suggested schema enhancement

Add an optional timeout property to the waitUntil object:

 "waitUntil": {
   "type": "object",
-  "description": "Conditions to wait for in the command's output before considering the step complete. When specified, the process continues running after conditions are met. Times out after 30 seconds.",
+  "description": "Conditions to wait for in the command's output before considering the step complete. When specified, the process continues running after conditions are met.",
   "properties": {
+    "timeout": {
+      "type": "number",
+      "description": "Maximum time in milliseconds to wait for conditions to be met. Defaults to 30000 (30 seconds).",
+      "default": 30000,
+      "minimum": 0
+    },
     "stdio": {
       "type": "object",
       "description": "Standard I/O conditions to wait for.",

Also applies to: 5070-5089, 5331-5350, 5447-5466, 12781-12800, 12896-12915, 13157-13176, 13273-13292

common/src/schemas/output_schemas/test_v3.schema.json (2)

4349-4373: Consider making the waitUntil timeout configurable.

The description mentions a hardcoded 30-second timeout, but some processes (e.g., database containers, heavy build tools) may take longer to start. Consider adding an optional timeout property to allow users to customize this value.

Also, the waitUntil and stdio objects don't specify additionalProperties: false, which means typos like stdOut instead of stdout would be silently ignored rather than causing validation errors.

🔎 Suggested schema additions
 "waitUntil": {
   "type": "object",
   "description": "Conditions to wait for in the command's output before considering the step complete. When specified, the process continues running after conditions are met. Times out after 30 seconds.",
   "properties": {
     "stdio": {
       "type": "object",
       "description": "Standard I/O conditions to wait for.",
       "properties": {
         "stdout": {
           "type": "string",
           "description": "String or regex pattern to wait for in stdout. Use `/pattern/flags` format for regex."
         },
         "stderr": {
           "type": "string",
           "description": "String or regex pattern to wait for in stderr. Use `/pattern/flags` format for regex."
         }
-      }
-    }
-  }
+      },
+      "additionalProperties": false
+    },
+    "timeout": {
+      "type": "number",
+      "description": "Timeout in milliseconds to wait for conditions. Defaults to 30000 (30 seconds).",
+      "default": 30000
+    }
+  },
+  "additionalProperties": false
 }

7749-7770: Nested $schema inside property definition may cause validation issues.

The terminateScope property definition (lines 7757-7758) contains an embedded $schema declaration. This is unusual for a property within an object and may cause issues with some JSON Schema validators or tooling that expect $schema only at the document root. This appears to have been copied from a standalone schema file.

Consider removing the nested $schema from the property definition:

🔎 Suggested fix
 "properties": {
   "terminateScope": {
-    "$schema": "http://json-schema.org/draft-07/schema#",
     "title": "terminateScope",
     "description": "Terminate a named scope. Sends SIGTERM to the process, waits up to 5 seconds, then sends SIGKILL if needed.",
     "type": "string",
     "pattern": "^[a-zA-Z0-9_-]{1,64}$",
     "examples": [
       "dev-server",
       "npm-init",
       "test-scope"
     ]
   }
 }
common/src/schemas/output_schemas/step_v3.schema.json (2)

1158-1166: Clarify scope usage and interaction with DOM-targeting fields in type/typeKeys schemas

The new scope field is consistently added to the detailed typeKeys/type object variants (both under find.type and the standalone type step), and the examples using "scope": "npm-init" look coherent.

However, with this shape it’s now valid to specify both:

  • a terminal scope (per the description, targeting a terminal scope), and
  • DOM-targeting fields (selector, elementText, elementId, etc.)

in the same object, and JSON Schema doesn’t express whether that combination is meaningful or should be rejected. If at runtime only one of these modes is supported, consider:

  • Splitting the detailed object into two mutually exclusive branches (e.g., an anyOf with one variant that requires scope and another that requires at least one DOM locator, each using not to forbid the other), or
  • At minimum, documenting which takes precedence when both are present, to avoid surprising behavior.

Also applies to: 1270-1274, 1362-1366, 1821-1825, 1933-1937, 2025-2029, 4491-4495, 4603-4607, 4695-4699


3750-3774: Tighten waitUntil.stdio validation for runShell/runCode scopes

The addition of scope and waitUntil to runShell/runCode (plus the examples with dev-server and npm-init) is a nice, consistent way to model long‑running scoped processes.

Schema-wise, waitUntil.stdio is currently quite loose:

  • waitUntil and stdio don’t have additionalProperties: false, so arbitrary keys are allowed.
  • There’s no requirement that at least one of stdout or stderr be present, so { "waitUntil": {} } or { "waitUntil": { "stdio": {} } } are valid but semantically ambiguous.

If you want schema validation to catch misconfigurations early, consider tightening this block, for example:

  • Add additionalProperties: false to both waitUntil and stdio.
  • Add an anyOf on stdio requiring at least one of stdout / stderr.
Example schema tweaks (illustrative only)
- "waitUntil": {
-   "type": "object",
-   "description": "...",
-   "properties": {
-     "stdio": {
-       "type": "object",
-       "description": "Standard I/O conditions to wait for.",
-       "properties": {
-         "stdout": { "type": "string", ... },
-         "stderr": { "type": "string", ... }
-       }
-     }
-   }
- }
+ "waitUntil": {
+   "type": "object",
+   "description": "...",
+   "additionalProperties": false,
+   "properties": {
+     "stdio": {
+       "type": "object",
+       "description": "Standard I/O conditions to wait for.",
+       "additionalProperties": false,
+       "anyOf": [
+         { "required": ["stdout"] },
+         { "required": ["stderr"] }
+       ],
+       "properties": {
+         "stdout": { "type": "string", ... },
+         "stderr": { "type": "string", ... }
+       }
+     }
+   }
+ }

This would bring these new blocks closer to the stricter patterns you already use elsewhere in the schema (e.g., goTo.waitUntil.find).

Also applies to: 3865-3889, 4009-4027, 4126-4150, 4242-4266

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6d58027 and e94adb6.

⛔ Files ignored due to path filters (12)
  • common/dist/schemas/config_v3.schema.json is excluded by !**/dist/**
  • common/dist/schemas/find_v3.schema.json is excluded by !**/dist/**
  • common/dist/schemas/report_v3.schema.json is excluded by !**/dist/**
  • common/dist/schemas/resolvedTests_v3.schema.json is excluded by !**/dist/**
  • common/dist/schemas/runCode_v3.schema.json is excluded by !**/dist/**
  • common/dist/schemas/runShell_v3.schema.json is excluded by !**/dist/**
  • common/dist/schemas/spec_v3.schema.json is excluded by !**/dist/**
  • common/dist/schemas/step_v3.schema.json is excluded by !**/dist/**
  • common/dist/schemas/terminateScope_v3.schema.json is excluded by !**/dist/**
  • common/dist/schemas/test_v3.schema.json is excluded by !**/dist/**
  • common/dist/schemas/type_v3.schema.json is excluded by !**/dist/**
  • core/package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (42)
  • common/src/schemas/dereferenceSchemas.js
  • common/src/schemas/output_schemas/config_v3.schema.json
  • common/src/schemas/output_schemas/find_v3.schema.json
  • common/src/schemas/output_schemas/report_v3.schema.json
  • common/src/schemas/output_schemas/resolvedTests_v3.schema.json
  • common/src/schemas/output_schemas/runCode_v3.schema.json
  • common/src/schemas/output_schemas/runShell_v3.schema.json
  • common/src/schemas/output_schemas/spec_v3.schema.json
  • common/src/schemas/output_schemas/step_v3.schema.json
  • common/src/schemas/output_schemas/terminateScope_v3.schema.json
  • common/src/schemas/output_schemas/test_v3.schema.json
  • common/src/schemas/output_schemas/type_v3.schema.json
  • common/src/schemas/schemas.json
  • common/src/schemas/src_schemas/runCode_v3.schema.json
  • common/src/schemas/src_schemas/runShell_v3.schema.json
  • common/src/schemas/src_schemas/step_v3.schema.json
  • common/src/schemas/src_schemas/terminateScope_v3.schema.json
  • common/src/schemas/src_schemas/type_v3.schema.json
  • core/package.json
  • core/src/config.js
  • core/src/scopes/cleanup.js
  • core/src/scopes/code.js
  • core/src/scopes/index.js
  • core/src/scopes/registry.js
  • core/src/scopes/terminal.js
  • core/src/scopes/waitUntil.js
  • core/src/tests.js
  • core/src/tests/runCode.js
  • core/src/tests/terminateScope.js
  • core/src/tests/typeKeys.js
  • core/test/artifacts/scopes-lifecycle.spec.json
  • core/test/artifacts/scopes-type.spec.json
  • core/test/core.test.js
  • core/test/temp-all-contexts-skipped.json
  • core/test/temp-dom-timeout-test.json
  • core/test/temp-element-timeout-test.json
  • core/test/temp-failure-test.json
  • core/test/temp-initial-screenshot-test.json
  • core/test/temp-network-timeout-test.json
  • core/test/temp-regression-output.txt
  • core/test/temp-regression-test.json
  • core/test/temp-unsafe-test.json
🧰 Additional context used
🧬 Code graph analysis (6)
core/src/scopes/index.js (5)
core/src/scopes/waitUntil.js (1)
  • require (1-1)
core/src/scopes/cleanup.js (3)
  • require (1-1)
  • require (2-2)
  • require (3-3)
core/src/scopes/terminal.js (2)
  • require (3-3)
  • require (4-4)
core/src/scopes/code.js (1)
  • require (1-1)
core/src/scopes/registry.js (1)
  • require (1-1)
core/src/tests/terminateScope.js (2)
core/src/tests/typeKeys.js (3)
  • isValidStep (84-84)
  • scopeName (113-113)
  • scope (121-121)
core/src/tests/runShell.js (1)
  • isValidStep (28-28)
core/src/scopes/code.js (5)
core/src/scopes/waitUntil.js (1)
  • require (1-1)
core/src/scopes/cleanup.js (3)
  • require (1-1)
  • require (2-2)
  • require (3-3)
core/src/scopes/terminal.js (2)
  • require (3-3)
  • require (4-4)
core/src/scopes/registry.js (1)
  • require (1-1)
core/src/tests/runShell.js (1)
  • options (53-53)
core/src/scopes/registry.js (1)
core/src/scopes/terminal.js (6)
  • require (3-3)
  • require (4-4)
  • scopeData (54-64)
  • testId (23-23)
  • scope (90-90)
  • scope (162-162)
core/src/tests/typeKeys.js (1)
core/src/tests/terminateScope.js (3)
  • isValidStep (28-28)
  • scopeName (39-39)
  • scope (55-55)
core/test/core.test.js (4)
core/src/tests.js (3)
  • path (25-25)
  • config (443-443)
  • config (1099-1099)
core/src/tests/runCode.js (4)
  • path (5-5)
  • fs (4-4)
  • config (149-151)
  • result (40-44)
core/src/tests/runShell.js (3)
  • path (8-8)
  • fs (7-7)
  • result (15-25)
core/src/tests/typeKeys.js (1)
  • result (82-82)
🪛 Biome (2.1.2)
core/src/scopes/terminal.js

[error] 183-183: Unexpected control character in a regular expression.

Control characters are unusual and potentially incorrect inputs, so they are disallowed.

(lint/suspicious/noControlCharactersInRegex)

🔇 Additional comments (75)
core/test/temp-unsafe-test.json (1)

1-12: Verify if this temporary test file should be committed.

The filename prefix "temp" suggests this is a temporary test artifact. Please confirm whether this file should be part of the permanent test suite or if it was accidentally included in the commit.

If this is intended to be a permanent test fixture, consider:

  • Renaming it to remove the "temp" prefix (e.g., unsafe-step-test.json)
  • Adding test metadata for clarity:
{
  "tests": [
    {
      "id": "unsafe-step-test",
      "description": "Test to verify unsafe step handling",
      "steps": [
        {
          "runShell": "echo 'This step is unsafe'",
          "unsafe": true
        }
      ]
    }
  ]
}
core/src/config.js (1)

323-323: Good defensive practice with optional chaining.

Using optional chaining here is appropriate since chrome may be undefined when this line executes (before the conditional check at line 331).

core/test/temp-network-timeout-test.json (1)

1-17: Verify if temporary test files should be committed.

The filename prefix "temp-" suggests this may be a temporary test file for local debugging rather than a permanent test artifact. Please confirm whether these files are intended to be part of the permanent test suite or should be excluded from version control (e.g., via .gitignore).

core/test/temp-failure-test.json (1)

1-14: Verify if temporary test files should be committed.

Similar to temp-network-timeout-test.json, the "temp-" prefix suggests this is a temporary test file. Please confirm whether this file should be part of the permanent test suite or excluded from version control.

common/src/schemas/output_schemas/runCode_v3.schema.json (1)

26-30: LGTM: Scope naming pattern is well-defined.

The regex pattern for scope names is appropriate, limiting to alphanumeric characters, hyphens, and underscores with a reasonable 1-64 character length constraint. This prevents naming conflicts and ensures cross-platform compatibility.

core/src/scopes/code.js (1)

20-22: LGTM: Clear placeholder with helpful error message.

The unimplemented function correctly throws an error with a message that guides users to use terminal scopes instead. This is an appropriate approach for placeholder functionality that's explicitly documented as future work.

common/src/schemas/output_schemas/type_v3.schema.json (2)

48-52: LGTM! Scope property is well-defined and consistently applied.

The scope property is properly defined with appropriate validation (1-64 character alphanumeric with hyphens/underscores) and clear documentation. The definition is consistently applied in both the main schema and the components section, which maintains schema integrity.

Also applies to: 160-164


253-256: Good example demonstrating scope usage.

The example effectively illustrates how to use the scope property with a typeKeys action, showing a practical use case for typing into a named terminal scope.

core/test/temp-initial-screenshot-test.json (1)

12-12: No action required. The overwrite field is correctly set to the string "false". According to the screenshot schema (v3), the overwrite field is defined as type "string" with allowed enum values of "true", "false", and "aboveVariation". String values are the correct type for this field, not booleans.

core/test/temp-regression-output.txt (1)

1-1: This file is an intentional test fixture used for regression testing, not a temporary artifact. It contains the initial expected output for the runShell regression test, which verifies that output variation stays within acceptable thresholds. The file should remain committed to version control as part of the test suite.

Likely an incorrect or invalid review comment.

core/test/temp-element-timeout-test.json (1)

1-19: Good edge case coverage for element timeout behavior.

The test appropriately validates timeout handling when an awaited element never appears, using the new waitUntil feature. The intentionally non-existent selector .nonexistent-element-that-will-never-appear makes the test's purpose clear.

The temp- naming convention is consistent throughout the test suite and the localhost:8092 dependency is properly managed by the test framework's built-in server infrastructure (started and stopped automatically via before/after hooks), requiring no additional CI/CD configuration.

common/src/schemas/dereferenceSchemas.js (2)

52-52: LGTM - terminateScope schema added to v3 processing.

The addition of terminateScope_v3.schema.json properly extends the v3 schema processing list to include the new scope termination capability introduced in this PR.


179-182: Good refactor to use const declarations.

Using const declarations for valueFile, valueAttribute, and valuePath improves code clarity and enforces immutability for these intermediate values. The functional behavior remains unchanged.

core/test/temp-regression-test.json (1)

1-19: Consider removing "temp" prefix if test is permanent.

The test configuration properly exercises the runShell command with regression detection settings (maxVariation and overwrite). The structure is correct.

However, the "temp" prefix in the filename suggests this is temporary or experimental. If this test is part of the permanent test suite, consider renaming it to something more descriptive like regression-runshell-output-test.json.

core/test/temp-dom-timeout-test.json (1)

1-17: Consider clarifying the purpose of "temp" prefix naming convention.

The test configuration is well-structured and properly exercises the new waitUntil.domIdleTime capability. The test dependencies are available:

  • The HTML fixture waitUntil-test-dom-mutations-forever.html exists at core/test/server/public/.
  • The test server at localhost:8092 is properly configured in core/test/server/index.js (default port).

However, the "temp-" prefix is used consistently across multiple test files in the same directory (7 files total: temp-dom-timeout-test.json, temp-element-timeout-test.json, temp-failure-test.json, temp-initial-screenshot-test.json, temp-network-timeout-test.json, temp-regression-test.json, temp-unsafe-test.json). If this naming convention indicates these are permanent test fixtures, document the purpose of the "temp" categorization to avoid confusion.

common/src/schemas/output_schemas/terminateScope_v3.schema.json (1)

1-12: LGTM!

The schema is well-structured with appropriate pattern validation (^[a-zA-Z0-9_-]{1,64}$) that matches the scope pattern used in other schemas. The description clearly documents the SIGTERM/SIGKILL termination semantics.

core/src/tests/runCode.js (1)

158-164: LGTM!

The updated invocation correctly uses the object-form call signature that matches the function definition at line 39.

core/test/artifacts/scopes-type.spec.json (1)

10-19: Verify platform compatibility for this test.

The read -p command is bash-specific and may not work on Windows without WSL or Git Bash. Consider adding a platform constraint in the contexts if this test should only run on Unix-like systems.

core/test/core.test.js (1)

835-837: Well-structured test suite for scope management.

The test coverage is comprehensive, covering schema acceptance, background processes, scope persistence, type-to-scope interactions, lifecycle management, scope isolation, and concurrent scopes. Good use of try/finally for cleanup.

core/test/artifacts/scopes-lifecycle.spec.json (1)

1-74: LGTM!

Good coverage of scope lifecycle scenarios including basic lifecycle, string-based waitUntil, regex-based waitUntil, and graceful handling of non-existent scope termination. The regex format /listening.*port/ follows the documented convention.

common/src/schemas/src_schemas/step_v3.schema.json (2)

361-378: LGTM!

The terminateScope step type is correctly integrated following the established pattern with allOf composition, properly referencing terminateScope_v3.schema.json.


647-653: LGTM!

Examples for terminateScope are valid and demonstrate typical usage patterns.

common/src/schemas/src_schemas/runShell_v3.schema.json (2)

34-58: LGTM with consideration for future enhancement.

The scope and waitUntil properties are well-designed. The pattern validation is consistent with other scope-related schemas. The waitUntil description notes a 30-second timeout, but this is not configurable via schema properties. Consider whether a timeout property should be added to waitUntil for user control in future iterations.


178-196: LGTM!

Good examples demonstrating practical scope and waitUntil usage with npm run dev and npm init commands.

common/src/schemas/src_schemas/type_v3.schema.json (2)

41-45: LGTM! Well-defined scope property.

The scope property is well-designed with appropriate constraints (1-64 characters, alphanumeric with hyphens/underscores) and clear documentation about its purpose for referencing terminal scopes.


122-125: LGTM! Practical example.

The example effectively demonstrates the scope feature with a realistic npm init scenario.

common/src/schemas/output_schemas/find_v3.schema.json (1)

436-440: LGTM! Consistent scope property additions.

The scope property is correctly added to all typeKeys schema variants within the find schema, maintaining consistency with the type_v3 schema definitions.

Also applies to: 548-552, 1099-1103, 1211-1215

core/src/scopes/waitUntil.js (4)

75-79: Race condition acknowledged but acceptable.

The comment clearly documents the potential race condition between buffer check and listener attachment. This is generally acceptable for most use cases since patterns will be detected when subsequent data arrives. Consider if your use case has strict real-time requirements.


102-116: LGTM! Well-designed buffer handling for pattern matching across chunks.

The implementation correctly checks accumulated buffers before and during monitoring, enabling pattern detection across chunk boundaries. The join('\n') approach is appropriate for line-based stdout/stderr output.

Also applies to: 134-151, 153-171


90-188: LGTM! Robust async waiting mechanism with proper cleanup.

The implementation includes:

  • Early exit for already-met conditions
  • Proper timeout handling
  • Memory leak prevention via listener cleanup
  • Clear error messages for unmet conditions

The function assumes the scope object has valid stdout/stderr streams if conditions are specified. Ensure scope validation occurs in calling code.


16-26: The regex flag validation is compatible with the project's target Node.js versions. The core package's CI pipeline tests against Node.js 20, 22, and 24, all of which support the 'd' (hasIndices) flag introduced in Node.js 16. No action needed.

Likely an incorrect or invalid review comment.

common/src/schemas/output_schemas/runShell_v3.schema.json (2)

284-302: LGTM! Practical examples demonstrating scope and waitUntil features.

The examples effectively show:

  • Dev server scenario waiting for startup message
  • Interactive command (npm init) with regex pattern matching

Both examples demonstrate realistic use cases for the new scope management features.


25-29: Semantic difference between runShell and type scopes is intentional and properly implemented.

The scope property in runShell correctly creates new terminal scopes for long-running/parallel processes or references existing ones, while type only references existing scopes. This semantic difference is intentional (as confirmed by implementation and test documentation) and accurately documented in the schema. No changes needed.

core/src/tests/terminateScope.js (4)

28-36: LGTM! Consistent validation pattern.

The step validation follows the same pattern used in other test modules (e.g., typeKeys.js, runShell.js), properly handling validation errors and accepting coerced values.


48-52: Confirm idempotent behavior for missing scopes.

The function returns a PASS status when the scope doesn't exist, treating it as a non-error with a warning. This provides idempotent behavior but may mask issues where users misspell scope names.

Consider if this lenient behavior aligns with your intended user experience, or if it should fail to help catch configuration errors.


68-73: LGTM! Proper cleanup ordering.

The scope is removed from the registry only after successful termination, ensuring the registry remains consistent even if termination fails.


57-66: Good defensive handling of scope types.

The implementation:

  • Properly handles terminal scopes via delegated termination
  • Explicitly fails for unimplemented code scopes with a clear message
  • Catches unknown scope types

The terminateTerminalScope function correctly implements the documented SIGTERM/SIGKILL sequence: sends SIGTERM, waits 5 seconds, then sends SIGKILL if the process hasn't terminated.

core/src/scopes/cleanup.js (4)

1-9: LGTM!

Imports and exports are well-organized. The module correctly imports required utilities from sibling modules.


19-46: LGTM!

The LIFO cleanup order is correctly implemented. Error handling allows cleanup to continue even if individual scope terminations fail, which is appropriate for cleanup routines.


55-82: LGTM!

Consistent cleanup pattern with cleanupTestScopes. The default config parameter ensures the function can be called even if config is unavailable during emergency cleanup.


90-155: LGTM!

The cleanup handler setup is well-designed:

  • Guard flag prevents duplicate handler installations
  • Async cleanup is properly handled with .finally() to ensure exit
  • The 5-second forced exit timeout with .unref() is a good safety mechanism
  • The synchronous exit handler appropriately only warns about remaining scopes
core/src/tests/typeKeys.js (2)

11-19: LGTM!

The terminal special key mappings correctly translate key tokens to their terminal escape sequence equivalents.


150-236: LGTM!

The existing element-based typing logic remains unchanged and correctly handles element discovery, focusing, key substitution, and per-character or bulk typing modes.

core/src/tests.js (3)

445-448: LGTM!

Process-level cleanup handlers are correctly installed at the start of runSpecs before any test execution begins.


600-600: Verify the dual scope registry design is intentional.

There appear to be two scope management systems:

  1. Local ScopeRegistry instance per context (line 600) - cleaned up at line 816
  2. Global scope registry (via cleanupTestScopes) - cleaned up at lines 855-859

This dual-cleanup approach handles different scope lifetimes (context vs. test level). Please confirm this design is intentional and not an oversight.

Also applies to: 815-817, 853-859


998-1002: LGTM!

The terminateScope step is correctly integrated. Based on the relevant code snippets, it uses getScope from the global registry rather than the local scopeRegistry, which is consistent with the dual-registry design.

core/src/scopes/terminal.js (4)

1-8: LGTM!

Imports and exports are correctly organized. The module uses node-pty for PTY functionality and integrates with the scope registry.


89-150: LGTM!

The graceful termination pattern is well-implemented:

  • Checks process existence before attempting termination
  • Sends SIGTERM first with a 5-second grace period
  • Falls back to SIGKILL if the process doesn't terminate
  • Handles edge cases appropriately without crashing

172-184: Escape sequence processing is correct.

The use of \x00 (NUL character) as a temporary placeholder is an intentional pattern to correctly handle escaped backslashes. The static analysis warning about control characters is a false positive in this context.

The processing order is correct:

  1. Replace \\ with placeholder
  2. Process other escape sequences
  3. Restore placeholder to single backslash

185-192: LGTM!

The write operation correctly handles errors and provides informative error messages.

common/src/schemas/output_schemas/config_v3.schema.json (4)

1943-1947: LGTM!

The scope property is well-defined with an appropriate pattern constraint and clear description distinguishing it as a reference to an existing scope.


7941-7954: Verify intent of nested $schema property.

The terminateScope property contains a nested $schema declaration (line 7943). This is unusual since $schema is typically a document-level keyword rather than a property within a definition. While some validators may ignore it, others might interpret it unexpectedly.

If this schema fragment is also used standalone elsewhere, this makes sense. Otherwise, consider removing it for cleaner schema definitions.


4794-4812: Good examples demonstrating the scope lifecycle.

The examples effectively illustrate the complete workflow: creating a scoped process with waitUntil, using regex pattern matching (line 4809), and the relationship between scope creation and termination.


8345-8351: LGTM!

The terminateScope examples correctly reference the scope names (dev-server, npm-init) from the earlier runShell examples, demonstrating proper scope lifecycle management.

core/src/scopes/registry.js (2)

23-37: LGTM! Clear validation logic.

The validation function properly checks for string type and pattern compliance with helpful error messages. The pattern matches the schema definition in runCode_v3.schema.json.


101-103: LGTM! Accessor functions are well-implemented.

The getScope, hasScope, and getScopesByTestId functions are straightforward and correct. The filtering logic in getScopesByTestId properly matches scopes by testId.

Also applies to: 111-113, 142-150

common/src/schemas/src_schemas/runCode_v3.schema.json (2)

32-36: LGTM! Schema pattern matches registry validation.

The scope property is well-defined with a pattern that matches the validation logic in core/src/scopes/registry.js (line 28). The description clearly explains the purpose and constraints.


37-56: The implementation correctly matches all schema specifications. The 30-second timeout is accurate (default parameter: 30000ms), regex parsing properly handles the /pattern/flags format with dedicated validation and parsing functions, and the process behavior is correctly described—the waitForConditions() function resolves upon meeting conditions without terminating the process, allowing it to continue running.

common/src/schemas/output_schemas/spec_v3.schema.json (5)

2358-2362: LGTM! Consistent scope property implementation.

The scope property is consistently defined across all actions that interact with terminal scopes. The pattern validation (^[a-zA-Z0-9_-]{1,64}$) is appropriate, and the description clearly indicates that these actions reference existing scopes rather than creating new ones.

Also applies to: 2470-2474, 3021-3025, 3133-3137, 5691-5695, 5803-5807, 10184-10188, 10296-10300, 10847-10851, 10959-10963, 13517-13521, 13629-13633


4950-4954: LGTM! runShell scope property and examples are well-designed.

The scope property correctly distinguishes between "create or interact with" for runShell actions, and the examples demonstrate practical use cases including waiting for server startup messages. The examples also show both literal string matching and regex pattern matching for waitUntil conditions.

Also applies to: 5065-5069, 5210-5227, 12776-12780, 12891-12895, 13037-13053


5326-5330: LGTM! runCode scope property properly differentiates code execution scopes.

The scope property appropriately describes "code execution scope" instead of "terminal scope", providing clear differentiation between the two scope types while maintaining structural consistency with runShell.

Also applies to: 5442-5446, 13152-13156, 13268-13272


8290-8372: LGTM! terminateScope step type is well-defined.

The terminateScope step type follows a clean schema design with appropriate pattern validation matching the scope naming convention. The examples demonstrate proper usage for terminating both dev-server and npm-init scopes.

Note: The 5-second timeout between SIGTERM and SIGKILL is mentioned in the description. If processes consistently need more time for graceful shutdown, consider making this configurable in a future iteration.

Also applies to: 8760-8766, 16116-16198, 16586-16592


2563-2566: LGTM! Examples effectively demonstrate scope usage with type actions.

The examples show practical usage of the scope property with type actions, demonstrating interaction with a named terminal scope (npm-init) created by a previous runShell step. The inclusion of the newline character in the keys correctly simulates pressing Enter.

Also applies to: 3226-3229, 5896-5899, 10389-10392, 11052-11055, 13722-13725

common/src/schemas/output_schemas/report_v3.schema.json (6)

2376-2380: LGTM! Consistent scope property for type actions.

The scope property is consistently defined across all type actions with appropriate pattern validation and clear error behavior description.

Also applies to: 2488-2492, 3039-3043, 3151-3155, 5709-5713, 5821-5825, 10202-10206, 10314-10318, 10865-10869, 10977-10981, 13535-13539, 13647-13651


2581-2584: LGTM! Clear scope usage examples.

The examples effectively demonstrate how to send input to a named scope created by a previous step.

Also applies to: 3244-3247, 5914-5917, 10407-10410, 11070-11073, 13740-13743


5228-5245: LGTM! Excellent runShell examples.

The examples effectively demonstrate both string and regex pattern matching in waitUntil conditions, showing practical use cases for background process management.

Also applies to: 13054-13071


5344-5368: LGTM! Consistent scope and waitUntil for runCode.

The runCode action properly mirrors the runShell design, maintaining consistency across execution contexts.

Also applies to: 5460-5484, 13170-13194, 13286-13310


8307-8390: LGTM! Clean terminateScope action design.

The terminateScope action has a simple, clear design with the scope name as a direct string value. The pattern validation is consistent with scope properties in other actions.

Also applies to: 16133-16216


8778-8784: LGTM! Examples demonstrate complete scope lifecycle.

The terminateScope examples show proper cleanup of background processes created in earlier examples, demonstrating the full create-use-terminate workflow.

Also applies to: 16604-16610

common/src/schemas/output_schemas/test_v3.schema.json (5)

1757-1761: LGTM - Scope property definitions for keys steps are consistent.

The scope property additions across the keys/input steps are well-defined with appropriate pattern validation and clear documentation indicating the scope must already exist.

Also applies to: 1869-1873, 2420-2424, 2532-2536


1961-1965: Good examples demonstrating scope interaction.

The examples clearly show how to send keystrokes to an existing named scope, which helps users understand the intended workflow.

Also applies to: 2624-2628, 5294-5298


4608-4626: Excellent examples showcasing scope and waitUntil usage.

The examples effectively demonstrate both plain string matching ("Server listening on port") and regex pattern matching ("/package name:/i"), covering the two primary use cases well.


4725-4749: Consistent with runShell scope/waitUntil pattern.

The runCode scope and waitUntil definitions appropriately mirror the runShell structure. The same suggestions about configurable timeout and additionalProperties: false would apply here as well.


8159-8165: Complete scope lifecycle demonstrated in examples.

The terminateScope examples properly clean up the scopes (dev-server, npm-init) created earlier in the examples, demonstrating the full lifecycle: create → interact → terminate.

common/src/schemas/output_schemas/step_v3.schema.json (1)

7090-7171: terminateScope step wiring and examples look consistent

The new terminateScope step variant:

  • Reuses the common step shape (including unsafe, outputs, etc.).
  • Constrains the terminateScope value with the same ^[a-zA-Z0-9_-]{1,64}$ pattern as other scope identifiers.
  • Provides examples ("dev-server", "npm-init", "test-scope") that line up with the earlier scope examples in runShell/type/typeKeys.

From a schema and consistency standpoint this looks solid and ready to use.

Also applies to: 7560-7566

Comment on lines +4968 to +4992
"scope": {
"type": "string",
"pattern": "^[a-zA-Z0-9_-]{1,64}$",
"description": "Named scope to create or interact with. Creates a new terminal scope for long-running/parallel processes, or references an existing scope. Must be 1-64 characters and contain only letters, numbers, hyphens, and underscores."
},
"waitUntil": {
"type": "object",
"description": "Conditions to wait for in the command's output before considering the step complete. When specified, the process continues running after conditions are met. Times out after 30 seconds.",
"properties": {
"stdio": {
"type": "object",
"description": "Standard I/O conditions to wait for.",
"properties": {
"stdout": {
"type": "string",
"description": "String or regex pattern to wait for in stdout. Use `/pattern/flags` format for regex."
},
"stderr": {
"type": "string",
"description": "String or regex pattern to wait for in stderr. Use `/pattern/flags` format for regex."
}
}
}
}
},
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for waitUntil timeout values and regex parsing in scope-related runtime code

echo "=== Searching for waitUntil timeout values ==="
rg -n -C3 'waitUntil|timeout.*30|30.*second' --type=ts --type=js -g '!*.schema.json'

echo -e "\n=== Searching for regex pattern parsing ==="
rg -n -C3 'pattern.*flags?|\/.*\/[gimsuvy]' --type=ts --type=js -g '!*.schema.json' | head -50

echo -e "\n=== Searching for SIGTERM/SIGKILL implementation ==="
rg -n -C3 'SIGTERM|SIGKILL|terminateScope' --type=ts --type=js -g '!*.schema.json' | head -50

echo -e "\n=== Searching for scope-related implementations ==="
rg -n 'scope.*terminal|runShell|waitUntil' --type=ts --type=js -g '!*.schema.json' -g '!*.test.ts' | head -30

Repository: doc-detective/.github

Length of output: 186


🏁 Script executed:

#!/bin/bash
# First, explore the repository structure
echo "=== Repository structure ==="
git ls-files | head -20
find . -type f -name "*.ts" -o -name "*.js" | grep -v node_modules | head -20

echo -e "\n=== Search for runShell/scope/waitUntil implementations ==="
rg 'runShell|waitUntil.*timeout|scope.*terminal' -i --type=ts --type=js -g '!node_modules' | head -40

echo -e "\n=== Search for timeout hardcoding (30 seconds) ==="
rg '30.*000|30000|timeout.*30' --type=ts --type=js -g '!node_modules' -g '!*.schema.json' | head -20

echo -e "\n=== Search for SIGTERM/SIGKILL ==="
rg 'SIGTERM|SIGKILL' --type=ts --type=js -g '!node_modules' | head -20

Repository: doc-detective/.github

Length of output: 7357


🏁 Script executed:

#!/bin/bash
echo "=== Examining waitUntil.js for timeout and regex handling ==="
cat -n core/src/scopes/waitUntil.js

echo -e "\n=== Examining runShell.js for stdio regex parsing ==="
cat -n core/src/tests/runShell.js | sed -n '1,150p'

echo -e "\n=== Examining terminal.js for SIGTERM/SIGKILL timing ==="
cat -n core/src/scopes/terminal.js | head -80

Repository: doc-detective/.github

Length of output: 17491


🏁 Script executed:

#!/bin/bash
echo "=== Remaining portion of terminal.js (SIGTERM/SIGKILL implementation) ==="
cat -n core/src/scopes/terminal.js | sed -n '80,150p'

Repository: doc-detective/.github

Length of output: 2692


🏁 Script executed:

#!/bin/bash
echo "=== Lines 4968-4992 (runShell scope/waitUntil) ==="
sed -n '4968,4992p' common/src/schemas/output_schemas/report_v3.schema.json

echo -e "\n=== Lines 5083-5107 (checking which action type) ==="
sed -n '5000,5110p' common/src/schemas/output_schemas/report_v3.schema.json | head -70

echo -e "\n=== Checking if runShell actually uses waitUntil from scope ==="
rg -A10 -B5 'scope.*waitUntil|waitUntil.*scope' core/src/tests/runShell.js

Repository: doc-detective/.github

Length of output: 7923


🏁 Script executed:

#!/bin/bash
echo "=== Checking if scope/waitUntil features are actually implemented in runShell ==="
rg -n 'scope|waitUntil' core/src/tests/runShell.js

echo -e "\n=== Checking what action types are at those other line numbers ==="
sed -n '5050,5090p' common/src/schemas/output_schemas/report_v3.schema.json | grep -E '"(command|action|type|runShell)'

echo -e "\n=== Checking for scope usage across all action types ==="
rg -n '"scope"' common/src/schemas/output_schemas/report_v3.schema.json | head -20

echo -e "\n=== Checking for waitUntil usage across schema ==="
rg -n '"waitUntil"' common/src/schemas/output_schemas/report_v3.schema.json

echo -e "\n=== Checking if scope is used in actual action implementations ==="
rg -rn 'step\.scope|scope.*action' core/src/tests/ --type=js

Repository: doc-detective/.github

Length of output: 4313


🏁 Script executed:

#!/bin/bash
echo "=== Identifying action type at line 5083 ==="
sed -n '5020,5090p' common/src/schemas/output_schemas/report_v3.schema.json | grep -E '"(title|description)"' | head -5

echo -e "\n=== Identifying action type at line 12794 ==="
sed -n '12750,12820p' common/src/schemas/output_schemas/report_v3.schema.json | grep -E '"(title|description)"' | head -5

echo -e "\n=== Identifying action type at line 12909 ==="
sed -n '12860,12940p' common/src/schemas/output_schemas/report_v3.schema.json | grep -E '"(title|description)"' | head -5

echo -e "\n=== Check if scope/waitUntil are actually implemented in any action ==="
rg -l 'scope|waitUntil' core/src/tests/*.js | xargs -I {} basename {}

echo -e "\n=== Looking for scope implementation in action handlers ==="
rg -n 'createScope|createTerminalScope|waitUntil|waitForConditions' core/src/tests/*.js | head -20

Repository: doc-detective/.github

Length of output: 4958


🏁 Script executed:

#!/bin/bash
echo "=== Check terminateScope.js to understand scope implementation ==="
cat -n core/src/tests/terminateScope.js

echo -e "\n=== Check if runShell imports or uses scope/waitUntil modules ==="
head -50 core/src/tests/runShell.js | grep -E 'require|import|scope|waitUntil'

echo -e "\n=== Check what's actually in core/src/tests/index.js to see all actions ==="
rg -n 'runShell|terminateScope' core/src/tests/index.js | head -10

echo -e "\n=== Verify schema property structure - is runShell the 5083 location? ==="
sed -n '4950,5000p' common/src/schemas/output_schemas/report_v3.schema.json

Repository: doc-detective/.github

Length of output: 7442


🏁 Script executed:

#!/bin/bash
echo "=== Find the actual tests directory structure ==="
ls -la core/src/tests/ | head -30

echo -e "\n=== Check if runShell uses scope/waitUntil by searching for how actions are invoked ==="
rg -l 'runShell' core/src/ --type=js | head -10

echo -e "\n=== Look for how actions are called/dispatched ==="
rg -n 'case.*runShell|runShell.*:' core/src/ --type=js | head -20

echo -e "\n=== Verify the schema - are scope/waitUntil actually part of runShell or a different action? ==="
sed -n '4920,4980p' common/src/schemas/output_schemas/report_v3.schema.json | grep -E '(runShell|scope|waitUntil|"title")'

Repository: doc-detective/.github

Length of output: 2985


🏁 Script executed:

#!/bin/bash
echo "=== Full runShell.js implementation to check for scope/waitUntil support ==="
cat -n core/src/tests/runShell.js

echo -e "\n=== Check how runShell is called in tests.js ==="
sed -n '980,1000p' core/src/tests.js

Repository: doc-detective/.github

Length of output: 7793


🏁 Script executed:

#!/bin/bash
echo "=== Check if there are other action variants that use scope/waitUntil ==="
sed -n '4900,4930p' common/src/schemas/output_schemas/report_v3.schema.json | grep -E '(runShell|interactWithElement|type.*action)'

echo -e "\n=== Check schema structure around line 5083 context ==="
sed -n '5050,5100p' common/src/schemas/output_schemas/report_v3.schema.json | head -40

echo -e "\n=== Verify if runShell handler is passed scopeRegistry but doesn't use it ==="
grep -A5 -B5 'runShell.*scopeRegistry' core/src/tests.js

Repository: doc-detective/.github

Length of output: 3689


The schema defines scope and waitUntil properties for runShell that are not implemented in the handler.

The runShell function in core/src/tests/runShell.js does not process or use the scope or waitUntil properties, despite these being defined in the schema at lines 4968-4992 and 5083-5107. Additionally, the runShell implementation only supports basic /pattern/ regex format (without flags), while the schema promises /pattern/flags support matching the implementation in core/src/scopes/waitUntil.js.

The required functionality exists elsewhere in the codebase (waitUntil.js with 30-second timeout and full flag support; terminal.js with SIGTERM/5-second/SIGKILL behavior), but is not integrated into runShell. Either remove these schema properties from runShell or implement full scope/waitUntil support in the handler.

🤖 Prompt for AI Agents
In @common/src/schemas/output_schemas/report_v3.schema.json around lines
4968-4992, The schema-added `scope` and `waitUntil` fields must be implemented
in the runShell handler: update the `runShell` function in
core/src/tests/runShell.js to accept and forward `scope` to the terminal
management API (use the existing terminal/terminal.js scope creation/selection
functions) so commands run in the named terminal scope, and integrate the
existing waitUntil module (core/src/scopes/waitUntil.js) to evaluate
`waitUntil.stdio.stdout` and `waitUntil.stdio.stderr` using full
`/pattern/flags` parsing with a 30s timeout; ensure waiting does not terminate
the running process when conditions are met (process continues), and reuse
terminal.js termination behavior (SIGTERM, 5s grace, then SIGKILL) for
bail/timeout cleanup. If you prefer not to support these features, remove
`scope` and `waitUntil` from the runShell schema instead.

Comment on lines 1 to 17
// Scope management exports
module.exports = {
// Registry
...require('./registry'),

// Terminal scopes
...require('./terminal'),

// Code scopes (future)
...require('./code'),

// WaitUntil utilities
...require('./waitUntil'),

// Cleanup
...require('./cleanup')
};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check the implementation status of code.js module
if [ -f core/src/scopes/code.js ]; then
  echo "=== code.js content (first 50 lines) ==="
  head -50 core/src/scopes/code.js
else
  echo "code.js not found"
fi

Repository: doc-detective/.github

Length of output: 1641


🏁 Script executed:

#!/bin/bash
# List all scope module files to understand the structure
ls -la core/src/scopes/

Repository: doc-detective/.github

Length of output: 535


🏁 Script executed:

#!/bin/bash
# Extract exported names from each scope module
for module in registry terminal code waitUntil cleanup; do
  echo "=== $module.js exports ==="
  if [ -f "core/src/scopes/$module.js" ]; then
    grep -E "^(module\.exports|exports\.|const|function|class)" "core/src/scopes/$module.js" | head -20
  else
    echo "File not found"
  fi
done

Repository: doc-detective/.github

Length of output: 1962


Address non-functional placeholder code in exports.

The code module exports three functions (createCodeScope, terminateCodeScope, writeToCode) that all throw "Code scopes are not yet implemented" errors. While no export name collisions exist across the five modules, exporting non-functional code risks unintended usage and misleading API signatures. Either remove the code module from exports until implementation is ready, guard exports behind a feature flag, or document the limitation prominently in the API documentation.

🤖 Prompt for AI Agents
In @core/src/scopes/index.js around lines 1-17, The index exports currently
re-exports a code-scopes module that only exposes non-functional stubs
(createCodeScope, terminateCodeScope, writeToCode) which can mislead consumers;
either remove the spread of './code' from the public exports, or wrap it behind
a clear feature flag/conditional so those functions are not exported unless the
feature is enabled, and update any docs accordingly; specifically edit the
module.exports block to stop re-exporting './code' (or gate the
require('./code') and the spread operator with a boolean like
ENABLE_CODE_SCOPES) so createCodeScope, terminateCodeScope and writeToCode are
not available by default.

- Bump node-pty version to ^1.1.0 in package.json.
- Add detailed comments for code scopes in index.js.
- Implement line ending normalization in registry.js.
- Enhance error handling for scope registration in registry.js.
- Clean up stream listeners in removeScope function to prevent memory leaks.
- Update runShell function to accept unused parameters for future scope-based commands.
- Refactor tests to improve clarity and accuracy in core.test.js.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
core/src/tests/runCode.js (1)

112-115: Consider conditional property spreading to avoid explicit undefined values.

The current implementation unconditionally assigns scope and waitUntil to the shellStep.runShell object, which means they'll be explicitly set to undefined if not present in step.runCode. While this likely works fine if runShell handles undefined values properly, it's cleaner to only include these properties when they have actual values.

🔎 Optional refactor using conditional spreading
     const shellStep = {
       runShell: {
         command:
           step.runCode.language.toLowerCase() === "python"
             ? "python"
             : step.runCode.language.toLowerCase() === "javascript"
             ? "node"
             : "bash",
         args: [scriptPath, ...step.runCode.args],
-        scope: step.runCode.scope,
-        waitUntil: step.runCode.waitUntil,
+        ...(step.runCode.scope && { scope: step.runCode.scope }),
+        ...(step.runCode.waitUntil && { waitUntil: step.runCode.waitUntil }),
         workingDirectory: step.runCode.workingDirectory,
         timeout: step.runCode.timeout,
       },
     };
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e94adb6 and 1a6d66a.

📒 Files selected for processing (6)
  • core/package.json
  • core/src/scopes/index.js
  • core/src/scopes/registry.js
  • core/src/tests/runCode.js
  • core/src/tests/runShell.js
  • core/test/core.test.js
🚧 Files skipped from review as they are similar to previous changes (3)
  • core/src/scopes/index.js
  • core/package.json
  • core/src/scopes/registry.js
🧰 Additional context used
🧬 Code graph analysis (1)
core/test/core.test.js (5)
core/src/tests/runCode.js (4)
  • path (5-5)
  • fs (4-4)
  • config (144-146)
  • result (40-44)
core/src/tests/runShell.js (3)
  • path (8-8)
  • fs (7-7)
  • result (17-27)
core/src/tests.js (3)
  • path (25-25)
  • config (443-443)
  • config (1099-1099)
core/src/scopes/registry.js (1)
  • result (190-190)
core/src/tests/typeKeys.js (1)
  • result (82-82)
🔇 Additional comments (5)
core/src/tests/runCode.js (1)

153-153: LGTM!

The updated invocation pattern using object destructuring is consistent with the new function signature.

core/src/tests/runShell.js (1)

13-15: LGTM! Good incremental setup for scope infrastructure.

The addition of test and scopeRegistry parameters with clear documentation about their intended future use is a solid approach for incremental feature development. The explanatory comments help future maintainers understand the function's evolution.

core/test/core.test.js (3)

1169-1199: LGTM! Past review feedback properly addressed.

The test has been updated to correctly expect validation failure for empty scope names (line 1195 now asserts result.summary.tests.fail, 1), which aligns with the schema pattern ^[a-zA-Z0-9_-]{1,64}$ that requires at least one character.


1201-1230: LGTM! Past review feedback properly addressed.

The scope name has been corrected to "my-scope_123-test" (line 1209), which only uses valid characters (alphanumeric, hyphens, underscores) that match the schema pattern ^[a-zA-Z0-9_-]{1,64}$.


827-1277: Comprehensive test coverage for scope feature.

The new scope test suite provides excellent coverage across multiple dimensions: basic acceptance, background processes, persistence, cross-action integration (runShell, runCode, type), lifecycle management, and edge cases. The tests are well-structured with appropriate cleanup and clear intent.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements scope-based session management to enable long-running processes and interactive terminal commands in Doc Detective's test framework. The implementation adds named scopes for managing parallel shell/code sessions, waitUntil conditions for pausing until stdout/stderr patterns appear, and a terminateScope command for gracefully stopping processes. The changes span schema definitions, core execution logic using node-pty for PTY-based terminals, comprehensive lifecycle management with cleanup handlers, and extensive test coverage.

Key changes:

  • New scope management system with terminal and code scope support (terminal fully implemented, code as placeholder)
  • waitUntil conditions for stdout/stderr pattern matching with regex support
  • terminateScope action with graceful SIGTERM → SIGKILL fallback
  • Type action enhancement to send input to scoped terminals
  • Process-level cleanup handlers to prevent resource leaks

Reviewed changes

Copilot reviewed 41 out of 55 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
core/src/scopes/*.js New scope management infrastructure: registry, terminal PTY handling, waitUntil logic, cleanup handlers
core/src/tests/terminateScope.js New action to terminate scoped processes with graceful shutdown
core/src/tests/typeKeys.js Enhanced to send input to terminal scopes
core/src/tests.js Integrated ScopeRegistry and cleanup into test orchestration
core/src/tests/runShell.js Added parameters for scope support (implementation deferred)
core/src/tests/runCode.js Passes scope/waitUntil options to runShell
core/src/config.js Bug fix: optional chaining for safe property access
core/package.json Added node-pty dependency for PTY support
core/test/core.test.js Comprehensive test suite for scope lifecycle, edge cases, and integration
core/test/artifacts/*.spec.json Integration test specifications for scope features
common/src/schemas/**/*.schema.json Schema definitions for scope, waitUntil, and terminateScope across all schema variants
common/src/schemas/dereferenceSchemas.js Added terminateScope schema to build process; fixed variable scoping
core/test/temp-*.json Temporary test files for various test scenarios

@@ -0,0 +1,47 @@
const { log } = require("../utils");
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable log.

Copilot uses AI. Check for mistakes.
const {
findElementByCriteria,
} = require("./findStrategies");
const { hasScope, getScope } = require("../scopes/registry");
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable hasScope.

Copilot uses AI. Check for mistakes.
const {
findElementByCriteria,
} = require("./findStrategies");
const { hasScope, getScope } = require("../scopes/registry");
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable getScope.

Copilot uses AI. Check for mistakes.
findElementByCriteria,
} = require("./findStrategies");
const { hasScope, getScope } = require("../scopes/registry");
const { writeToTerminal } = require("../scopes/terminal");
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable writeToTerminal.

Copilot uses AI. Check for mistakes.
}

async cleanup() {
for (const [name, scope] of this.scopes) {
Copy link

Copilot AI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused variable name.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments