Add AST-based test detection for documentation formats #69

Draft
Copilot wants to merge 6 commits into main from copilot/extend-markupdefinition-schema

Copilot AI commented Dec 5, 2025

Extends markupDefinition schema to support AST node matching alongside regex patterns. When both ast and regex are specified, AST identifies candidate nodes first, then regex filters matched content (AND operation). At least one of ast or regex must be specified.

Schema Changes

  • Added ast property to markupDefinition with astNodeMatch type
  • astNodeMatch supports: nodeType, attributes, content, children (recursive), extract (capture group mapping)
  • Attribute matching: exact string, regex (/pattern/), any-of (array), exists-check (boolean)
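
The four attribute matcher kinds listed above could be handled along these lines. This is a minimal sketch, not the PR's code: the function name matchesAttribute, the handling of array-valued attributes, and the rule that a regex needs at least one character between the slashes are all assumptions made for illustration.

```javascript
// Hypothetical helper: does one attribute value satisfy one matcher?
// Name and details are illustrative assumptions, not the PR's implementation.
function matchesAttribute(actual, expected) {
  // Boolean: exists-check (true = must be present, false = must be absent).
  if (typeof expected === "boolean") {
    const present = actual !== undefined && actual !== null;
    return present === expected;
  }
  // Array: any-of, where each entry is itself a matcher.
  if (Array.isArray(expected)) {
    return expected.some((candidate) => matchesAttribute(actual, candidate));
  }
  if (actual === undefined || actual === null) return false;
  // Some ASTs store attributes such as className as arrays of strings.
  const values = Array.isArray(actual) ? actual.map(String) : [String(actual)];
  // "/pattern/": regex, assumed to need at least one character between slashes.
  if (
    typeof expected === "string" &&
    expected.length > 2 &&
    expected.startsWith("/") &&
    expected.endsWith("/")
  ) {
    const pattern = new RegExp(expected.slice(1, -1));
    return values.some((value) => pattern.test(value));
  }
  // Anything else: exact string comparison.
  return values.includes(String(expected));
}

console.log(matchesAttribute("bash", ["bash", "sh"])); // true
console.log(matchesAttribute("language-js", "/language-.*/")); // true
console.log(matchesAttribute(undefined, false)); // true (attribute is absent)
```

Treating every array entry as a matcher in its own right keeps the any-of case composable, so an array could mix exact strings and /pattern/ entries.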

AST Parsers (resolver/src/ast/)

  • markdown.js — unified + remark-parse → MDAST
  • html.js — unified + rehype-parse → HAST
  • asciidoc.js — asciidoctor → Asciidoctor AST
  • rst.js — restructured → RST tree
  • xml.js — @xmldom/xmldom → DOM-like AST for DITA
  • cache.js — AST cache by file path + content hash
  • matcher.js: matchNodes(ast, astConfig) returns matched nodes with positions and extracted values
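
To make the matcher's contract concrete, here is a simplified sketch of a matchNodes-style walk over a unist-style tree (nodes with a type, optional children, and an optional position). It assumes attributes live directly on the node, as lang and value do on an mdast code node, and it only supports exact and any-of attribute values; the content and children matchers are left out. The body is illustrative and is not the implementation in matcher.js.

```javascript
// Simplified, assumed version of matchNodes(ast, astConfig).
// Supports nodeType (string or array), exact/any-of attributes, and extract paths.
function matchNodes(ast, astConfig) {
  const wantedTypes = [].concat(astConfig.nodeType ?? []);
  const results = [];

  const visit = (node) => {
    if (!node || typeof node !== "object") return;
    const typeOk = wantedTypes.length === 0 || wantedTypes.includes(node.type);
    const attrsOk = Object.entries(astConfig.attributes ?? {}).every(
      ([key, expected]) => [].concat(expected).includes(node[key])
    );
    if (typeOk && attrsOk) {
      // Resolve each extract path ("lang", "position.start.line", ...) on the node.
      const extracted = {};
      for (const [group, path] of Object.entries(astConfig.extract ?? {})) {
        extracted[group] = path.split(".").reduce((value, key) => value?.[key], node);
      }
      results.push({ node, position: node.position ?? null, extracted });
    }
    (node.children ?? []).forEach(visit);
  };

  visit(ast);
  return results;
}

// Usage with a hand-written mdast-like tree.
const tree = {
  type: "root",
  children: [
    { type: "paragraph", children: [{ type: "text", value: "Run this:" }] },
    { type: "code", lang: "bash", value: "echo hi", position: { start: { line: 3 } } },
    { type: "code", lang: "js", value: "1 + 1" },
  ],
};
const matches = matchNodes(tree, {
  nodeType: "code",
  attributes: { lang: ["bash", "sh"] },
  extract: { $1: "lang", $2: "value" },
});
console.log(matches.length, matches[0].extracted.$2); // 1 echo hi
```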

Integration

  • Modified parseContent() in utils.js to handle AST-based matching before regex
  • Added dependencies: unified, remark-parse, rehype-parse, asciidoctor, restructured, @xmldom/xmldom
  • Fixed schema dereferencing for recursive astNodeMatch.children self-reference

Example

{
  "name": "bashCodeBlock",
  "ast": {
    "nodeType": "code",
    "attributes": { "lang": ["bash", "sh"] },
    "extract": { "$1": "lang", "$2": "value" }
  },
  "actions": [{ "runShell": { "command": "$2" } }]
}
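
In the example above, the extract map feeds $1 and $2 into the action. One way that substitution might work is sketched below; substituteGroups is a hypothetical helper name, and the choice to leave unknown placeholders untouched is an assumption.

```javascript
// Hypothetical helper: replace $1, $2, ... in every string of an action
// with the values produced by the extract mapping. Returns a new object.
function substituteGroups(value, extracted) {
  if (typeof value === "string") {
    return value.replace(/\$(\d+)/g, (token) =>
      Object.hasOwn(extracted, token) ? String(extracted[token]) : token
    );
  }
  if (Array.isArray(value)) {
    return value.map((item) => substituteGroups(item, extracted));
  }
  if (value && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([key, item]) => [key, substituteGroups(item, extracted)])
    );
  }
  return value; // numbers, booleans, null pass through unchanged
}

const action = { runShell: { command: "$2" } };
const resolved = substituteGroups(action, { $1: "bash", $2: "echo hi" });
console.log(resolved.runShell.command); // echo hi
```

Using a replacer function rather than a replacement string means shell content containing "$" sequences is inserted literally.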

Combined AST + regex (filters bash blocks containing # IMPORTANT:):

{
  "ast": { "nodeType": "code", "attributes": { "lang": "bash" } },
  "regex": ["# IMPORTANT:([\\s\\S]*)"],
  "actions": [{ "runShell": { "command": "$1" } }]
}
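
The AND behaviour in the combined example can be pictured as a filter over the AST candidates. The sketch below assumes each candidate carries its text in a content field and that the first matching pattern wins; both are assumptions for illustration, and filterByRegex is a hypothetical name.

```javascript
// Assumed filter step: keep only AST candidates whose content matches a regex.
// The regex capture groups become $1, $2, ... for action substitution.
function filterByRegex(candidates, patterns) {
  const kept = [];
  for (const candidate of candidates) {
    for (const source of patterns) {
      const match = new RegExp(source).exec(candidate.content);
      if (!match) continue;
      const extracted = {};
      match.slice(1).forEach((group, index) => {
        extracted[`$${index + 1}`] = group;
      });
      kept.push({ ...candidate, extracted });
      break; // first matching pattern wins (assumption)
    }
  }
  return kept;
}

const candidates = [
  { content: "# IMPORTANT: run me\necho hi" },
  { content: "echo skipped" },
];
const kept = filterByRegex(candidates, ["# IMPORTANT:([\\s\\S]*)"]);
console.log(kept.length); // 1
```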

Original prompt

Plan: AST-Based Test Detection for Documentation Formats

Extend the markupDefinition schema to support AST node matching alongside regex patterns. When a markup definition specifies both ast and regex, the AST matcher identifies candidate nodes first, then regex applies to matched node content (AND operation). At least one of ast or regex must be specified per markup definition.

Steps

  1. Add AST parser adapters in new resolver/src/ast/ directory:

    • parsers/markdown.js using unified + remark-parse → mdast
    • parsers/html.js using unified + rehype-parse → hast
    • parsers/asciidoc.js using asciidoctor.js → Asciidoctor AST
    • parsers/rst.js using restructured → RST tree
    • parsers/xml.js using @xmldom/xmldom → XML DOM for DITA
    • index.js exposing parseToAst(content, format) with try/catch returning null on parse failure
  2. Extend markupDefinition schema in common/src/schemas/src_schemas/config.json:

    • Add ast property as astNodeMatch object with: nodeType (string/array), attributes (object with value matchers), content (text/pattern match), children (nested matchers), extract (maps $1/$2 to node paths)
    • Add validation requiring at least one of ast or regex via anyOf with two branches
    • Run npm run build in common/ to regenerate output schemas
  3. Add AST node matching engine in resolver/src/ast/matcher.js:

    • matchNodes(ast, astConfig) → returns array of {node, position, extracted} objects
    • Support attribute matching: exact string, /regex/ pattern, ["a", "b"] any-of, true/false exists-check
    • Support extract mapping: "$1": "attributes.lang", "$2": "content" paths
  4. Modify markup processing in utils.js parseContent() function (~line 271):

    • Before iterating markup.regex, check if markup.ast exists
    • If ast is specified, call parseToAst(); if it returns null (parse failure), skip this markup definition for this file
    • Call matchNodes() to get candidate nodes with positions and extracted values
    • If regex is also specified, apply the regex patterns to each matched node's content and keep only the nodes where the regex matches
    • If only ast is specified, use the extracted values directly as capture groups for action substitution
    • Create detectedStep statements with sortIndex from node positions
  5. Add default AST patterns to file type definitions in config.js:

    • Markdown: { nodeType: "code", attributes: { lang: ["bash", "sh"] }, extract: { "$1": "attributes.lang", "$2": "value" } }
    • HTML: { nodeType: "element", attributes: { tagName: "pre" }, children: [{ nodeType: "element", attributes: { tagName: "code", className: "/language-.*/" } }] }
    • DITA: { nodeType: "element", attributes: { tagName: "codeblock", outputclass: ["bash", "shell"] } }
    • AsciiDoc: { nodeType: "listing", attributes: { language: ["bash", "shell"] } }
  6. Add caching for parsed ASTs in resolver/src/ast/cache.js:

    • Cache parsed ASTs by file path + content hash
    • getOrParse(content, format, filePath) returns cached AST or parses and caches
    • Clear cache entry when file content changes (hash mismatch)
  7. Add dependencies to package.json:

    • unified, remark-parse, rehype-parse, asciidoctor, restructured, @xmldom/xmldom
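
Steps 1 and 6 of the plan (parseToAst returning null on failure, and getOrParse caching by file path plus content hash) might fit together roughly as follows. This is a sketch under stated assumptions: the parser table uses JSON.parse as a stand-in so the example runs without the listed dependencies, SHA-256 is an arbitrary choice of hash, and keying the cache by format as well as path is an addition not stated in the plan.

```javascript
const crypto = require("node:crypto");

// Stand-in parser table. Real adapters would wrap remark-parse, rehype-parse,
// and so on; JSON.parse is used here only to keep the sketch dependency-free.
const parsers = { json: (content) => JSON.parse(content) };

// Returns the parsed tree, or null when the format is unknown or parsing throws.
function parseToAst(content, format) {
  try {
    const parse = parsers[format];
    return parse ? parse(content) : null;
  } catch {
    return null;
  }
}

const cache = new Map(); // "format:filePath" -> { hash, ast }

// Returns the cached AST when the content hash is unchanged; otherwise parses
// again and overwrites the stale entry. A null result is cached as well.
function getOrParse(content, format, filePath) {
  const hash = crypto.createHash("sha256").update(content).digest("hex");
  const key = `${format}:${filePath}`;
  const entry = cache.get(key);
  if (entry && entry.hash === hash) return entry.ast;
  const ast = parseToAst(content, format);
  cache.set(key, { hash, ast });
  return ast;
}

const first = getOrParse('{"x":1}', "json", "a.json");
const again = getOrParse('{"x":1}', "json", "a.json");
console.log(first === again); // true (served from the cache)
```

Overwriting the entry on a hash mismatch is what clears stale content here, so no separate invalidation call is needed in this sketch.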

The user has attached the following file paths as relevant context:

  • .github/copilot-instructions.md

Created from VS Code.




coderabbitai bot commented Dec 5, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



Copilot AI and others added 5 commits December 5, 2025 10:43
- Add AST parser adapters in resolver/src/ast/ directory:
  - parsers/markdown.js using unified + remark-parse
  - parsers/html.js using unified + rehype-parse
  - parsers/asciidoc.js using asciidoctor
  - parsers/rst.js using restructured
  - parsers/xml.js using @xmldom/xmldom
  - index.js exposing parseToAst(content, format)

- Add AST node matching engine in resolver/src/ast/matcher.js:
  - matchNodes(ast, astConfig) returns matched nodes with positions and extracted values
  - Supports attribute matching: exact string, regex, any-of, exists-check
  - Supports extract mapping for capture groups

- Add caching for parsed ASTs in resolver/src/ast/cache.js

- Extend markupDefinition schema in common/src/schemas/src_schemas/config_v3.schema.json:
  - Add ast property as astNodeMatch object
  - Add validation requiring at least one of ast or regex

- Modify markup processing in utils.js parseContent() function:
  - Add AST-based matching before regex patterns
  - Handle AST-only and AST+regex (AND) modes

- Add dependencies to resolver/package.json:
  - unified, remark-parse, rehype-parse, asciidoctor, restructured, @xmldom/xmldom

- Add tests for AST-based detection

Co-authored-by: hawkeyexl <5209367+hawkeyexl@users.noreply.github.com>
- Fix format determination to check array length before accessing first element
- Improve regex pattern detection to require minimum length between delimiters

Co-authored-by: hawkeyexl <5209367+hawkeyexl@users.noreply.github.com>
- Modify dereferenceSchemas.js to use bundle instead of dereference for config_v3
- Add WeakSet-based cycle detection to deleteDollarIds function
- This preserves $refs for recursive schemas like astNodeMatch.children

Co-authored-by: hawkeyexl <5209367+hawkeyexl@users.noreply.github.com>
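
For readers unfamiliar with the WeakSet approach mentioned in this commit, a minimal sketch follows. It assumes deleteDollarIds strips $id keys in place from a schema object; the actual signature and behaviour in dereferenceSchemas.js are not shown in this PR, so treat the body as illustrative.

```javascript
// Assumed shape of deleteDollarIds: remove every "$id" key in place,
// visiting each object at most once so self-referencing schemas terminate.
function deleteDollarIds(schema, seen = new WeakSet()) {
  if (schema === null || typeof schema !== "object" || seen.has(schema)) {
    return schema;
  }
  seen.add(schema);
  if (!Array.isArray(schema)) delete schema.$id;
  for (const value of Object.values(schema)) {
    deleteDollarIds(value, seen);
  }
  return schema;
}

// A schema whose children.items points back at itself, like astNodeMatch.
const astNodeMatch = {
  $id: "astNodeMatch",
  type: "object",
  properties: { children: { $id: "children", type: "array", items: null } },
};
astNodeMatch.properties.children.items = astNodeMatch;

deleteDollarIds(astNodeMatch);
console.log("$id" in astNodeMatch); // false
```

A WeakSet suits this job because it holds the visited objects without keeping them alive after the walk finishes.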
- Use full dereferencing and handle circular refs with custom clone function
- The breakCircularRefs function preserves schema structure while breaking cycles
- Both config_v3 and resolvedTests_v3 schemas are now handled correctly
- All 306 common tests and 43 resolver tests pass

Co-authored-by: hawkeyexl <5209367+hawkeyexl@users.noreply.github.com>
Co-authored-by: hawkeyexl <5209367+hawkeyexl@users.noreply.github.com>
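
The last fix describes a custom clone that breaks cycles after full dereferencing. One possible shape for such a function is sketched below. The ancestor-tracking strategy and the placeholder object are assumptions; the PR text only says that breakCircularRefs preserves schema structure while breaking cycles.

```javascript
// Assumed approach: deep-clone the schema, and where a branch loops back to
// one of its own ancestors, emit a small placeholder instead of recursing.
function breakCircularRefs(value, ancestors = new Set()) {
  if (value === null || typeof value !== "object") return value;
  if (ancestors.has(value)) {
    return { description: "Circular reference omitted" }; // placeholder is an assumption
  }
  ancestors.add(value);
  const clone = Array.isArray(value)
    ? value.map((item) => breakCircularRefs(item, ancestors))
    : Object.fromEntries(
        Object.entries(value).map(([key, item]) => [key, breakCircularRefs(item, ancestors)])
      );
  ancestors.delete(value); // siblings may reuse this object without it counting as a cycle
  return clone;
}

const recursive = { type: "object", properties: { children: { type: "array", items: null } } };
recursive.properties.children.items = recursive;

const safe = breakCircularRefs(recursive);
console.log(JSON.stringify(safe).length > 0); // true (serializable, no cycle left)
```

Tracking only the current ancestor chain, rather than every object seen so far, means a definition shared by two sibling properties is cloned in both places instead of being cut.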
Copilot AI changed the title from "[WIP] Extend markupDefinition schema to add AST node matching" to "Add AST-based test detection for documentation formats" Dec 5, 2025
Copilot AI requested a review from hawkeyexl December 5, 2025 11:12