Add NFA infrastructure for regular grammar compilation #1856

steveluc · 2026-01-23T23:32:49Z

Summary

Implements token-based NFA (Nondeterministic Finite Automaton) infrastructure for compiling and matching regular grammars.

This provides the foundation for:

Determining whether regular grammars are expressive enough for the pattern start → preamble command postamble
Future DFA compilation for performance optimization
Grammar combination and merging capabilities
Dynamic rule loading

Key Components

nfa.ts: Core NFA data structures and builder
- NFABuilder for constructing NFAs
- combineNFAs() for sequence and choice operations
nfaCompiler.ts: Compiles Grammar to NFA
- Handles alternatives, sequences, wildcards, optionals
- Supports nested rules
nfaInterpreter.ts: Runs NFAs against token input for matching and debugging
- Epsilon closure computation
- Parallel state tracking
- Variable capturing
- Debug printing
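The builder and the sequence/choice combinators can be pictured with a minimal Thompson-style sketch. All names here (Transition, NFA, NFABuilder, embed, phrase, sequence, choice) are hypothetical stand-ins rather than the actual exports of nfa.ts, and wildcard transitions are left out for brevity.

```typescript
// Hypothetical sketch of a token-based NFA builder with Thompson-style
// sequence and choice combinators; not the actual nfa.ts API.

type Transition =
    | { kind: "token"; token: string; to: number }
    | { kind: "epsilon"; to: number };

interface NFA {
    start: number;
    accept: number;
    transitions: Transition[][]; // transitions[state] = outgoing edges
}

class NFABuilder {
    private edges: Transition[][] = [];

    newState(): number {
        this.edges.push([]);
        return this.edges.length - 1;
    }

    stateCount(): number {
        return this.edges.length;
    }

    addToken(from: number, token: string, to: number): void {
        this.edges[from].push({ kind: "token", token, to });
    }

    addEpsilon(from: number, to: number): void {
        this.edges[from].push({ kind: "epsilon", to });
    }

    build(start: number, accept: number): NFA {
        return { start, accept, transitions: this.edges };
    }
}

// Copy every state and edge of `nfa` into `builder`; returns the id offset used.
function embed(builder: NFABuilder, nfa: NFA): number {
    const offset = builder.stateCount();
    nfa.transitions.forEach(() => builder.newState());
    nfa.transitions.forEach((edges, from) => {
        for (const e of edges) {
            if (e.kind === "token") builder.addToken(from + offset, e.token, e.to + offset);
            else builder.addEpsilon(from + offset, e.to + offset);
        }
    });
    return offset;
}

// Linear NFA that accepts exactly the given words, one token per edge.
function phrase(words: string[]): NFA {
    const builder = new NFABuilder();
    const start = builder.newState();
    let current = start;
    for (const w of words) {
        const next = builder.newState();
        builder.addToken(current, w, next);
        current = next;
    }
    return builder.build(start, current);
}

// a then b: one epsilon edge from a's accept state to b's start state.
function sequence(a: NFA, b: NFA): NFA {
    const builder = new NFABuilder();
    const oa = embed(builder, a);
    const ob = embed(builder, b);
    builder.addEpsilon(a.accept + oa, b.start + ob);
    return builder.build(a.start + oa, b.accept + ob);
}

// a or b: fresh start/accept states joined to both branches by epsilon edges.
function choice(a: NFA, b: NFA): NFA {
    const builder = new NFABuilder();
    const start = builder.newState();
    const accept = builder.newState();
    const oa = embed(builder, a);
    const ob = embed(builder, b);
    builder.addEpsilon(start, a.start + oa);
    builder.addEpsilon(start, b.start + ob);
    builder.addEpsilon(a.accept + oa, accept);
    builder.addEpsilon(b.accept + ob, accept);
    return builder.build(start, accept);
}

const edgeCount = (n: NFA) => n.transitions.reduce((sum, e) => sum + e.length, 0);
```

In this sketch sequence adds one epsilon edge and choice adds four, which is why compiled grammars accumulate many epsilon transitions that the interpreter's closure step has to follow.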

Features

✅ Token-based matching (words as atomic units, not characters)
✅ Epsilon transitions with proper closure computation
✅ Wildcard capturing with type constraints
✅ Optional parts using epsilon transitions
✅ Grammar combination (sequence and choice operations)
✅ Debug printing and execution tracing
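At run time an NFA is simulated by advancing every active state in parallel and expanding epsilon edges after each token. The sketch below shows that idea for a hypothetical rule like please? play track $(n:number). The edge kinds, the one-token wildcard, the lowercase tokenizer, and the "number" check are assumptions, not the representation used by nfaInterpreter.ts.

```typescript
// Hypothetical sketch of token-based NFA simulation; the edge shapes, the
// one-token wildcard, and the "number" type check are assumptions, not the
// representation used by nfaInterpreter.ts.

type Edge =
    | { kind: "token"; token: string; to: number }
    | { kind: "epsilon"; to: number }
    | { kind: "wildcard"; variable: string; type?: "number"; to: number };

interface NFA {
    start: number;
    accepting: Set<number>;
    edges: Edge[][]; // edges[state] = outgoing edges
}

type Captures = Record<string, string>;

// One active path through the NFA: current state plus values captured so far.
interface Thread {
    state: number;
    captures: Captures;
}

// Expand threads along epsilon edges, keeping one thread per reachable state.
function epsilonClosure(nfa: NFA, threads: Thread[]): Thread[] {
    const seen = new Map<number, Thread>();
    const stack = threads.slice();
    while (stack.length > 0) {
        const t = stack.pop()!;
        if (seen.has(t.state)) continue;
        seen.set(t.state, t);
        for (const e of nfa.edges[t.state]) {
            if (e.kind === "epsilon") stack.push({ state: e.to, captures: t.captures });
        }
    }
    return Array.from(seen.values());
}

// Advance all active threads in parallel over a single input token.
function step(nfa: NFA, threads: Thread[], token: string): Thread[] {
    const next: Thread[] = [];
    for (const t of threads) {
        for (const e of nfa.edges[t.state]) {
            if (e.kind === "token" && e.token === token) {
                next.push({ state: e.to, captures: t.captures });
            } else if (e.kind === "wildcard") {
                // Type constraint: a "number" wildcard rejects non-numeric tokens.
                if (e.type === "number" && Number.isNaN(Number(token))) continue;
                next.push({ state: e.to, captures: { ...t.captures, [e.variable]: token } });
            }
        }
    }
    return epsilonClosure(nfa, next);
}

// Split input into lowercase word tokens (an assumed tokenizer; it also
// lowercases captured text) and run the NFA; returns captures on success.
function match(nfa: NFA, input: string): Captures | undefined {
    const tokens = input.toLowerCase().split(/\s+/).filter((w) => w.length > 0);
    let threads = epsilonClosure(nfa, [{ state: nfa.start, captures: {} }]);
    for (const token of tokens) {
        threads = step(nfa, threads, token);
        if (threads.length === 0) return undefined;
    }
    return threads.find((t) => nfa.accepting.has(t.state))?.captures;
}

// please? play track $(n:number), where the optional word is an epsilon bypass.
const example: NFA = {
    start: 0,
    accepting: new Set([4]),
    edges: [
        [{ kind: "token", token: "please", to: 1 }, { kind: "epsilon", to: 1 }],
        [{ kind: "token", token: "play", to: 2 }],
        [{ kind: "token", token: "track", to: 3 }],
        [{ kind: "wildcard", variable: "n", type: "number", to: 4 }],
        [],
    ],
};
```

Tracking the whole set of active states is what lets the interpreter avoid backtracking; the subset construction used for DFA compilation precomputes exactly these state sets.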

Testing

Unit tests for all components
Integration tests with real production grammars:
- Player grammar: 303 states, 413 transitions
- Calendar grammar: 94 states, 161 transitions
Tests pass for the pause, resume, and play commands
Device selection works correctly

Documentation

See NFA_README.md for:

Architecture overview
Usage examples
Token-based vs character-based approach
Compilation strategy
Future work (DFA compilation, grammar merging)

Test Plan

cd packages/actionGrammar
npm test -- nfa.spec
npm test -- nfaRealGrammars.spec

All tests passing ✓

🤖 Generated with Claude Code

Created a VS Code extension that provides comprehensive syntax highlighting for Action Grammar files used in TypeAgent.

Features:
- Rule definition highlighting (@ <RuleName> = ...)
- Rule reference highlighting (<RuleName>)
- Capture syntax highlighting ($(name:Type) and $(name))
- Action object highlighting with embedded JavaScript syntax (-> { })
- Operator highlighting (|, ?, *, +)
- Comment support (//)
- String literal highlighting
- Bracket matching and auto-closing pairs
- Language configuration for editor features

File structure:
- package.json: Extension manifest with language contributions
- language-configuration.json: Editor behavior configuration
- syntaxes/agr.tmLanguage.json: TextMate grammar definition
- README.md: Installation and usage documentation
- OVERVIEW.md: Technical implementation details
- sample.agr: Sample file demonstrating all syntax features
- LICENSE: MIT license

The extension uses a TextMate grammar for syntax highlighting and follows VS Code extension best practices.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
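The highlighted constructs map onto a handful of regular expressions. A rough sketch of equivalent patterns, written as TypeScript regexes rather than TextMate JSON; agrPatterns and findCaptures are hypothetical, and the pattern text is an illustrative guess, not copied from syntaxes/agr.tmLanguage.json.

```typescript
// Approximate regexes for the .agr constructs listed above. These are
// illustrative guesses, not the patterns in syntaxes/agr.tmLanguage.json.
const agrPatterns = {
    ruleDefinition: /^\s*@\s*<([A-Za-z_]\w*)>\s*=/, // @ <RuleName> =
    ruleReference: /<([A-Za-z_]\w*)>/g, // <RuleName>
    capture: /\$\(([A-Za-z_]\w*)(?::([A-Za-z_]\w*))?\)/g, // $(name) or $(name:Type)
    action: /->\s*\{/, // start of an action object
    comment: /\/\/.*$/, // line comment
};

// Return the variables captured by $(...) in a line, with their optional types.
function findCaptures(line: string): Array<{ name: string; type?: string }> {
    return Array.from(line.matchAll(agrPatterns.capture), (m) => ({
        name: m[1],
        type: m[2], // undefined for an untyped $(name)
    }));
}
```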

- Added required package.json fields: license, author, homepage, repository, private
- Sorted package.json fields according to project conventions
- Added Trademarks section to README per Microsoft guidelines

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

…nsion

- Reordered package.json fields to match project conventions
- Added third-party trademarks clause to README

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Used sort-package-json to ensure fields are in the correct order per project standards.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This adds a generator that creates .agr grammar files from TypeAgent action schemas. The generator:
1. Takes a complete action schema as input
2. Identifies the most common actions and generates example requests
3. Uses Claude to generate an efficient grammar with shared sub-rules
4. Validates the grammar and automatically fixes syntax errors
5. Outputs a complete .agr file ready for use in TypeAgent

Key features:
- Automatically extracts shared patterns across actions (e.g., <Polite>, <DateExpr>)
- Handles union types (e.g., CalendarTime | CalendarTimeRange)
- Validates generated grammar using the action-grammar compiler
- Provides an error feedback loop to fix syntax issues
- Exports CLI tool: generate-grammar

Files added:
- packages/agentSdkWrapper/src/schemaToGrammarGenerator.ts: Main generator class
- packages/agentSdkWrapper/src/generate-grammar-cli.ts: CLI interface

Files modified:
- packages/agentSdkWrapper/src/schemaReader.ts: Added union type handling
- packages/agentSdkWrapper/src/index.ts: Export new generator classes
- packages/agentSdkWrapper/package.json: Add action-grammar dependency and CLI command

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
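The validate-and-fix step amounts to a retry loop that feeds compiler errors back into the next generation request. A minimal sketch, where generate and validate are hypothetical stand-ins for the Claude call and the action-grammar compiler, and maxAttempts is an assumed limit rather than a documented setting:

```typescript
// Hypothetical shape of the "generate, validate, fix" loop. `generate` stands
// in for the Claude call and `validate` for the action-grammar compiler.
type Validate = (grammar: string) => string[]; // returns error messages
type Generate = (prompt: string, previousErrors: string[]) => Promise<string>;

async function generateValidGrammar(
    prompt: string,
    generate: Generate,
    validate: Validate,
    maxAttempts = 3,
): Promise<string> {
    let errors: string[] = [];
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        // Each retry sees the errors reported for the previous attempt.
        const grammar = await generate(prompt, errors);
        errors = validate(grammar);
        if (errors.length === 0) return grammar;
    }
    throw new Error(`Grammar still invalid after ${maxAttempts} attempts: ${errors.join("; ")}`);
}
```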

…nerator

Enables extending existing .agr grammars with new examples and improvements rather than always generating from scratch.

New CLI options:
- --input/-i: Load existing .agr file to extend
- --improve: Provide improvement instructions to Claude

Key features:
- Extension mode uses specialized prompt to maintain consistency
- Outputs to .extended.agr by default to avoid overwriting original
- Successfully tested with calendar grammar

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
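A sketch of the default output naming, assuming the extended file simply swaps the .agr suffix for .extended.agr. defaultExtendedPath is a hypothetical helper, not necessarily how the CLI derives the path.

```typescript
// Hypothetical helper: derive the default output path for extension mode so
// that extending calendar.agr writes calendar.extended.agr instead of
// overwriting the original file.
function defaultExtendedPath(inputPath: string): string {
    const suffix = ".agr";
    return inputPath.endsWith(suffix)
        ? inputPath.slice(0, -suffix.length) + ".extended.agr"
        : inputPath + ".extended.agr"; // no .agr suffix: append rather than replace
}
```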

Changes both grammar generators to use exact action names (e.g., scheduleEvent) instead of capitalized names (e.g., ScheduleEvent) for action rules. This convention enables:
- Easy targeting of specific actions when extending grammars incrementally
- When a new example for scheduleEvent comes in, extending just the @ <scheduleEvent> rule without affecting other actions
- Better factoring for incremental grammar updates

Format:
- @ <Start> = <scheduleEvent> | <findEvents> | ...
- @ <scheduleEvent> = ... -> { actionName: "scheduleEvent", ... }
- @ <findEvents> = ... -> { actionName: "findEvents", ... }

Shared sub-rules can still use any naming convention (e.g., <Polite>, <DateSpec>).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
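Because each action rule is named after its actionName, both the Start rule and a single action rule can be produced or located by name. The helpers below are a hypothetical sketch that assumes every rule definition begins on its own line with "@ <Name>"; startRule and extractRule are illustrative names, not part of the generators.

```typescript
// Hypothetical helpers; assume each rule definition starts a line with "@ <Name>".

// Build the top-level rule that dispatches to one rule per action.
function startRule(actionNames: string[]): string {
    return `@ <Start> = ${actionNames.map((name) => `<${name}>`).join(" | ")}`;
}

// Return the full text of one rule (its "@" line plus continuation lines),
// or undefined if the grammar has no rule with that name.
function extractRule(grammar: string, ruleName: string): string | undefined {
    const lines = grammar.split("\n");
    const header = `@ <${ruleName}>`; // includes ">" so <scheduleEventX> won't match
    const begin = lines.findIndex((l) => l.trimStart().startsWith(header));
    if (begin < 0) return undefined;
    let end = begin + 1;
    while (end < lines.length && !lines[end].trimStart().startsWith("@")) end++;
    return lines.slice(begin, end).join("\n").trimEnd();
}
```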

Creates a new independent package that converts stream-of-consciousness or raw text into well-formatted markdown documents using Claude.

Features:
- CLI utility with stdin/stdout support
- Uses Claude Agent SDK query() function (same pattern as grammarGenerator)
- Configurable model and custom formatting instructions
- Compact and readable output

Usage:
thoughts input.txt -o output.md
cat notes.txt | thoughts > output.md
thoughts notes.txt --instructions "Format as meeting notes"

Package structure:
- thoughtsProcessor.ts - Core processor using Claude
- cli.ts - Command-line interface
- Independent package (no workspace dependencies)

Also updated pnpm-workspace.yaml to include the packages/mcp/* pattern.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Add audioTranscriber module using OpenAI Whisper API
- Update CLI to detect and transcribe .wav files automatically
- Add openai package dependency
- Update README with audio transcription examples
- Document OPENAI_API_KEY environment variable requirement

The CLI now supports both text and audio input, transcribing WAV files before processing with Claude.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Major changes:
- Replace OpenAI Whisper with Azure Cognitive Services for transcription
- Add --tags flag for appending keywords to markdown documents
- Add inline tag support: say "tag this as X" during recording
- Claude extracts inline tags and inserts markers at the right locations
- Tags formatted as markdown headings and inline markers (🏷️)

Technical details:
- Use microsoft-cognitiveservices-speech-sdk instead of openai
- Update audioTranscriber to use Azure Speech SDK
- Update thoughtsProcessor prompt to recognize tag phrases
- Add CLI flag: -t, --tags for comma-separated tags
- Environment variables: AZURE_SPEECH_KEY, AZURE_SPEECH_REGION

Inline tags example:
- Input: "idea 1... tag this as design... idea 2..."
- Output: markdown with **🏷️ design** marker inserted

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
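A hypothetical sketch of the --tags handling: split the comma-separated flag value and render each tag in the **🏷️ tag** marker style from the inline-tag example. parseTags, tagMarker, appendTags, and the "## Tags" heading text are assumptions, not the package's actual code; inline tag extraction itself is done by Claude, not by code like this.

```typescript
// Hypothetical helpers for the -t/--tags flag; the heading text is assumed.

// "design, ux ,," -> ["design", "ux"]
function parseTags(value: string | undefined): string[] {
    return (value ?? "")
        .split(",")
        .map((t) => t.trim())
        .filter((t) => t.length > 0);
}

// Inline marker format shown in the example: **🏷️ design**
function tagMarker(tag: string): string {
    return `**🏷️ ${tag}**`;
}

// Append a tag section to the end of a generated markdown document.
function appendTags(markdown: string, tags: string[]): string {
    if (tags.length === 0) return markdown;
    return `${markdown.trimEnd()}\n\n## Tags\n\n${tags.map(tagMarker).join(" ")}\n`;
}
```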

- Add dotenv dependency for loading environment variables
- Load .env from repository root (ts/.env) with path resolution
- Check if .env file exists and show warning if not found
- Add managed identity support for Azure Speech Services
- Use aiclient package to get tokens for managed identity
- Handle SPEECH_SDK_* environment variables
- Support both subscription key and identity-based authentication

Path resolution:
- From dist/ go up to: thoughts/ -> mcp/ -> packages/ -> ts/
- Load .env from ts directory (4 levels up)

Managed identity:
- Check if speechKey is "identity"
- Create Azure token provider with CogServices scope
- Use fromAuthorizationToken with aad#endpoint#token format

Successfully tested with managed identity authentication.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
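The path resolution and token format can be sketched as two small helpers. resolveEnvPath and speechAuthorizationToken are hypothetical names; the aad#endpoint#token layout follows the description in this commit, and the resulting string is what would be handed to the Speech SDK's SpeechConfig.fromAuthorizationToken.

```typescript
import * as path from "node:path";

// Hypothetical: dist/ lives in ts/packages/mcp/thoughts/, so four ".." steps
// (thoughts -> mcp -> packages -> ts) reach the ts/ directory holding .env.
function resolveEnvPath(distDir: string): string {
    return path.resolve(distDir, "..", "..", "..", "..", ".env");
}

// Managed-identity token string in the aad#endpoint#token layout described above.
function speechAuthorizationToken(endpoint: string, accessToken: string): string {
    return `aad#${endpoint}#${accessToken}`;
}
```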

Use startContinuousRecognitionAsync to transcribe entire audio files without stopping at pauses.

Changes:
- Replace recognizeOnceAsync with startContinuousRecognitionAsync
- Collect all recognized text segments in an array
- Handle recognized, canceled, and sessionStopped events
- Join all segments with spaces for complete transcription
- Handle cancellation gracefully if text was captured

This captures the full audio file content instead of just the first utterance.

Tested with a 2.5-minute recording:
- Before: 77 characters (stopped at first pause)
- After: 1566 characters (full transcription)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
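The event handling described above can be sketched against a simplified recognizer. ContinuousRecognizer and transcribeContinuously are hypothetical; the real Speech SDK SpeechRecognizer raises recognized, canceled, and sessionStopped events with richer event-argument objects and uses start/stopContinuousRecognitionAsync, which this interface abstracts away.

```typescript
// Hypothetical, simplified recognizer interface; the real Speech SDK exposes
// recognized/canceled/sessionStopped events with richer event-argument objects.
interface ContinuousRecognizer {
    recognized?: (text: string) => void;
    canceled?: (reason: string) => void;
    sessionStopped?: () => void;
    start(): void;
    stop(): void;
}

// Collect every recognized segment until the session ends, then join them.
function transcribeContinuously(recognizer: ContinuousRecognizer): Promise<string> {
    return new Promise((resolve, reject) => {
        const segments: string[] = [];
        let done = false;
        const finish = (error?: Error) => {
            if (done) return; // canceled and sessionStopped may both fire
            done = true;
            recognizer.stop();
            if (error) reject(error);
            else resolve(segments.join(" "));
        };
        recognizer.recognized = (text) => {
            if (text.length > 0) segments.push(text);
        };
        recognizer.canceled = (reason) => {
            // Keep partial results if any text was captured before cancellation.
            finish(segments.length > 0 ? undefined : new Error(reason));
        };
        recognizer.sessionStopped = () => finish();
        recognizer.start();
    });
}
```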

Implements a token-based NFA (Nondeterministic Finite Automaton) system for compiling and matching regular grammars.

Key components:
- NFA data structures and builder (nfa.ts)
- Grammar to NFA compiler (nfaCompiler.ts)
- NFA interpreter for debugging and matching (nfaInterpreter.ts)
- Comprehensive test suite with real grammars
- Documentation (NFA_README.md)

Features:
- Token-based matching (words, not characters)
- Epsilon closure computation
- Wildcard capturing with type constraints
- Grammar combination (sequence/choice)
- Debug printing and tracing
- Successfully compiles player grammar (303 states) and calendar grammar

This provides a foundation for:
1. DFA compilation (future optimization)
2. Grammar merging capabilities
3. Dynamic rule loading

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

steveluc and others added 15 commits January 22, 2026 14:34

Merge remote-tracking branch 'origin/main' into add-agr-language-exte…

8bb0b17

…nsion

Fix package.json field order and complete trademarks section

b8f7ee2


Fix package.json field ordering with sort-package-json

9dec3a7


Merge remote-tracking branch 'origin/main' into add-schema-grammar-ge…

e2f7122

…nerator


steveluc mentioned this pull request Jan 23, 2026 in Add grammar module system and dynamic loading capabilities #1858 (open)
