@steveluc
Contributor

Summary

Implements token-based NFA (Nondeterministic Finite Automaton) infrastructure for compiling and matching regular grammars.

This provides the foundation for:

  • Determining whether regular grammars are expressive enough for the pattern start → preamble command postamble
  • Future DFA compilation for performance optimization
  • Grammar combination and merging capabilities
  • Dynamic rule loading

Key Components

  • nfa.ts: Core NFA data structures and builder

    • NFABuilder for constructing NFAs
    • combineNFAs() for sequence and choice operations
  • nfaCompiler.ts: Compiles Grammar to NFA

    • Handles alternatives, sequences, wildcards, optionals
    • Supports nested rules
  • nfaInterpreter.ts: Interprets/runs NFAs for debugging

    • Epsilon closure computation
    • Parallel state tracking
    • Variable capturing
    • Debug printing
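The interpreter's parallel state tracking, epsilon closure, and variable capturing can be sketched roughly as follows. This is a minimal illustration, not the code in nfa.ts or nfaInterpreter.ts: the `Transition`, `NFA`, and `Thread` shapes and the `matchTokens` function are hypothetical names invented for this example, and type constraints on wildcards are left out.

```typescript
// Hypothetical shapes for illustration; the real definitions live in nfa.ts.
type Transition =
  | { kind: "epsilon"; to: number }
  | { kind: "token"; token: string; to: number }
  | { kind: "wildcard"; variable: string; to: number };

interface NFA {
  start: number;
  accepting: Set<number>;
  transitions: Map<number, Transition[]>;
}

// One active thread of the simulation: a state plus the captures made so far.
interface Thread {
  state: number;
  captures: Record<string, string>;
}

// Follow epsilon transitions until no new state is reached.
// Simplification: only the first thread to reach a given state is kept.
function epsilonClosure(nfa: NFA, threads: Thread[]): Thread[] {
  const seen = new Set<number>();
  const result: Thread[] = [];
  const stack = [...threads];
  while (stack.length > 0) {
    const t = stack.pop()!;
    if (seen.has(t.state)) continue;
    seen.add(t.state);
    result.push(t);
    for (const tr of nfa.transitions.get(t.state) ?? []) {
      if (tr.kind === "epsilon") stack.push({ state: tr.to, captures: t.captures });
    }
  }
  return result;
}

// Advance all threads in parallel, one whole token (word) at a time.
// Returns the captures of an accepting thread, or undefined if none accepts.
function matchTokens(nfa: NFA, tokens: string[]): Record<string, string> | undefined {
  let current = epsilonClosure(nfa, [{ state: nfa.start, captures: {} }]);
  for (const token of tokens) {
    const next: Thread[] = [];
    for (const t of current) {
      for (const tr of nfa.transitions.get(t.state) ?? []) {
        if (tr.kind === "token" && tr.token === token) {
          next.push({ state: tr.to, captures: t.captures });
        } else if (tr.kind === "wildcard") {
          next.push({ state: tr.to, captures: { ...t.captures, [tr.variable]: token } });
        }
      }
    }
    current = epsilonClosure(nfa, next);
    if (current.length === 0) return undefined;
  }
  return current.find((t) => nfa.accepting.has(t.state))?.captures;
}

// Example automaton: an optional "please", then "play", then one captured token.
const example: NFA = {
  start: 0,
  accepting: new Set([3]),
  transitions: new Map<number, Transition[]>([
    [0, [{ kind: "token", token: "please", to: 1 }, { kind: "epsilon", to: 1 }]],
    [1, [{ kind: "token", token: "play", to: 2 }]],
    [2, [{ kind: "wildcard", variable: "track", to: 3 }]],
  ]),
};
```

Calling `matchTokens(example, ["play", "yesterday"])` takes the epsilon edge past the optional word and captures `track`. An input such as `["pause"]` leaves no live threads, so the function returns `undefined`.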

Features

✅ Token-based matching (words as atomic units, not characters)
✅ Epsilon transitions with proper closure computation
✅ Wildcard capturing with type constraints
✅ Optional parts using epsilon transitions
✅ Grammar combination (sequence and choice operations)
✅ Debug printing and execution tracing
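To illustrate how sequence, choice, and optional parts can all reduce to epsilon transitions, here is a small Thompson-style sketch. It is an assumption about the general technique, not the actual `NFABuilder` or `combineNFAs()` API: `Fragment`, `Edge`, `token`, `sequence`, `choice`, `optional`, and `accepts` are hypothetical names used only in this example.

```typescript
// Hypothetical fragment type for illustration; not the actual NFABuilder API.
interface Edge {
  from: number;
  to: number;
  token?: string; // undefined means an epsilon transition
}

interface Fragment {
  start: number;
  accept: number;
  edges: Edge[];
}

let nextState = 0;
const fresh = (): number => nextState++;

// A fragment that consumes exactly one word.
function token(word: string): Fragment {
  const start = fresh();
  const accept = fresh();
  return { start, accept, edges: [{ from: start, to: accept, token: word }] };
}

// Sequence: link a's accept state to b's start state with an epsilon edge.
function sequence(a: Fragment, b: Fragment): Fragment {
  return {
    start: a.start,
    accept: b.accept,
    edges: [...a.edges, ...b.edges, { from: a.accept, to: b.start }],
  };
}

// Choice: a new start fans out to both branches, and both rejoin at a new accept.
function choice(a: Fragment, b: Fragment): Fragment {
  const start = fresh();
  const accept = fresh();
  return {
    start,
    accept,
    edges: [
      ...a.edges,
      ...b.edges,
      { from: start, to: a.start },
      { from: start, to: b.start },
      { from: a.accept, to: accept },
      { from: b.accept, to: accept },
    ],
  };
}

// Optional: an epsilon edge from start to accept lets the matcher skip the fragment.
function optional(a: Fragment): Fragment {
  return { ...a, edges: [...a.edges, { from: a.start, to: a.accept }] };
}

// Tiny acceptor used only to check the combinators above.
function accepts(f: Fragment, tokens: string[]): boolean {
  const closure = (states: Set<number>): Set<number> => {
    const out = new Set<number>(states);
    const work = Array.from(states);
    while (work.length > 0) {
      const s = work.pop()!;
      for (const e of f.edges) {
        if (e.from === s && e.token === undefined && !out.has(e.to)) {
          out.add(e.to);
          work.push(e.to);
        }
      }
    }
    return out;
  };
  let current = closure(new Set<number>([f.start]));
  for (const t of tokens) {
    const next = new Set<number>();
    for (const e of f.edges) {
      if (e.token === t && current.has(e.from)) next.add(e.to);
    }
    current = closure(next);
  }
  return current.has(f.accept);
}

// (please)? (play | resume)
const grammar = sequence(optional(token("please")), choice(token("play"), token("resume")));
```

With this construction, `accepts(grammar, ["play"])` and `accepts(grammar, ["please", "resume"])` both hold, while `["please"]` alone is rejected because the choice has not been satisfied.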

Testing

  • Unit tests for all components
  • Integration tests with real production grammars:
    • Player grammar: 303 states, 413 transitions
    • Calendar grammar: 94 states, 161 transitions
  • Tests pass for the pause, resume, and play commands
  • Device selection works correctly

Documentation

See NFA_README.md for:

  • Architecture overview
  • Usage examples
  • Token-based vs character-based approach
  • Compilation strategy
  • Future work (DFA compilation, grammar merging)

Test Plan

cd packages/actionGrammar
npm test -- nfa.spec
npm test -- nfaRealGrammars.spec

All tests passing ✓

🤖 Generated with Claude Code

steveluc and others added 15 commits January 22, 2026 14:34
Created a VS Code extension that provides comprehensive syntax highlighting
for Action Grammar files used in TypeAgent.

Features:
- Rule definition highlighting (@ <RuleName> = ...)
- Rule reference highlighting (<RuleName>)
- Capture syntax highlighting ($(name:Type) and $(name))
- Action object highlighting with embedded JavaScript syntax (-> { })
- Operator highlighting (|, ?, *, +)
- Comment support (//)
- String literal highlighting
- Bracket matching and auto-closing pairs
- Language configuration for editor features

File structure:
- package.json: Extension manifest with language contributions
- language-configuration.json: Editor behavior configuration
- syntaxes/agr.tmLanguage.json: TextMate grammar definition
- README.md: Installation and usage documentation
- OVERVIEW.md: Technical implementation details
- sample.agr: Sample file demonstrating all syntax features
- LICENSE: MIT license

The extension uses TextMate grammar for syntax highlighting and follows
VS Code extension best practices.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Added required package.json fields: license, author, homepage, repository, private
- Sorted package.json fields according to project conventions
- Added Trademarks section to README per Microsoft guidelines

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Reordered package.json fields to match project conventions
- Added third-party trademarks clause to README

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Used sort-package-json to ensure fields are in the correct order per project standards.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This adds a generator that creates .agr grammar files from TypeAgent action schemas.

The generator:
1. Takes a complete action schema as input
2. Identifies the most common actions and generates example requests
3. Uses Claude to generate an efficient grammar with shared sub-rules
4. Validates the grammar and automatically fixes syntax errors
5. Outputs a complete .agr file ready for use in TypeAgent

Key features:
- Automatically extracts shared patterns across actions (e.g., <Polite>, <DateExpr>)
- Handles union types (e.g., CalendarTime | CalendarTimeRange)
- Validates generated grammar using the action-grammar compiler
- Provides error feedback loop to fix syntax issues
- Exports CLI tool: generate-grammar
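The validate-and-fix feedback loop described above might look roughly like this. It is a sketch under stated assumptions: `generateWithFeedback`, `Generate`, and `Validate` are hypothetical names, the retry prompt wording and the `maxAttempts` default are invented, and the callbacks are synchronous for brevity, whereas a real call to Claude would be asynchronous.

```typescript
// Hypothetical stand-ins for the Claude call and the action-grammar compiler.
type Generate = (prompt: string) => string;
type Validate = (grammar: string) => string[]; // error messages; empty when valid

interface GenerationResult {
  grammar: string;
  errors: string[];
  attempts: number;
}

// Generate a grammar, validate it, and feed any errors back into the next prompt.
function generateWithFeedback(
  basePrompt: string,
  generate: Generate,
  validate: Validate,
  maxAttempts = 3, // assumed limit, not taken from the generator itself
): GenerationResult {
  let prompt = basePrompt;
  let grammar = "";
  let errors: string[] = [];
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    grammar = generate(prompt);
    errors = validate(grammar);
    if (errors.length === 0) {
      return { grammar, errors, attempts: attempt };
    }
    // Include the failing grammar and the compiler errors in the retry prompt.
    prompt = [
      basePrompt,
      "The previous grammar failed to compile:",
      grammar,
      "Errors:",
      ...errors,
      "Return a corrected grammar.",
    ].join("\n");
  }
  // Give up after maxAttempts and report the last errors to the caller.
  return { grammar, errors, attempts: maxAttempts };
}
```

Returning the last errors instead of throwing lets a CLI decide whether to write the partial result or exit with a failure.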

Files added:
- packages/agentSdkWrapper/src/schemaToGrammarGenerator.ts: Main generator class
- packages/agentSdkWrapper/src/generate-grammar-cli.ts: CLI interface

Files modified:
- packages/agentSdkWrapper/src/schemaReader.ts: Added union type handling
- packages/agentSdkWrapper/src/index.ts: Export new generator classes
- packages/agentSdkWrapper/package.json: Add action-grammar dependency and CLI command

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Enables extending existing .agr grammars with new examples and improvements
rather than always generating from scratch.

New CLI options:
- --input/-i: Load existing .agr file to extend
- --improve: Provide improvement instructions to Claude

Key features:
- Extension mode uses specialized prompt to maintain consistency
- Outputs to .extended.agr by default to avoid overwriting original
- Successfully tested with calendar grammar

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Changes both grammar generators to use exact action names (e.g., scheduleEvent)
instead of capitalized names (e.g., ScheduleEvent) for action rules.

This convention enables:
- Easy targeting of specific actions when extending grammars incrementally
- When a new example for scheduleEvent comes in, only the
  @ <scheduleEvent> rule needs to be extended; other actions are unaffected
- Better factoring for incremental grammar updates

Format:
- @ <Start> = <scheduleEvent> | <findEvents> | ...
- @ <scheduleEvent> = ... -> { actionName: "scheduleEvent", ... }
- @ <findEvents> = ... -> { actionName: "findEvents", ... }

Shared sub-rules can still use any naming convention (e.g., <Polite>, <DateSpec>).
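As a rough illustration of why exact action names help, the sketch below builds the `<Start>` rule from a list of action names and swaps out a single action rule by name. Both helpers (`buildStartRule`, `replaceActionRule`) are hypothetical and not part of the generators, and the replacement assumes each rule fits on one line, which real grammars may not.

```typescript
// Hypothetical helpers for illustration; not part of the generators.

// Build the top-level rule that references one rule per action name.
function buildStartRule(actionNames: string[]): string {
  const refs = actionNames.map((name) => `<${name}>`);
  return `@ <Start> = ${refs.join(" | ")}`;
}

// Replace the rule for one action, leaving every other line untouched.
// Assumption: each rule occupies a single line.
function replaceActionRule(grammar: string, actionName: string, newRule: string): string {
  const prefix = `@ <${actionName}> =`;
  return grammar
    .split("\n")
    .map((line) => (line.startsWith(prefix) ? newRule : line))
    .join("\n");
}

const original = [
  buildStartRule(["scheduleEvent", "findEvents"]),
  '@ <scheduleEvent> = schedule $(title) -> { actionName: "scheduleEvent" }',
  '@ <findEvents> = find $(title) -> { actionName: "findEvents" }',
].join("\n");

// Only the scheduleEvent rule changes; the findEvents rule is preserved.
const updated = replaceActionRule(
  original,
  "scheduleEvent",
  '@ <scheduleEvent> = (schedule | book) $(title) -> { actionName: "scheduleEvent" }',
);
```

Because the rule name equals the action name, the lookup prefix can be derived directly from an incoming example's action, with no separate mapping table.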

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Create new independent package that converts stream-of-consciousness or raw
text into well-formatted markdown documents using Claude.

Features:
- CLI utility with stdin/stdout support
- Uses Claude Agent SDK query() function (same pattern as grammarGenerator)
- Configurable model and custom formatting instructions
- Compact and readable output

Usage:
  thoughts input.txt -o output.md
  cat notes.txt | thoughts > output.md
  thoughts notes.txt --instructions "Format as meeting notes"

Package structure:
- thoughtsProcessor.ts - Core processor using Claude
- cli.ts - Command-line interface
- Independent package (no workspace dependencies)

Also updated pnpm-workspace.yaml to include the packages/mcp/* pattern.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Add audioTranscriber module using OpenAI Whisper API
- Update CLI to detect and transcribe .wav files automatically
- Add openai package dependency
- Update README with audio transcription examples
- Document OPENAI_API_KEY environment variable requirement

The CLI now supports both text and audio input, transcribing
WAV files before processing with Claude.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Major changes:
- Replace OpenAI Whisper with Azure Cognitive Services for transcription
- Add --tags flag for appending keywords to markdown documents
- Add inline tag support: say "tag this as X" during recording
- Claude extracts inline tags and inserts markers at the right locations
- Tags formatted as markdown headings and inline markers (🏷️)

Technical details:
- Use microsoft-cognitiveservices-speech-sdk instead of openai
- Update audioTranscriber to use Azure Speech SDK
- Update thoughtsProcessor prompt to recognize tag phrases
- Add CLI flag: -t, --tags for comma-separated tags
- Environment variables: AZURE_SPEECH_KEY, AZURE_SPEECH_REGION

Inline tags example:
  Input: "idea 1... tag this as design... idea 2..."
  Output: markdown with **🏷️ design** marker inserted

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

- Add dotenv dependency for loading environment variables
- Load .env from repository root (ts/.env) with path resolution
- Check if .env file exists and show warning if not found
- Add managed identity support for Azure Speech Services
- Use aiclient package to get tokens for managed identity
- Handle SPEECH_SDK_* environment variables
- Support both subscription key and identity-based authentication

Path resolution:
- From dist/ go up to: thoughts/ -> mcp/ -> packages/ -> ts/
- Load .env from ts directory (4 levels up)
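The four-level walk from dist/ up to ts/ can be sketched as below. `envPathFromDist` is a hypothetical helper, and `path.posix` is used only so the example behaves the same on every platform; the actual code presumably resolves from the module's own directory before handing the path to dotenv.

```typescript
import * as path from "node:path";

// Hypothetical helper: climb four levels from the compiled output directory
// (dist/ -> thoughts/ -> mcp/ -> packages/ -> ts/) and point at .env there.
function envPathFromDist(distDir: string): string {
  const repoRoot = path.posix.resolve(distDir, "..", "..", "..", "..");
  return path.posix.join(repoRoot, ".env");
}

// Example layout matching the commit description.
const envPath = envPathFromDist("/repo/ts/packages/mcp/thoughts/dist");
```

For the layout above, `envPath` resolves to `/repo/ts/.env`.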

Managed identity:
- Check if speechKey is "identity"
- Create Azure token provider with CogServices scope
- Use fromAuthorizationToken with aad#endpoint#token format

Successfully tested with managed identity authentication.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Use startContinuousRecognitionAsync to transcribe entire audio files
without stopping at pauses.

Changes:
- Replace recognizeOnceAsync with startContinuousRecognitionAsync
- Collect all recognized text segments in array
- Handle recognized, canceled, and sessionStopped events
- Join all segments with spaces for complete transcription
- Handle cancellation gracefully if text was captured

This captures the full audio file content instead of just the first
utterance. Tested with 2.5-minute recording:
- Before: 77 characters (stopped at first pause)
- After: 1566 characters (full transcription)
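The segment-collecting pattern described in this commit can be sketched without the SDK itself. The `ContinuousRecognizer` interface below is a hypothetical stand-in, loosely modeled on the SDK's event-handler style, and `transcribeAll` is an invented name. The real code wires the same logic to the recognized, canceled, and sessionStopped events around `startContinuousRecognitionAsync`.

```typescript
// Hypothetical minimal interface; it is not the Speech SDK's actual type.
interface ContinuousRecognizer {
  onRecognized?: (text: string) => void;
  onCanceled?: (reason: string) => void;
  onSessionStopped?: () => void;
  start(): void;
  stop(): void;
}

// Collect every recognized segment and report the joined text exactly once.
function transcribeAll(
  recognizer: ContinuousRecognizer,
  done: (error: Error | undefined, text: string) => void,
): void {
  const segments: string[] = [];
  let finished = false;

  const finish = (error?: Error): void => {
    if (finished) return; // canceled and sessionStopped may both fire
    finished = true;
    recognizer.stop();
    done(error, segments.join(" "));
  };

  recognizer.onRecognized = (text) => {
    const trimmed = text.trim();
    if (trimmed.length > 0) segments.push(trimmed);
  };

  recognizer.onCanceled = (reason) => {
    // Treat cancellation as success if some text was already captured.
    finish(segments.length > 0 ? undefined : new Error(`Recognition canceled: ${reason}`));
  };

  recognizer.onSessionStopped = () => finish();

  recognizer.start();
}
```

The `finished` guard matters because a recognizer may raise both a cancellation and a session-stopped event at the end of a file, and the caller should see only one result.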

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implements token-based NFA (Nondeterministic Finite Automaton) system for
compiling and matching regular grammars.

Key components:
- NFA data structures and builder (nfa.ts)
- Grammar to NFA compiler (nfaCompiler.ts)
- NFA interpreter for debugging and matching (nfaInterpreter.ts)
- Comprehensive test suite with real grammars
- Documentation (NFA_README.md)

Features:
- Token-based matching (words, not characters)
- Epsilon closure computation
- Wildcard capturing with type constraints
- Grammar combination (sequence/choice)
- Debug printing and tracing
- Successfully compiles player grammar (303 states) and calendar grammar

This provides foundation for:
1. DFA compilation (future optimization)
2. Grammar merging capabilities
3. Dynamic rule loading

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
2 participants