Add NFA infrastructure for regular grammar compilation #1856
Open: steveluc wants to merge 15 commits into main from nfa-infrastructure
Created a VS Code extension that provides comprehensive syntax highlighting
for Action Grammar files used in TypeAgent.
Features:
- Rule definition highlighting (@ <RuleName> = ...)
- Rule reference highlighting (<RuleName>)
- Capture syntax highlighting ($(name:Type) and $(name))
- Action object highlighting with embedded JavaScript syntax (-> { })
- Operator highlighting (|, ?, *, +)
- Comment support (//)
- String literal highlighting
- Bracket matching and auto-closing pairs
- Language configuration for editor features
File structure:
- package.json: Extension manifest with language contributions
- language-configuration.json: Editor behavior configuration
- syntaxes/agr.tmLanguage.json: TextMate grammar definition
- README.md: Installation and usage documentation
- OVERVIEW.md: Technical implementation details
- sample.agr: Sample file demonstrating all syntax features
- LICENSE: MIT license
The extension uses TextMate grammar for syntax highlighting and follows
VS Code extension best practices.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Added required package.json fields: license, author, homepage, repository, private
- Sorted package.json fields according to project conventions
- Added Trademarks section to README per Microsoft guidelines
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Reordered package.json fields to match project conventions
- Added third-party trademarks clause to README
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Used sort-package-json to ensure fields are in the correct order per project standards.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This adds a generator that creates .agr grammar files from TypeAgent action schemas.
The generator:
1. Takes a complete action schema as input
2. Identifies the most common actions and generates example requests
3. Uses Claude to generate an efficient grammar with shared sub-rules
4. Validates the grammar and automatically fixes syntax errors
5. Outputs a complete .agr file ready for use in TypeAgent
Key features:
- Automatically extracts shared patterns across actions (e.g., <Polite>, <DateExpr>)
- Handles union types (e.g., CalendarTime | CalendarTimeRange)
- Validates generated grammar using the action-grammar compiler
- Provides error feedback loop to fix syntax issues
- Exports CLI tool: generate-grammar
Files added:
- packages/agentSdkWrapper/src/schemaToGrammarGenerator.ts: Main generator class
- packages/agentSdkWrapper/src/generate-grammar-cli.ts: CLI interface
Files modified:
- packages/agentSdkWrapper/src/schemaReader.ts: Added union type handling
- packages/agentSdkWrapper/src/index.ts: Export new generator classes
- packages/agentSdkWrapper/package.json: Add action-grammar dependency and CLI command
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Enables extending existing .agr grammars with new examples and improvements rather than always generating from scratch.
New CLI options:
- --input/-i: Load existing .agr file to extend
- --improve: Provide improvement instructions to Claude
Key features:
- Extension mode uses specialized prompt to maintain consistency
- Outputs to .extended.agr by default to avoid overwriting original
- Successfully tested with calendar grammar
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changes both grammar generators to use exact action names (e.g., scheduleEvent)
instead of capitalized names (e.g., ScheduleEvent) for action rules.
This convention enables:
- Easy targeting of specific actions when extending grammars incrementally
- When a new example for scheduleEvent comes in, can extend just the
@ <scheduleEvent> rule without affecting other actions
- Better factoring for incremental grammar updates
Format:
- @ <Start> = <scheduleEvent> | <findEvents> | ...
- @ <scheduleEvent> = ... -> { actionName: "scheduleEvent", ... }
- @ <findEvents> = ... -> { actionName: "findEvents", ... }
Shared sub-rules can still use any naming convention (e.g., <Polite>, <DateSpec>).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
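The naming convention above can be illustrated with a small sketch. The helper names `buildStartRule` and `findActionRuleIndex` are hypothetical and are not taken from the generator's source; they only show how using exact action names as rule names makes a single action's rule easy to locate when extending a grammar.

```typescript
// Hypothetical helper: build the top-level <Start> rule from exact action names.
function buildStartRule(actionNames: string[]): string {
  const alternatives = actionNames.map((name) => `<${name}>`).join(" | ");
  return `@ <Start> = ${alternatives}`;
}

// Hypothetical helper: find the line that defines the rule for one action.
// Because the rule name equals the action name, no name mapping is needed.
// Returns -1 when the grammar has no rule for that action.
function findActionRuleIndex(grammarLines: string[], actionName: string): number {
  const prefix = `@ <${actionName}> =`;
  let found = -1;
  grammarLines.forEach((line, index) => {
    if (found < 0 && line.trim().indexOf(prefix) === 0) {
      found = index;
    }
  });
  return found;
}

const sampleGrammar = [
  buildStartRule(["scheduleEvent", "findEvents"]),
  '@ <scheduleEvent> = schedule $(title) -> { actionName: "scheduleEvent" }',
  '@ <findEvents> = find events -> { actionName: "findEvents" }',
];

console.log(sampleGrammar[0]); // @ <Start> = <scheduleEvent> | <findEvents>
console.log(findActionRuleIndex(sampleGrammar, "findEvents")); // 2
```

The rule bodies in `sampleGrammar` are simplified placeholders, not rules from the calendar grammar.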
Create new independent package that converts stream-of-consciousness or raw text into well-formatted markdown documents using Claude.
Features:
- CLI utility with stdin/stdout support
- Uses Claude Agent SDK query() function (same pattern as grammarGenerator)
- Configurable model and custom formatting instructions
- Compact and readable output
Usage:
  thoughts input.txt -o output.md
  cat notes.txt | thoughts > output.md
  thoughts notes.txt --instructions "Format as meeting notes"
Package structure:
- thoughtsProcessor.ts - Core processor using Claude
- cli.ts - Command-line interface
- Independent package (no workspace dependencies)
Also updated pnpm-workspace.yaml to include packages/mcp/* pattern.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add audioTranscriber module using OpenAI Whisper API
- Update CLI to detect and transcribe .wav files automatically
- Add openai package dependency
- Update README with audio transcription examples
- Document OPENAI_API_KEY environment variable requirement
The CLI now supports both text and audio input, transcribing WAV files before processing with Claude.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Major changes:
- Replace OpenAI Whisper with Azure Cognitive Services for transcription
- Add --tags flag for appending keywords to markdown documents
- Add inline tag support: say "tag this as X" during recording
- Claude extracts inline tags and inserts markers at the right locations
- Tags formatted as markdown headings and inline markers (🏷️)
Technical details:
- Use microsoft-cognitiveservices-speech-sdk instead of openai
- Update audioTranscriber to use Azure Speech SDK
- Update thoughtsProcessor prompt to recognize tag phrases
- Add CLI flag: -t, --tags for comma-separated tags
- Environment variables: AZURE_SPEECH_KEY, AZURE_SPEECH_REGION
Inline tags example:
  Input: "idea 1... tag this as design... idea 2..."
  Output: markdown with **🏷️ design** marker inserted
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
- Add dotenv dependency for loading environment variables
- Load .env from repository root (ts/.env) with path resolution
- Check if .env file exists and show warning if not found
- Add managed identity support for Azure Speech Services
- Use aiclient package to get tokens for managed identity
- Handle SPEECH_SDK_* environment variables
- Support both subscription key and identity-based authentication
Path resolution:
- From dist/ go up to: thoughts/ -> mcp/ -> packages/ -> ts/
- Load .env from ts directory (4 levels up)
Managed identity:
- Check if speechKey is "identity"
- Create Azure token provider with CogServices scope
- Use fromAuthorizationToken with aad#endpoint#token format
Successfully tested with managed identity authentication.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
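The choice between subscription-key and identity-based authentication described above might look roughly like the following sketch. The `SpeechAuth` type and the `chooseSpeechAuth` function are hypothetical names introduced for illustration; the sketch only builds the `aad#endpoint#token` string the commit mentions and does not call the Speech SDK or the aiclient token provider.

```typescript
// Hypothetical result type: which credential the caller should hand to the SDK.
type SpeechAuth =
  | { kind: "subscriptionKey"; key: string }
  | { kind: "authorizationToken"; token: string };

// Hypothetical helper: pick the auth mode from the configured speech key.
// Assumption: the AAD token was already obtained elsewhere (for example from a
// token provider) and is passed in as a plain string.
function chooseSpeechAuth(
  speechKey: string,
  endpoint: string,
  aadToken?: string,
): SpeechAuth {
  if (speechKey === "identity") {
    if (aadToken === undefined || aadToken.length === 0) {
      throw new Error("Managed identity requested but no AAD token was provided");
    }
    // Format stated in the commit message: aad#endpoint#token
    return { kind: "authorizationToken", token: `aad#${endpoint}#${aadToken}` };
  }
  return { kind: "subscriptionKey", key: speechKey };
}

const identityAuth = chooseSpeechAuth("identity", "my-endpoint", "tok123");
console.log(identityAuth); // { kind: 'authorizationToken', token: 'aad#my-endpoint#tok123' }
```

Returning a tagged union keeps the decision separate from the SDK call, so the caller can pass the result to whichever configuration factory matches `kind`.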
Use startContinuousRecognitionAsync to transcribe entire audio files without stopping at pauses.
Changes:
- Replace recognizeOnceAsync with startContinuousRecognitionAsync
- Collect all recognized text segments in array
- Handle recognized, canceled, and sessionStopped events
- Join all segments with spaces for complete transcription
- Handle cancellation gracefully if text was captured
This captures the full audio file content instead of just the first utterance.
Tested with 2.5-minute recording:
- Before: 77 characters (stopped at first pause)
- After: 1566 characters (full transcription)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
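The segment-collection logic described above can be sketched independently of the Speech SDK. `SegmentCollector` is a hypothetical class, not the code in audioTranscriber; in a real integration its methods would be called from the recognizer's recognized, canceled, and sessionStopped handlers.

```typescript
// Hypothetical collector for continuous recognition results.
class SegmentCollector {
  private segments: string[] = [];
  private cancelDetails: string | undefined = undefined;

  // Call from the "recognized" handler with each finalized utterance.
  handleRecognized(text: string): void {
    const trimmed = text.trim();
    if (trimmed.length > 0) {
      this.segments.push(trimmed);
    }
  }

  // Call from the "canceled" handler; the error is only fatal if no text exists.
  handleCanceled(errorDetails: string): void {
    this.cancelDetails = errorDetails;
  }

  // Call from the "sessionStopped" handler (or after cancellation).
  // Joins all segments with spaces; throws only when recognition was canceled
  // before any text was captured.
  finish(): string {
    if (this.segments.length === 0 && this.cancelDetails !== undefined) {
      throw new Error(`Recognition canceled: ${this.cancelDetails}`);
    }
    return this.segments.join(" ");
  }
}

const collector = new SegmentCollector();
collector.handleRecognized("First idea.");
collector.handleRecognized("  ");
collector.handleRecognized("Second idea after a pause.");
collector.handleCanceled("end of stream");
console.log(collector.finish()); // First idea. Second idea after a pause.
```

Keeping the collector free of SDK types makes the "cancel after text was captured" case easy to exercise without audio input.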
Implements token-based NFA (Nondeterministic Finite Automaton) system for compiling and matching regular grammars.
Key components:
- NFA data structures and builder (nfa.ts)
- Grammar to NFA compiler (nfaCompiler.ts)
- NFA interpreter for debugging and matching (nfaInterpreter.ts)
- Comprehensive test suite with real grammars
- Documentation (NFA_README.md)
Features:
- Token-based matching (words, not characters)
- Epsilon closure computation
- Wildcard capturing with type constraints
- Grammar combination (sequence/choice)
- Debug printing and tracing
- Successfully compiles player grammar (303 states) and calendar grammar
This provides foundation for:
1. DFA compilation (future optimization)
2. Grammar merging capabilities
3. Dynamic rule loading
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
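To make token-based matching and epsilon closure concrete, here is a minimal sketch. The `SimpleNFA` shape and the functions `epsilonClosure`, `matchTokens`, and `tokenize` are assumptions made for illustration; they are not the data structures or API defined in nfa.ts or nfaInterpreter.ts.

```typescript
// Hypothetical edge and automaton shapes (not the types in nfa.ts).
type SimpleEdge =
  | { kind: "token"; token: string; to: number }
  | { kind: "epsilon"; to: number };

interface SimpleNFA {
  start: number;
  accepting: number[];
  transitions: SimpleEdge[][]; // transitions[state] lists the outgoing edges
}

// All states reachable from `states` using only epsilon edges.
function epsilonClosure(nfa: SimpleNFA, states: number[]): number[] {
  const seen: boolean[] = [];
  const stack: number[] = [];
  for (const s of states) {
    if (!seen[s]) {
      seen[s] = true;
      stack.push(s);
    }
  }
  const closure: number[] = [];
  while (stack.length > 0) {
    const state = stack.pop() as number;
    closure.push(state);
    for (const edge of nfa.transitions[state] || []) {
      if (edge.kind === "epsilon" && !seen[edge.to]) {
        seen[edge.to] = true;
        stack.push(edge.to);
      }
    }
  }
  return closure;
}

// Words, not characters, are the atomic input units.
function tokenize(input: string): string[] {
  return input.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
}

// Advance the set of active states one token at a time.
function matchTokens(nfa: SimpleNFA, tokens: string[]): boolean {
  let current = epsilonClosure(nfa, [nfa.start]);
  for (const token of tokens) {
    const next: number[] = [];
    for (const state of current) {
      for (const edge of nfa.transitions[state] || []) {
        if (edge.kind === "token" && edge.token === token) {
          next.push(edge.to);
        }
      }
    }
    current = epsilonClosure(nfa, next);
    if (current.length === 0) {
      return false;
    }
  }
  return current.some((state) => nfa.accepting.indexOf(state) >= 0);
}

// "play (the)? music": the epsilon edge from state 1 to 2 makes "the" optional.
const playNfa: SimpleNFA = {
  start: 0,
  accepting: [3],
  transitions: [
    [{ kind: "token", token: "play", to: 1 }],
    [{ kind: "token", token: "the", to: 2 }, { kind: "epsilon", to: 2 }],
    [{ kind: "token", token: "music", to: 3 }],
    [],
  ],
};

console.log(matchTokens(playNfa, tokenize("play music"))); // true
console.log(matchTokens(playNfa, tokenize("play the"))); // false
```

Tracking a set of active states avoids backtracking: each token is examined once per active state, which is also the property a later DFA compilation step would build on.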
Summary
Implements token-based NFA (Nondeterministic Finite Automaton) infrastructure for compiling and matching regular grammars.
This provides the foundation for:
start → preamble command postamble
Key Components
nfa.ts: Core NFA data structures and builder
- NFABuilder for constructing NFAs
- combineNFAs() for sequence and choice operations
nfaCompiler.ts: Compiles Grammar to NFA
nfaInterpreter.ts: Interprets/runs NFAs for debugging
Features
✅ Token-based matching (words as atomic units, not characters)
✅ Epsilon transitions with proper closure computation
✅ Wildcard capturing with type constraints
✅ Optional parts using epsilon transitions
✅ Grammar combination (sequence and choice operations)
✅ Debug printing and execution tracing
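Wildcard capturing with type constraints can be sketched as follows. Everything here is an assumption made for illustration: the `CapEdge` and `CaptureNFA` shapes, the `"string" | "number"` constraint names, and the `matchWithCaptures` function are hypothetical and do not reproduce the PR's implementation. Each active "thread" carries its own capture map, so alternative paths do not overwrite each other's values.

```typescript
// Hypothetical edge kinds; a wildcard consumes one token and stores it by name.
type CapEdge =
  | { kind: "token"; token: string; to: number }
  | { kind: "epsilon"; to: number }
  | { kind: "wildcard"; name: string; type: "string" | "number"; to: number };

interface CaptureNFA {
  start: number;
  accepting: number[];
  edges: CapEdge[][]; // edges[state] lists the outgoing edges
}

type Captures = Record<string, string | number>;
interface Thread {
  state: number;
  captures: Captures;
}

// Expand each thread along epsilon edges, keeping its captures unchanged.
function followEpsilons(nfa: CaptureNFA, threads: Thread[]): Thread[] {
  const result: Thread[] = [];
  for (const origin of threads) {
    const seen: boolean[] = [];
    const stack: number[] = [origin.state];
    seen[origin.state] = true;
    while (stack.length > 0) {
      const state = stack.pop() as number;
      result.push({ state, captures: origin.captures });
      for (const edge of nfa.edges[state] || []) {
        if (edge.kind === "epsilon" && !seen[edge.to]) {
          seen[edge.to] = true;
          stack.push(edge.to);
        }
      }
    }
  }
  return result;
}

// Returns the captures of the first accepting thread, or undefined on no match.
function matchWithCaptures(nfa: CaptureNFA, tokens: string[]): Captures | undefined {
  let threads = followEpsilons(nfa, [{ state: nfa.start, captures: {} }]);
  for (const token of tokens) {
    const next: Thread[] = [];
    for (const thread of threads) {
      for (const edge of nfa.edges[thread.state] || []) {
        if (edge.kind === "token" && edge.token === token) {
          next.push({ state: edge.to, captures: thread.captures });
        } else if (edge.kind === "wildcard") {
          let value: string | number = token;
          if (edge.type === "number") {
            const parsed = Number(token);
            if (isNaN(parsed)) {
              continue; // type constraint rejects this token
            }
            value = parsed;
          }
          next.push({
            state: edge.to,
            captures: { ...thread.captures, [edge.name]: value },
          });
        }
      }
    }
    threads = followEpsilons(nfa, next);
  }
  for (const thread of threads) {
    if (nfa.accepting.indexOf(thread.state) >= 0) {
      return thread.captures;
    }
  }
  return undefined;
}

// "set (the)? volume to $(level:number)"
const volumeNfa: CaptureNFA = {
  start: 0,
  accepting: [5],
  edges: [
    [{ kind: "token", token: "set", to: 1 }],
    [{ kind: "token", token: "the", to: 2 }, { kind: "epsilon", to: 2 }],
    [{ kind: "token", token: "volume", to: 3 }],
    [{ kind: "token", token: "to", to: 4 }],
    [{ kind: "wildcard", name: "level", type: "number", to: 5 }],
    [],
  ],
};

console.log(matchWithCaptures(volumeNfa, ["set", "volume", "to", "40"])); // { level: 40 }
console.log(matchWithCaptures(volumeNfa, ["set", "the", "volume", "to", "loud"])); // undefined
```

This sketch does not deduplicate threads, so highly ambiguous grammars could grow the thread list; a production interpreter would likely need a policy for merging or ranking them.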
Testing
Documentation
See NFA_README.md for:
Test Plan
All tests passing ✓
🤖 Generated with Claude Code