feat: Implement architectural improvements (Items 7-10) #53

Open
mingjerli wants to merge 1 commit into main from fix/mechanical-review-fixes

@mingjerli

Summary

This PR implements the architectural improvements outlined in the design plan (Items 7-10):

Item 7: Path Validation

  • New path_validation.py module with PathValidator class
  • TOCTOU-safe file reading via _safe_read_sql_file()
  • Symlink protection with opt-in allow_symlinks parameter
  • Windows reserved name detection (CON, PRN, NUL, etc.)
  • Unicode normalization for homoglyph attack prevention
  • 100% test coverage (71 tests)
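
The TOCTOU-safe read and the symlink opt-in listed above could look roughly like the following. This is a minimal sketch under stated assumptions, not the code in path_validation.py: `safe_read_text`, `is_windows_reserved`, and `WINDOWS_RESERVED` are hypothetical names, and the real `PathValidator` and `_safe_read_sql_file()` may work differently. The idea is to open the file once and run every check against the open descriptor, so the file cannot be swapped between the check and the read.

```python
import os
import stat
import unicodedata

# Device names Windows reserves regardless of extension (CON.txt is still CON).
WINDOWS_RESERVED = {"CON", "PRN", "AUX", "NUL"} | {
    f"{prefix}{n}" for prefix in ("COM", "LPT") for n in range(1, 10)
}


def is_windows_reserved(name: str) -> bool:
    """Return True if the file name's stem is a reserved Windows device name."""
    # NFKC folds compatibility forms (e.g. fullwidth letters) before comparing.
    folded = unicodedata.normalize("NFKC", name)
    stem = folded.split(".", 1)[0].rstrip(" ").upper()
    return stem in WINDOWS_RESERVED


def safe_read_text(path: str, allow_symlinks: bool = False) -> str:
    """Read a file through a single descriptor (hypothetical helper)."""
    if is_windows_reserved(os.path.basename(path)):
        raise ValueError(f"reserved file name: {path!r}")
    flags = os.O_RDONLY
    if not allow_symlinks:
        # POSIX-only flag: open() fails if the final path component is a symlink.
        # It does not protect against symlinked parent directories.
        flags |= getattr(os, "O_NOFOLLOW", 0)
    with os.fdopen(os.open(path, flags), "r", encoding="utf-8") as handle:
        # fstat inspects the already-open descriptor, not the path.
        if not stat.S_ISREG(os.fstat(handle.fileno()).st_mode):
            raise ValueError(f"not a regular file: {path!r}")
        return handle.read()
```

On POSIX systems a symlinked path raises `OSError` unless `allow_symlinks=True` is passed; on platforms without `O_NOFOLLOW` this sketch would need a different check.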

Item 10: Prompt Injection Mitigation

  • New prompt_sanitization.py module with 4-layer defense
  • Input sanitization with tag escaping (not removal)
  • Unicode NFKC normalization for Cyrillic bypass prevention
  • Output validation with semantic relevance checking
  • sqlglot-based SQL validation for destructive operations
  • Environment variable CLGRAPH_DISABLE_PROMPT_SANITIZATION for debugging
  • 95% test coverage (100 tests)
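
The first two bullets (escape tags rather than remove them, normalize with NFKC) can be sketched with the standard library alone. This is an illustration under assumptions, not the contents of prompt_sanitization.py: `sanitize_for_prompt` and `sanitization_enabled` are hypothetical names, and treating `"1"` as the disabling value of `CLGRAPH_DISABLE_PROMPT_SANITIZATION` is a guess, since the PR does not state the accepted values. Note that NFKC folds compatibility forms such as fullwidth angle brackets; it does not map cross-script lookalikes (a Cyrillic "а" stays Cyrillic), so any Cyrillic handling in the PR presumably involves a step not shown here.

```python
import html
import os
import unicodedata


def sanitization_enabled(environ=os.environ) -> bool:
    """Check the debug switch. Assumption: the value "1" disables sanitization."""
    return environ.get("CLGRAPH_DISABLE_PROMPT_SANITIZATION") != "1"


def sanitize_for_prompt(text: str) -> str:
    """Normalize first, then escape markup instead of deleting it."""
    # Normalizing before escaping means fullwidth "＜" becomes "<" and is then escaped.
    normalized = unicodedata.normalize("NFKC", text)
    # quote=False escapes only &, < and >, leaving quotes in the text readable.
    return html.escape(normalized, quote=False)
```

Escaping instead of removing avoids a common bypass: a single removal pass over `<sys<system>tem>` deletes the inner tag and leaves a fresh `<system>` behind, whereas escaping leaves no live tag at all.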

Item 9: File Splitting

  • Extract lineage_utils.py from lineage_builder.py (~592 lines)
  • Extract sql_column_tracer.py from lineage_builder.py (~296 lines)
  • Extract tvf_registry.py from query_parser.py (~72 lines)
  • Maintain backward compatibility via re-exports
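
Backward compatibility via re-exports means the original module imports the moved names, so existing `from lineage_builder import ...` statements keep working. The sketch below simulates two modules in one runnable script; `normalize_name` is a hypothetical function used only for illustration and is not taken from the PR.

```python
import sys
import types

# Stand-in for the new lineage_utils.py, which now owns the implementation.
lineage_utils = types.ModuleType("lineage_utils")
exec(
    "def normalize_name(name):\n"
    "    return name.strip().lower()\n",
    lineage_utils.__dict__,
)
sys.modules["lineage_utils"] = lineage_utils

# Stand-in for the slimmed-down lineage_builder.py, which re-exports the name.
lineage_builder = types.ModuleType("lineage_builder")
exec(
    "from lineage_utils import normalize_name  # re-export for old callers\n"
    "__all__ = ['normalize_name']\n",
    lineage_builder.__dict__,
)
sys.modules["lineage_builder"] = lineage_builder

# Existing caller code is unchanged and still resolves the same function object.
from lineage_builder import normalize_name
```

In a real package the two `exec` bodies would simply be the contents of the two files; listing the re-exported names in `__all__` also keeps linters from flagging them as unused imports.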

Item 8: Pipeline Decomposition

  • Extract LineageTracer component (~400 lines)
  • Extract MetadataManager component (~185 lines)
  • Extract PipelineValidator component (~169 lines)
  • Extract SubpipelineBuilder component (~183 lines)
  • Pipeline now uses facade pattern with lazy initialization
  • All 1,052 existing tests pass without modification
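
A facade with lazy initialization, as described above, might be shaped like this. The component class names come from the PR, but every method name and body here (`trace`, `issues`, `trace_column`, `validate`) is a hypothetical placeholder; the sketch only shows the delegation pattern, assuming components are built on first use.

```python
from functools import cached_property


class LineageTracer:
    """Placeholder component; the real one is about 400 lines."""

    def __init__(self, pipeline):
        self._pipeline = pipeline

    def trace(self, column):
        return [f"{query}:{column}" for query in self._pipeline.queries]


class PipelineValidator:
    """Placeholder component that reports empty queries."""

    def __init__(self, pipeline):
        self._pipeline = pipeline

    def issues(self):
        return [i for i, query in enumerate(self._pipeline.queries) if not query.strip()]


class Pipeline:
    """Facade: the public methods stay put and delegate to components."""

    def __init__(self, queries):
        self.queries = list(queries)

    @cached_property
    def _tracer(self):
        # Built on first access, then stored in the instance __dict__.
        return LineageTracer(self)

    @cached_property
    def _validator(self):
        return PipelineValidator(self)

    def trace_column(self, column):
        return self._tracer.trace(column)

    def validate(self):
        return self._validator.issues()
```

Because callers only ever touch `Pipeline`, this arrangement is consistent with the existing tests passing without modification.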

File Size Reductions

| File | Before (lines) | After (lines) |
| --- | --- | --- |
| pipeline.py | 2,795 | 2,426 |
| lineage_builder.py | 3,419 | 2,666 |
| query_parser.py | 2,354 | 2,313 |

Test Plan

  • All 1,052 existing tests pass
  • 71 new path validation tests (100% coverage)
  • 100 new prompt sanitization tests (95% coverage)
  • 35 new module extraction tests
  • 68 new component tests
  • ruff check passes
  • ruff format passes

🤖 Generated with Claude Code

@mingjerli force-pushed the fix/mechanical-review-fixes branch from a1b2e5f to 9715afd on February 7, 2026 00:44