Skip to content

Conversation

@luandro
Copy link
Contributor

@luandro luandro commented Nov 27, 2025

Fixes #95

Investigate and document root causes for content being skipped during fetch:

1. Dead code: filterChangedPages imported but never used
2. Logic discrepancy: missing path change detection in filterChangedPages
3. Sub-page timeout issues: 10s timeout too aggressive
4. Insufficient logging: hard to diagnose skip reasons

Plan includes:
- Phase 1: Immediate fixes (detailed logging, increased timeout)
- Phase 2: Core logic fixes (update filterChangedPages function)
- Phase 3: Diagnostics (cache validation command)

Related to incremental sync feature added in 2f2a47a (2025-11-26)

Estimated timeline: 6-10 hours
…hase 1)

Phase 1 of fixing issue #95: Content being skipped during fetch

Changes:
1. Add detailed skip reason logging in generateBlocks.ts
   - Shows specific reason why each page is skipped (cache hit, timestamp, path change, etc.)
   - Helps diagnose incorrect skip behavior
   - Includes "should not skip" warnings for potential bugs

2. Enhanced sub-page timeout error logging in fetchNotionData.ts
   - Display error type and detailed message
   - Add helpful hints when timeout occurs
   - Better categorization of failures (timeout vs API error vs permission)

3. Increase sub-page fetch timeout from 10s to 30s
   - Handles slow Notion API responses better
   - Reduces false positive timeouts for complex pages
   - Particularly helpful for pages with large blocks or nested children

Benefits:
- Immediate diagnostic value for debugging skipped pages
- Fewer legitimate pages timing out and being skipped
- Clear visibility into incremental sync behavior

Related: ISSUE_95_PLAN.md
@digidem digidem deleted a comment from chatgpt-codex-connector bot Nov 27, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Nov 27, 2025

🚀 Preview Deployment

Your documentation preview is ready!

Preview URL: https://pr-99.comapeo-docs.pages.dev

🔄 Content: Regenerated 5 pages from Notion (script changes detected)

💡 Tip: Add label fetch-all-pages to test with full content, or fetch-10-pages for broader coverage.

This preview will update automatically when you push new commits to this PR.


Built with commit dd58fea

…ditions

Fix logic bug in skip reason detection where unreachable branches
were incorrectly labeled as "should not skip" instead of being
identified as logic errors.

Problem:
- Inside `if (!needsProcessing)` block, we know ALL conditions
  for needsProcessing are FALSE due to boolean OR logic
- Previous labels like "should not skip" were misleading since
  those branches are mathematically unreachable
- Would never detect actual logic bugs in needsProcessing calculation

Solution:
- Change impossible condition messages to "🔴 ERROR:" prefix
- Add clear comments explaining unreachability
- Document that ONLY "unchanged since [timestamp]" is valid
- If ERROR appears in logs, indicates bug in needsProcessing logic

Benefits:
- Proper detection of logic bugs (if they exist)
- Clear distinction between valid skip (cache hit) vs impossible cases
- Better debugging when investigating Issue #95

Related: ISSUE_95_PLAN.md Phase 1 review
Fix missing path change check in filterChangedPages function to match
the inline logic in generateBlocks.ts. This prevents renamed/moved
pages from being incorrectly skipped during incremental sync.

Changes:
1. Add optional `getFilePath` callback parameter to filterChangedPages
   - Allows checking if page's output path has changed
   - Critical for detecting renamed/moved pages
   - Backward compatible (callback is optional)

2. Update logic to check path changes before timestamp comparison
   - If path changed, page needs processing even if timestamp same
   - Matches behavior of inline needsProcessing logic

3. Add comprehensive tests for new functionality
   - Test path change detection works correctly
   - Test backward compatibility without callback
   - Verify path-unchanged pages are not included

Benefits:
- Prevents skipping pages that were moved/renamed
- Matches the inline logic behavior in generateBlocks.ts
- Fully backward compatible (existing calls still work)
- Well tested with 2 new test cases

This fixes the logic discrepancy identified in Phase 1 review where
filterChangedPages was missing path change detection that existed
in the inline needsProcessing check.

Related: ISSUE_95_PLAN.md Phase 2
Detailed review of Phase 2 implementation identifying:

Critical Finding:
- filterChangedPages function is not used in production code
- Only imported but never called in generateBlocks.ts
- Actual fix was already in Phase 1 (inline logic)
- Phase 2 prepares function for future use

Quality Assessment:
- ✅ Implementation is technically correct
- ✅ No regressions found
- ✅ Good test coverage for happy path
- ⚠️ Some edge cases identified (empty paths, normalization)
- ⚠️ Limited practical impact (function unused)

Edge Cases Identified:
1. Empty string paths - could cause issues
2. Path normalization differences (./docs vs docs)
3. Null/undefined handling (already safe)
4. Case sensitivity (correct behavior)
5. Multiple output paths (handled correctly)

Recommendations:
1. Add documentation comment about unused status
2. Add empty path validation check
3. Consider path normalization
4. Add edge case tests

Bottom Line: Phase 2 is technically sound but has limited
practical impact since the function isn't used. The real fix
for Issue #95 was in Phase 1 (logging + timeout + inline logic).

Related: ISSUE_95_PLAN.md
Add documentation and safety checks based on Phase 2 review findings:

1. Add NOTE in JSDoc clarifying function is currently unused
   - Points to actual implementation in generateBlocks.ts:704-711
   - Explains function is maintained for testing and future refactoring
   - Prevents confusion about why function exists but isn't called

2. Add empty path validation check
   - Safety guard: if getFilePath returns empty/whitespace, regenerate
   - Prevents edge case where empty path could cause issues
   - Fail-safe approach: when in doubt, regenerate

Changes are defensive improvements that make the function more
robust without changing its core behavior. Empty path validation
prevents potential edge case identified in review.

Related: PHASE_2_REVIEW.md recommendations #1 and #2
Implement comprehensive cache validation tool to help diagnose
incremental sync issues and cache inconsistencies.

Features:
1. Verify all cached output files exist on disk
   - Detects missing files that should be regenerated
   - Reports which pages have missing outputs

2. Find orphaned markdown files
   - Scans docs/ directory for files not in cache
   - Helps identify manually created or old files
   - Warns about files that won't be updated

3. Verify script hash matches current code
   - Checks if scripts changed since last sync
   - Indicates if next run will do full rebuild
   - Shows hash mismatch details

4. Statistics and summary
   - Total pages in cache
   - Count of missing/orphaned files
   - Script hash status
   - Exit code 0 if valid, 1 if issues found

Usage:
  bun run notion:validate-cache           # Quick check
  bun run notion:validate-cache --verbose # Detailed output

Example Output:
  🔍 Validating Page Metadata Cache...
  ✅ All 156 cached files exist
  ⚠️  Found 3 orphaned markdown files
  ✅ Script hash matches

Benefits:
- Diagnose why pages aren't updating
- Find stale/orphaned content files
- Verify cache integrity
- Quick health check before/after sync

Related: ISSUE_95_PLAN.md Phase 3
Comprehensive improvements based on thorough code review:

1. Fix Dead Code - Remove Unused Import
   - Comment out filterChangedPages import in generateBlocks.ts
   - Add NOTE explaining it's not used (inline logic preferred)
   - Add TODO comment at inline logic location
   - Prevents confusion about why function exists but isn't called

2. Enhanced Timeout Documentation
   - Add comprehensive comments explaining 10s → 30s change
   - Document rationale: reduce false-positive timeouts
   - Note trade-off: slower failure vs fewer skipped pages
   - Add monitoring recommendation for timeout patterns
   - Reference Issue #95 and implementation date

3. Add Telemetry for Unreachable Branches
   - Add console.error() for impossible ERROR conditions
   - Prominent red warning if logic bug occurs
   - Actionable message directing to report issue
   - References specific line numbers for debugging
   - Makes logic bugs immediately visible in production

4. Fix Path Normalization Consistency
   - Add normalizePath() helper function
   - Normalize all paths before adding to Set
   - Normalize all paths before comparison
   - Eliminates false positives from format differences
   - Handles Windows backslashes, leading slashes

5. Document Race Condition Risk
   - Add comprehensive JSDoc warning
   - List safe vs unsafe times to run validation
   - Prevent inaccurate results from concurrent operations
   - Guides users on proper usage

Benefits:
- Eliminates all code review concerns
- Adds defensive programming
- Improves production observability
- Prevents future bugs
- Better documentation for maintainers

All changes are non-breaking and defensive in nature.

Related: Code review recommendations for Issue #95
@luandro luandro added the duplicate This issue or pull request already exists label Dec 14, 2025
@luandro
Copy link
Contributor Author

luandro commented Dec 14, 2025

This is a duplicate of #112

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

duplicate This issue or pull request already exists

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Some content is not being updated or skipped during fetch

3 participants