Fix Unicode normalization for LLM-generated simplified articles #483

Merged
mircealungu merged 1 commit into master from fix/unicode-normalization-simplified-articles
Feb 12, 2026

Conversation

@mircealungu
Member

Summary

  • Fix visual rendering issues with Romanian diacritics (ă, â, etc.) in simplified articles
  • LLM-generated content was stored in NFD (decomposed) Unicode form, causing diacritics to appear with spacing issues
  • Add NFC normalization to create_simplified_version() to ensure proper Unicode encoding
  • Include migration script to fix existing affected articles
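A minimal sketch of the NFD versus NFC difference described above, using only Python's standard `unicodedata` module. The helper name `to_nfc` is hypothetical and is not taken from the PR's diff; the actual change inside `create_simplified_version()` may be structured differently.

```python
import unicodedata


def to_nfc(text):
    """Return text in NFC (composed) form; None passes through unchanged."""
    if text is None:
        return None
    return unicodedata.normalize("NFC", text)


# "măr" written in NFD: 'a' followed by U+0306 COMBINING BREVE.
decomposed = "ma\u0306r"
composed = to_nfc(decomposed)

print(len(decomposed), len(composed))  # 4 code points vs. 3
print(composed == "m\u0103r")          # True: U+0103 is the precomposed 'ă'
```

Both strings look the same when rendered correctly, but the decomposed one depends on the font and renderer to position the combining mark, which is presumably where the spacing issue comes from.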

Test plan

  • Run migration script on staging: source ~/.venvs/z_env/bin/activate && python -m tools.migrations.26-02-12--normalize_simplified_article_unicode
  • Verify existing Romanian simplified articles display correctly after migration
  • Create a new simplified article in Romanian and verify diacritics render properly
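The verification steps could also be backed by an automated check. The sketch below flags strings that are not in NFC form; `find_unnormalized` is a hypothetical helper, and it assumes Python 3.8 or later for `unicodedata.is_normalized`.

```python
import unicodedata


def needs_nfc(text):
    """True if the string is not already in NFC form."""
    return not unicodedata.is_normalized("NFC", text)


def find_unnormalized(texts):
    """Return the indexes of non-empty strings that still need NFC normalization."""
    return [i for i, text in enumerate(texts) if text and needs_nfc(text)]


samples = ["m\u0103r", "ma\u0306r", "", None, "plain"]
print(find_unnormalized(samples))  # only the decomposed entry is flagged
```

Running such a check before and after the migration would give a count of affected fields rather than relying on visual inspection alone.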

🤖 Generated with Claude Code

LLM-generated content (from Anthropic) was being stored without Unicode
normalization, causing diacritics like Romanian ă, â to render incorrectly.
The text was in NFD (decomposed) form where diacritics are separate
combining characters, causing visual spacing issues.

- Add NFC normalization to create_simplified_version() for title, content,
  and summary
- Add migration script to fix existing simplified articles with decomposed
  Unicode characters
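
As a rough illustration of what such a migration might do, the sketch below normalizes the three fields in place and counts the records that changed. The `SimplifiedArticle` dataclass and `normalize_articles` function are hypothetical stand-ins; the real script works against the project's own models and database session, which are not shown in this conversation.

```python
import unicodedata
from dataclasses import dataclass
from typing import Iterable, Optional

FIELDS = ("title", "content", "summary")


@dataclass
class SimplifiedArticle:
    """Hypothetical stand-in for the project's article model."""
    title: Optional[str]
    content: Optional[str]
    summary: Optional[str]


def normalize_articles(articles: Iterable[SimplifiedArticle]) -> int:
    """Rewrite decomposed fields to NFC and return how many articles changed."""
    changed = 0
    for article in articles:
        touched = False
        for name in FIELDS:
            value = getattr(article, name)
            if value is None:
                continue
            fixed = unicodedata.normalize("NFC", value)
            if fixed != value:
                setattr(article, name, fixed)
                touched = True
        if touched:
            changed += 1
    return changed
```

Writing only the fields that actually differ keeps the migration idempotent: a second run over the same records should report zero changes.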

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions

ArchLens detected architectural changes in the following views:
diff

@mircealungu mircealungu merged commit 5f49c8a into master Feb 12, 2026
1 of 3 checks passed