Skip to content

Conversation

@dab246
Copy link
Member

@dab246 dab246 commented Dec 3, 2025

Descriptions

This is the second proposed solution to address the issue where email content fails to display after applying the HTML sanitization process. This solution focuses on strengthening the sanitize_html library, ensuring that the sanitizer removes or ignores only unsafe tags and attributes without stripping valid email content.

Dependency

Related

#4184

Summary by CodeRabbit

  • Bug Fixes

    • Centralized HTML sanitization for more consistent and secure processing.
    • Improved table rendering by forcing normal word-break on table cells.
    • Anchor handling refined to only process links with href attributes.
    • Email content transformation pipeline updated to include the new sanitizer step.
  • Tests

    • Expanded and reorganized HTML sanitization tests covering event attributes, href/src safety, CSS/style safety, XSS vectors, and form-related cases.
    • Updated expectations for anchor attribute ordering in email transformation tests.

✏️ Tip: You can customize this high-level summary in your review settings.

@github-actions
Copy link

github-actions bot commented Dec 3, 2025

This PR has been deployed to https://linagora.github.io/tmail-flutter/4188.

@chibenwa
Copy link
Member

chibenwa commented Dec 3, 2025

So we are considering switching to a whitekist strategy to a black list one?

Sorry I do not buy it.

I cannot trust us to know every possible unsafe html tag or attribute.

What's wrong with option 1

Cc @Crash--

@dab246
Copy link
Member Author

dab246 commented Dec 3, 2025

So we are considering switching to a whitekist strategy to a black list one?

Sorry I do not buy it.

We will continue using the whitelist, which includes the originally allowed tags. However, for tags that are not on the whitelist, we will no longer remove their entire content or child nodes as in the previous logic. Instead, we will remove only the disallowed tag itself and continue sanitizing its inner elements.

I cannot trust us to know every possible unsafe html tag or attribute.

That is obvious, but we always maintain a broader set of safe tags (based on HTML standards and Google’s recommendations). We also do not need to enforce overly strict constraints on this matter.

What's wrong with option 1

Option 1 is not incorrect, but its limitation is that whenever a new tag appears that is not included in the whitelist, it will be removed entirely (both the tag itself and all elements it contains). This affects how emails are rendered and makes maintenance more difficult and time-consuming.

@chibenwa
Copy link
Member

chibenwa commented Dec 3, 2025

Thanks for the clarification

@dab246
Copy link
Member Author

dab246 commented Dec 3, 2025

See the document at linagora/dart-neats@c02d138#diff-608d87b363f587deb3bb659db286a66fc82b0ea1ead0d33e22a9ca9230b4ad44 to better understand this improvement.

@coderabbitai
Copy link

coderabbitai bot commented Dec 8, 2025

Walkthrough

This PR centralizes HTML sanitization by replacing public allowlists with a private SanitizeHtml instance, updates the sanitize_html dependency ref to a new engine, strengthens and reorganizes sanitization tests, adjusts HTML generation CSS for tables, narrows anchor processing to only anchors with href, and integrates the standardized sanitizer into the email transformation pipeline and calendar-event transformer configuration. Tests and controller code were updated to reflect attribute-order and output differences from the new sanitizer behavior.

Changes

  • Core sanitizer refactoring — core/lib/presentation/utils/html_transformer/text/standardize_html_sanitizing_transformers.dart

    • Removed public allowlists mailAllowedHtmlAttributes and mailAllowedHtmlTags.
    • Added private static final SanitizeHtml _sanitizer = SanitizeHtml().
    • process now returns early on empty input and delegates to the shared _sanitizer.
  • HTML utilities enhancement — core/lib/utils/html/html_utils.dart

    • Added CSS rule: word-break: normal !important for table, td, th in generated HTML documents.
  • Dependency configuration — core/pubspec.yaml

    • Updated sanitize_html git ref from support_mail to feat/new-sanitize-engine.
  • Anchor selector narrowing — core/lib/presentation/utils/html_transformer/dom/sanitize_hyper_link_tag_in_html_transformers.dart

    • Selector changed from 'a' to 'a[href]' so only anchors with href are processed.
  • Sanitizer test expansion — core/test/utils/standardize_html_sanitizing_transformers_test.dart

    • Reordered imports, added sanitize helper, expanded/changed allowed tag and event-attribute fixtures, reorganized tests and added extensive scenarios covering event-attribute removal, href/src sanitization, dangerous-tag unwrapping, CSS/HTML safety, forms, and real-world XSS/phishing vectors.
  • Email content transformation integration — lib/features/email/data/local/html_analyzer.dart

    • Inserted StandardizeHtmlSanitizingTransformers into TransformConfiguration.fromTextTransformers for EmailContentType.textPlain, positioned before SanitizeAutolinkHtmlTransformers.
  • Calendar event transformation — lib/features/email/presentation/controller/single_email_controller.dart

    • Replaced TransformConfiguration.fromTextTransformers(...) with TransformConfiguration.create(...), supplying the same text transformers and adding customDomTransformers (SanitizeHyperLinkTagInHtmlTransformer(useTooltip: PlatformInfo.isWeb)).
  • Calendar event test updates — test/features/email/presentation/controller/single_email_controller_test.dart

    • Updated expected HTML in tests to reflect changed anchor attribute ordering produced by the new/updated transformers.

Estimated code review effort

3 (Moderate) — ~25 minutes

  • Verify sanitizer behavior parity where required.
  • Confirm new sanitize_html ref compatibility.
  • Review expanded tests for correctness and intent alignment.
  • Validate transformer ordering and resulting output for calendar events and email rendering.

Poem

A rabbit hops through HTML streams,
Sanitizing all those XSS dreams,
With central guard and tests so keen,
The safest markup you've ever seen!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly relates to the main objective of fixing email content not being displayed due to HTML sanitization, which aligns with the core purpose of the changeset.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch mu-missing-email-content

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chibenwa
Copy link
Member

chibenwa commented Dec 8, 2025

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Dec 8, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb47a6f and 935f55c.

⛔ Files ignored due to path filters (4)
  • contact/pubspec.lock is excluded by !**/*.lock
  • core/pubspec.lock is excluded by !**/*.lock
  • model/pubspec.lock is excluded by !**/*.lock
  • pubspec.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • core/lib/presentation/utils/html_transformer/text/standardize_html_sanitizing_transformers.dart (1 hunks)
  • core/lib/utils/html/html_utils.dart (1 hunks)
  • core/pubspec.yaml (1 hunks)
  • core/test/utils/standardize_html_sanitizing_transformers_test.dart (3 hunks)
  • lib/features/email/data/local/html_analyzer.dart (2 hunks)
  • lib/features/email/presentation/controller/single_email_controller.dart (1 hunks)
  • test/features/email/presentation/controller/single_email_controller_test.dart (0 hunks)
💤 Files with no reviewable changes (1)
  • test/features/email/presentation/controller/single_email_controller_test.dart
🔇 Additional comments (9)
core/lib/utils/html/html_utils.dart (1)

154-156: LGTM! CSS enhancement for table word-breaking.

The added CSS rule word-break: normal !important for table elements complements the existing white-space: normal !important rule and ensures consistent text handling in tables across different email clients.

core/lib/presentation/utils/html_transformer/text/standardize_html_sanitizing_transformers.dart (2)

6-6: Verify thread-safety of the centralized sanitizer instance.

The static final _sanitizer instance will be shared across all invocations of StandardizeHtmlSanitizingTransformers.process(). This is generally safe for stateless sanitizers, but please confirm that the SanitizeHtml class is thread-safe and doesn't maintain mutable state that could lead to race conditions when processing multiple emails concurrently.


11-14: Verify the default sanitization configuration matches previous behavior.

The refactored implementation now relies entirely on the default configuration of SanitizeHtml() instead of explicit allowlists. This is cleaner but changes the sanitization behavior.

Based on the PR discussion, the new engine preserves the whitelist but changes behavior for non-whitelisted tags (unwrapping instead of removing). Please verify:

  1. The default allowlist in the new engine matches the previously used mailAllowedHtmlTags and mailAllowedHtmlAttributes
  2. The unwrapping behavior for unknown tags doesn't introduce security risks
  3. The change has been tested with real-world email content, especially content with non-standard tags
core/test/utils/standardize_html_sanitizing_transformers_test.dart (4)

82-83: LGTM! Good test helper abstraction.

The sanitize() helper function reduces duplication and makes tests more readable by encapsulating the common pattern of calling transformer.process() and trimming the result.


86-130: LGTM! Comprehensive XSS protection tests.

The test group thoroughly validates removal of event handlers across all DOM events and element types. The systematic approach testing every event attribute on every allowed tag ensures robust XSS protection.


226-246: LGTM! Thorough validation of nested dangerous content removal.

This test effectively verifies that the sanitizer removes all dangerous vectors (onclick, script tags, onerror, javascript: URLs) while preserving safe structural elements and content. The assertions checking for both presence and absence of specific content are well-chosen.


249-329: LGTM! Excellent coverage of form-based attack vectors.

These test groups comprehensively validate protection against:

  • CSRF token stealing via hidden inputs
  • Credential phishing via form submissions
  • File upload exploitation

The tests correctly verify that forms are unwrapped (preserving safe nested content) while dangerous attributes and form inputs are removed.

lib/features/email/presentation/controller/single_email_controller.dart (1)

1743-1748: The transformer reordering is already validated by comprehensive tests.

The tests in single_email_controller_test.dart (lines 339-340 and 389-391) explicitly verify that calendar event parsing works correctly with the current transformer order:

  1. Test with auto-linking: Validates URLs convert to proper <a> tags and newlines become <br> tags
  2. Test with XSS removal: Validates that script tags and javascript: URLs are removed before autolink processing, then safe URLs are properly linkified

The expected output in both tests confirms the transformer order StandardizeHtmlSanitizingTransformers()SanitizeAutolinkUnescapeHtmlTransformer() is correct, as:

  • Dangerous content is sanitized first (scripts, javascript: URLs stripped)
  • Safe URLs are then converted to anchor tags
  • HTML entities and newlines are properly transformed

No action needed; existing tests cover all three concerns raised.

lib/features/email/data/local/html_analyzer.dart (1)

7-7: The transformer ordering is correct as implemented.

The pipeline for text/plain content—StandardizeHtmlSanitizingTransformers → SanitizeAutolinkHtmlTransformers → PersistPreformattedTextTransformer—produces the expected output:

  1. StandardizeHtmlSanitizingTransformers (first): Defensively sanitizes input using the sanitize_html package. For legitimate plain text emails, this is a no-op; for text containing stray/malicious HTML (edge case), it removes dangerous content before linkification.

  2. SanitizeAutolinkHtmlTransformers (second): Detects URLs and email addresses using the linkify library, converts them to HTML anchor tags, and escapes remaining plain text portions. The internal escapeHtml=true flag ensures plain text is properly escaped.

  3. PersistPreformattedTextTransformer (third): Preserves ASCII art and tables by wrapping them in styled divs; otherwise passes content through unchanged.

The code correctly handles:

  • Email addresses and URLs (auto-linked to <a> tags)
  • Preformatted content (code blocks, ASCII art preserved with CSS styling)
  • Stray HTML tags (sanitized away in first transformer)

@hoangdat
Copy link
Member

@chibenwa please also config rabbitai to review this PR linagora/dart-neats#1

@hoangdat
Copy link
Member

hoangdat commented Dec 10, 2025

@hoangdat
Copy link
Member

hoangdat commented Dec 10, 2025

  • link inside bluebar content from outlook be hidden
image

Test.eml.txt

@chibenwa
Copy link
Member

I enabled code rabbit on dart-neats repo

@dab246
Copy link
Member Author

dab246 commented Dec 10, 2025

  • content be overlap
  • link inside bluebar content from outlook be hidden

Fixed

Screen.Recording.2025-12-10.at.16.04.19.mov

@chibenwa
Copy link
Member

@coderabbitai please review

@coderabbitai
Copy link

coderabbitai bot commented Dec 12, 2025

@chibenwa: I'll review the changes in this PR.

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 935f55c and 4eb4416.

⛔ Files ignored due to path filters (4)
  • contact/pubspec.lock is excluded by !**/*.lock
  • core/pubspec.lock is excluded by !**/*.lock
  • model/pubspec.lock is excluded by !**/*.lock
  • pubspec.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • core/lib/presentation/utils/html_transformer/dom/sanitize_hyper_link_tag_in_html_transformers.dart (1 hunks)
  • core/lib/presentation/utils/html_transformer/text/standardize_html_sanitizing_transformers.dart (1 hunks)
  • core/pubspec.yaml (1 hunks)
  • core/test/utils/standardize_html_sanitizing_transformers_test.dart (3 hunks)
  • lib/features/email/data/local/html_analyzer.dart (2 hunks)
  • lib/features/email/presentation/controller/single_email_controller.dart (1 hunks)
  • test/features/email/presentation/controller/single_email_controller_test.dart (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • core/pubspec.yaml
  • core/lib/presentation/utils/html_transformer/text/standardize_html_sanitizing_transformers.dart
  • lib/features/email/data/local/html_analyzer.dart
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-12-09T09:36:45.349Z
Learnt from: dab246
Repo: linagora/tmail-flutter PR: 4194
File: lib/features/manage_account/presentation/manage_account_dashboard_controller.dart:174-176
Timestamp: 2025-12-09T09:36:45.349Z
Learning: In Dart/Flutter projects using GetX, do not wrap getBinding<T>() calls in try/catch since they return null when not found. Only wrap Get.find<T>() calls in try/catch because they throw if a dependency is unavailable. When a binding is optional, consider checking Get.isRegistered<T>() or handling the null/exception path gracefully instead of blindly catching, and document the expectation for failure modes where a dependency may not be registered.

Applied to files:

  • core/lib/presentation/utils/html_transformer/dom/sanitize_hyper_link_tag_in_html_transformers.dart
  • test/features/email/presentation/controller/single_email_controller_test.dart
  • lib/features/email/presentation/controller/single_email_controller.dart
  • core/test/utils/standardize_html_sanitizing_transformers_test.dart
📚 Learning: 2025-12-09T12:47:45.861Z
Learnt from: zatteo
Repo: linagora/tmail-flutter PR: 4196
File: scribe/lib/scribe/ai/data/datasource_impl/ai_datasource_impl.dart:34-35
Timestamp: 2025-12-09T12:47:45.861Z
Learning: In the tmail-flutter repository, avoid suggesting replacing DioError with DioException unless there is a codebase-wide migration. This pattern applies to all Dart files; when reviewing, only propose a DioError-safe change (e.g., compatible error handling or conversion) unless a global migration is in place. Ensure consistency of DioError usage across modules and flag any deviations for a repo-wide decision.

Applied to files:

  • core/lib/presentation/utils/html_transformer/dom/sanitize_hyper_link_tag_in_html_transformers.dart
  • test/features/email/presentation/controller/single_email_controller_test.dart
  • lib/features/email/presentation/controller/single_email_controller.dart
  • core/test/utils/standardize_html_sanitizing_transformers_test.dart
📚 Learning: 2025-12-12T04:54:05.432Z
Learnt from: dab246
Repo: linagora/tmail-flutter PR: 4191
File: lib/features/email/presentation/extensions/handle_email_action_extension.dart:37-80
Timestamp: 2025-12-12T04:54:05.432Z
Learning: In lib/features/email/presentation/extensions/handle_email_action_extension.dart, the mailboxDashBoardController.selectedEmail should only be synchronized when isMobileThreadDisabled is true. This is intentional behavior and should not be changed to update selectedEmail in non-mobile or thread-enabled contexts.

Applied to files:

  • test/features/email/presentation/controller/single_email_controller_test.dart
  • lib/features/email/presentation/controller/single_email_controller.dart
🔇 Additional comments (10)
core/test/utils/standardize_html_sanitizing_transformers_test.dart (7)

86-100: Excellent comprehensive event attribute sanitization coverage.

The nested loops testing all 57 event attributes across all allowed tags provide robust coverage against event-handler-based XSS attacks. This systematic approach ensures no event handler slips through.


133-141: Verify javascript: href removal is complete.

The test expects javascript: hrefs to be removed, but only checks that the href attribute is gone. Consider verifying that the link content remains intact after sanitization (the test checks for <a id="id1">test</a> which looks correct).


164-178: Validate unknown tag unwrapping behavior.

The tests verify that unknown tags like <supress_time_adjustment> and <google-sheets-html-origin> are unwrapped while preserving their children. This aligns with Option 2's approach described in the PR objectives.


249-272: Strong form attribute sanitization coverage.

The comprehensive list of 14 dangerous form attributes and systematic testing ensures protection against CSRF/phishing attacks via form submission vectors.


274-291: Form child element removal is thorough.

Testing removal of input, button, textarea, select, and option elements prevents credential phishing and data exfiltration attacks.


368-440: Outstanding real-world attack vector coverage.

The tests cover credential phishing, CSRF token stealing, javascript/data URI forms, file upload exploitation, and mixed XSS vectors. This provides strong evidence of the sanitizer's security posture.


10-21: The allowedTags constant in the test is a test fixture used for iterating through tags in test cases, not a reference implementation list. The StandardizeHtmlSanitizingTransformers delegates to the external sanitize_html package from pub.dev, which defines its own tag allowlist. No tag allowlist is maintained in the application's implementation, so there is nothing to verify against.

Likely an incorrect or invalid review comment.

core/lib/presentation/utils/html_transformer/dom/sanitize_hyper_link_tag_in_html_transformers.dart (1)

21-21: Optimize selector to target only anchors with href.

The change from querySelectorAll('a') to querySelectorAll('a[href]') is a sensible optimization. Anchors without href don't represent hyperlinks and don't require the URL sanitization, target, and rel modifications this transformer applies.

lib/features/email/presentation/controller/single_email_controller.dart (1)

1743-1752: Standardize transformation pipeline for calendar events.

The migration from TransformConfiguration.fromTextTransformers to TransformConfiguration.create enables both text and DOM transformations for calendar event descriptions. This ensures consistent sanitization across all email content processing paths.

The configuration now:

  • Preserves the existing text transformers (autolink, sanitize, newline)
  • Adds DOM-level hyperlink sanitization with tooltip support for web

Based on learnings, this composite transformer approach provides defense-in-depth sanitization.

test/features/email/presentation/controller/single_email_controller_test.dart (1)

344-348: Test expectations updated to reflect new transformation order.

The updated expectations correctly reflect the new attribute order produced by the enhanced HTML transformation pipeline:

  • rel="noreferrer" now precedes the style attribute
  • target="_blank" follows the style attribute

The sanitization behavior remains correct:

  • Safe URLs are preserved and decorated with security attributes
  • Dangerous javascript: hrefs are removed (line 404 shows the href attribute stripped while preserving link text)

Also applies to: 398-405

Comment on lines +188 to +196
test('SHOULD remove dangerous JS text nodes inside HTML', () {
const html = 'hello <p>document.cookie</p> world';
expect(sanitize(html), equals('hello <p></p> world'));
});

test('SHOULD remove HTML comments entirely', () {
const html = '<div><!--test--><p>A</p><!--x--></div>';
expect(sanitize(html), equals('<div><p>A</p></div>'));
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Clarify "dangerous JS text nodes" test expectation.

Line 190 tests that text content document.cookie inside a <p> tag is removed, but this seems overly aggressive. Plain text that happens to contain JavaScript keywords shouldn't be stripped if it's not in a dangerous context.

Could you clarify the intended behavior? If the sanitizer removes any text matching JavaScript patterns, this might affect legitimate email content that references these terms.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants