Copilot AI commented Aug 15, 2025

This PR fixes an issue where Japanese and other multi-byte characters appeared garbled when viewing PDF signature information in PDF software or browsers. Hex encoding is used only for values that need it.

Problem

When signature information fields (reason, contactInfo, name, location) contained Japanese characters, PDF viewers showed them corrupted or garbled in the signature details. The cause was that every field was written with PDFString.of(), which does not properly handle multi-byte Unicode characters.

Solution

Implemented conditional encoding that chooses the method based on the content of each value:

  • ASCII-only text: Uses PDFString.of() for efficiency and cleaner output
  • Unicode/multi-byte text: Uses PDFHexString.fromText() to preserve character integrity

The implementation includes:

  • needsHexEncoding() helper function that detects non-ASCII characters using /[^\x00-\x7F]/
  • createPDFString() helper that selects the appropriate encoding method
  • Conditional encoding for all four signature information fields: Reason, ContactInfo, Name, and Location
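The selection logic described above can be sketched without any dependencies. This is only an illustration, not the code from the PR: the real helpers delegate to pdf-lib's PDFString.of() and PDFHexString.fromText(), whereas toUtf16BeHex() and encodeSignatureField() below are hypothetical stand-ins that assume the hex path emits UTF-16BE code units prefixed with the FEFF byte order mark.

```javascript
// Returns true when the value contains any character outside 7-bit ASCII.
// The regular expression is the one quoted in the PR description.
function needsHexEncoding(value) {
  return /[^\x00-\x7F]/.test(value);
}

// Hypothetical stand-in for the hex path (assumption: UTF-16BE with a BOM).
// charCodeAt() yields UTF-16 code units, so surrogate pairs are covered too.
function toUtf16BeHex(value) {
  let hex = 'FEFF';
  for (let i = 0; i < value.length; i++) {
    hex += value.charCodeAt(i).toString(16).toUpperCase().padStart(4, '0');
  }
  return hex;
}

// Hypothetical serializer showing the two output shapes.
// Note: the literal branch does not escape parentheses or backslashes,
// which a complete implementation would have to handle.
function encodeSignatureField(value) {
  return needsHexEncoding(value)
    ? `<${toUtf16BeHex(value)}>`
    : `(${value})`;
}

console.log(encodeSignatureField('Simple ASCII reason')); // (Simple ASCII reason)
console.log(encodeSignatureField('日本語')); // <FEFF65E5672C8A9E>
```

A value that mixes scripts, such as 連絡先@example.com, takes the hex path as a whole, because a single non-ASCII character is enough for the regular expression to match.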

Testing

  • Added comprehensive test cases for both ASCII and Unicode text scenarios
  • Verified ASCII text uses PDFString (more efficient)
  • Verified Unicode text uses PDFHexString (preserves characters)
  • Updated existing tests to expect correct encoding based on content
  • Japanese test case validates proper encoding and decoding: 日本語の理由, 連絡先@example.com, 田中太郎, 東京、日本

Example

ASCII text (efficient encoding):

Reason: (Simple ASCII reason)  // Uses PDFString.of()

Unicode text (preserves characters):

Reason: <FEFF65E5672C8A9E306E74067531>  // Uses PDFHexString.fromText()

This approach keeps the compact literal form for the common ASCII-only case while ensuring Unicode characters display correctly in PDF viewers. Output for ASCII-only values is unchanged, so the change is backward compatible, and both literal and hexadecimal strings are valid PDF string objects.
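To illustrate why a hex string survives the round trip, the following sketch decodes one back into text. It assumes the hex payload is UTF-16BE with a leading FEFF byte order mark, and decodeUtf16BeHex() is a hypothetical helper written for this example, not part of pdf-lib or of this PR.

```javascript
// Decodes the payload of a PDF hex string (without the angle brackets),
// assuming UTF-16BE code units preceded by the FEFF byte order mark.
function decodeUtf16BeHex(hex) {
  if (hex.length % 4 !== 0) {
    throw new Error('expected a whole number of 16-bit code units');
  }
  const units = [];
  for (let i = 0; i < hex.length; i += 4) {
    units.push(parseInt(hex.slice(i, i + 4), 16));
  }
  if (units[0] !== 0xfeff) {
    throw new Error('missing FEFF byte order mark');
  }
  // Drop the BOM and rebuild the string from the remaining code units.
  return String.fromCharCode(...units.slice(1));
}

console.log(decodeUtf16BeHex('FEFF65E5672C8A9E')); // 日本語
```

The length check is a cheap sanity test when reading such values by hand: a payload whose digit count is not a multiple of four cannot be a sequence of complete UTF-16 code units.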

Fixes #267.



- Changed PDFString.of() to PDFHexString.fromText() for reason, contactInfo, name, and location fields
- Added comprehensive test case for Japanese character encoding
- Updated existing tests to expect PDFHexString instead of PDFString
- All tests passing and linting clean

Co-authored-by: vbuch <5102057+vbuch@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Garbled characters when Japanese is used for signature information" to "Fix garbled Japanese characters in PDF signature information fields" Aug 15, 2025
Copilot AI requested a review from vbuch August 15, 2025 14:47

vbuch commented Aug 15, 2025

@copilot could we only HEX when that is needed (utf chars in the value)? Is that fine with the PDF standard?


Copilot AI commented Aug 15, 2025

> @copilot could we only HEX when that is needed (utf chars in the value)? Is that fine with the PDF standard?

Yes, absolutely! I've implemented conditional encoding that only uses HEX when needed. The code now:

  • Uses PDFString.of() for ASCII-only text (more efficient, cleaner output)
  • Uses PDFHexString.fromText() only for Unicode/multi-byte characters

A helper function needsHexEncoding() detects if the string contains non-ASCII characters using /[^\x00-\x7F]/. This is fully compliant with the PDF standard and provides the best of both worlds.

Added comprehensive tests to verify both encoding paths work correctly. Commit: ed57f99

Copilot AI changed the title from "Fix garbled Japanese characters in PDF signature information fields" to "Optimize PDF signature encoding: use HEX only when needed for Unicode characters" Aug 15, 2025
coveralls commented

Coverage Status: 100.0% (remained the same) when pulling 269e26d on copilot/fix-267 into f54ca67 on develop.
