EXIF UserComment tag writes Unicode text in UTF-16LE instead of UTF-16BE as specified in the standard #2906

Open
nirvash opened this issue Mar 26, 2025 · 0 comments

nirvash commented Mar 26, 2025

Prerequisites

  • I have written a descriptive issue title
  • I have verified that I am running the latest version of ImageSharp
  • I have verified that the problem exists in both DEBUG and RELEASE mode
  • I have searched open and closed issues to ensure it has not already been reported

ImageSharp version

3.1.7

Other ImageSharp packages and versions

None (only the main package is installed)

Environment (Operating system, version and so on)

Windows 11

.NET Framework version

9.0.200

Description

When writing Unicode text to the EXIF UserComment tag, ImageSharp is using UTF-16LE encoding instead of UTF-16BE encoding as required by the EXIF specification.

According to the EXIF specification (JEITA CP-3451, section 4.6.5 "User Comments"), when using Unicode encoding, the first 8 bytes should be "UNICODE\0" followed by text encoded in UTF-16BE.

However, ImageSharp is storing the text in UTF-16LE, which causes the UserComment to be displayed incorrectly in many image viewers and editors.

The root cause is in the ExifEncodedStringHelpers.cs file (lines 53-60), which uses Encoding.Unicode. In .NET, Encoding.Unicode is UTF-16LE; the code should use Encoding.BigEndianUnicode instead.
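To make the difference concrete, here is a minimal sketch of the payload layout described above. It is not ImageSharp's code: `UserCommentPayload` and `Build` are hypothetical names used only for illustration, and the sketch relies solely on `System.Text`.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of ImageSharp: builds a UserComment payload
// as an 8-byte character code followed by the encoded text.
public static class UserCommentPayload
{
    // "UNICODE" plus a NUL terminator, 8 bytes in total.
    private static readonly byte[] UnicodeCode = Encoding.ASCII.GetBytes("UNICODE\0");

    public static byte[] Build(string text, Encoding encoding)
    {
        // Encoding.GetBytes does not emit a byte order mark.
        byte[] body = encoding.GetBytes(text);
        byte[] payload = new byte[UnicodeCode.Length + body.Length];
        UnicodeCode.CopyTo(payload, 0);
        body.CopyTo(payload, UnicodeCode.Length);
        return payload;
    }
}
```

With this sketch, `BitConverter.ToString(UserCommentPayload.Build("H", Encoding.BigEndianUnicode))` ends in `00-48`, while the same call with `Encoding.Unicode` ends in `48-00`.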

The same issue exists when reading UserComment values. When the TryGetEncodedStringValue method detects the "UNICODE" character code, it decodes the value with Encoding.Unicode (UTF-16LE) instead of Encoding.BigEndianUnicode (UTF-16BE). As a result, ImageSharp cannot correctly read UserComment values that are encoded according to the EXIF specification.
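The effect on the read side can be sketched without ImageSharp by decoding the same bytes with both encodings. `UserCommentDecodeDemo` and `ToCodeUnits` are hypothetical names for this illustration; the input is assumed to be the text portion of the tag, after the 8-byte header.

```csharp
using System;
using System.Text;

// Hypothetical demo, not part of ImageSharp: shows which UTF-16 code units
// result from decoding the same bytes with a given encoding.
public static class UserCommentDecodeDemo
{
    public static string ToCodeUnits(byte[] textBytes, Encoding encoding)
    {
        string decoded = encoding.GetString(textBytes);
        var sb = new StringBuilder();
        foreach (char c in decoded)
        {
            if (sb.Length > 0)
            {
                sb.Append(' ');
            }

            // Format each code unit as U+XXXX.
            sb.Append("U+").Append(((int)c).ToString("X4"));
        }

        return sb.ToString();
    }
}
```

For the UTF-16BE bytes of "Hi" (`00 48 00 69`), decoding with `Encoding.BigEndianUnicode` yields `U+0048 U+0069`, whereas decoding with `Encoding.Unicode` yields `U+4800 U+6900`, which are unrelated CJK code points.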

This issue requires careful consideration for backward compatibility. Users who have been writing UserComment tags with previous versions of ImageSharp have data encoded in UTF-16LE format. If the fix simply switches to UTF-16BE encoding for both reading and writing, those existing images would have their UserComment values read incorrectly. A potential solution might involve detecting the byte order or providing migration options to ensure both existing and new data can be handled correctly.
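One possible shape for the byte order detection mentioned above is sketched below. This is only a heuristic under stated assumptions, not a proposed implementation: `Utf16ByteOrderGuess` is a hypothetical name, the input is assumed to be the text bytes after the 8-byte header, and the guess can be wrong for very short or unusual text.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical heuristic, not part of ImageSharp.
public static class Utf16ByteOrderGuess
{
    public static Encoding Guess(byte[] text)
    {
        // 1. An explicit byte order mark settles the question.
        if (text.Length >= 2)
        {
            if (text[0] == 0xFE && text[1] == 0xFF) return Encoding.BigEndianUnicode;
            if (text[0] == 0xFF && text[1] == 0xFE) return Encoding.Unicode;
        }

        int zerosEven = 0, zerosOdd = 0;
        var distinctEven = new HashSet<byte>();
        var distinctOdd = new HashSet<byte>();
        for (int i = 0; i + 1 < text.Length; i += 2)
        {
            if (text[i] == 0) zerosEven++;
            if (text[i + 1] == 0) zerosOdd++;
            distinctEven.Add(text[i]);
            distinctOdd.Add(text[i + 1]);
        }

        // 2. ASCII/Latin-1 characters have a zero high byte, so the position
        //    with more zero bytes is probably the high byte.
        if (zerosEven != zerosOdd)
        {
            return zerosEven > zerosOdd ? Encoding.BigEndianUnicode : Encoding.Unicode;
        }

        // 3. High bytes usually vary less than low bytes (for example, all
        //    hiragana share the high byte 0x30).
        if (distinctEven.Count != distinctOdd.Count)
        {
            return distinctEven.Count < distinctOdd.Count
                ? Encoding.BigEndianUnicode
                : Encoding.Unicode;
        }

        // 4. Undecidable: fall back to the byte order the specification calls for.
        return Encoding.BigEndianUnicode;
    }
}
```

A reader could then decode with `Utf16ByteOrderGuess.Guess(textBytes).GetString(textBytes)`, which would keep files written by earlier ImageSharp versions readable in most cases. Whether a heuristic like this is acceptable, or an explicit option is preferable, is a design decision for the maintainers.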

Steps to Reproduce

You can reproduce this issue with the following code, or check the complete reproduction repository at: https://github.com/nirvash/ImageSharpExifUserCommentEncodingBug

using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Metadata.Profiles.Exif;

// Load an image
using var image = Image.Load("sample.jpg");

// Create EXIF profile if it doesn't exist
var exif = image.Metadata.ExifProfile ?? new ExifProfile();

// Set Unicode text in UserComment
exif.SetValue(ExifTag.UserComment, "Hello World! こんにちは世界");

// Apply EXIF profile to the image
image.Metadata.ExifProfile = exif;

// Save the image
image.Save("output.jpg");

Examining the UserComment value in the saved file shows that the text is encoded in UTF-16LE instead of UTF-16BE.
In a binary editor, the letter "H" immediately after the "UNICODE\0" header is stored as 48 00, which is the UTF-16LE byte order.
According to the EXIF specification, it should be stored in UTF-16BE, that is, as 00 48.
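The same check can be scripted instead of using a binary editor. The sketch below is a rough aid, not an EXIF parser: `UserCommentInspector` is a hypothetical name, and it assumes the first occurrence of "UNICODE\0" in the file is the UserComment header.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of ImageSharp: dumps the bytes that follow
// the first "UNICODE\0" marker found in a file's raw bytes.
public static class UserCommentInspector
{
    private static readonly byte[] Marker = Encoding.ASCII.GetBytes("UNICODE\0");

    public static bool TryDumpAfterMarker(byte[] file, int count, out string hex)
    {
        hex = string.Empty;

        // Locate the 8-byte character code anywhere in the file.
        int index = new ReadOnlySpan<byte>(file).IndexOf(new ReadOnlySpan<byte>(Marker));
        if (index < 0)
        {
            return false;
        }

        int start = index + Marker.Length;
        int length = Math.Min(count, file.Length - start);
        if (length > 0)
        {
            // Hex dump such as "48-00-65-00" (UTF-16LE) or "00-48-00-65" (UTF-16BE).
            hex = BitConverter.ToString(file, start, length);
        }

        return true;
    }
}
```

Calling `UserCommentInspector.TryDumpAfterMarker(System.IO.File.ReadAllBytes("output.jpg"), 4, out string hex)` on the file produced by the reproduction code should show whether "H" was written as `48-00` or `00-48`.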

Images

output.zip
