EXIF UserComment tag writes Unicode text in UTF-16LE instead of UTF-16BE as specified in the standard #2906

nirvash · 2025-03-26T23:46:58Z

Prerequisites

I have written a descriptive issue title
I have verified that I am running the latest version of ImageSharp
I have verified whether the problem exists in both DEBUG and RELEASE mode
I have searched open and closed issues to ensure it has not already been reported

ImageSharp version

3.1.7

Other ImageSharp packages and versions

None (only the main package is installed)

Environment (Operating system, version and so on)

Windows 11

.NET Framework version

9.0.200

Description

When writing Unicode text to the EXIF UserComment tag, ImageSharp is using UTF-16LE encoding instead of UTF-16BE encoding as required by the EXIF specification.

According to the EXIF specification (JEITA CP-3451, section 4.6.5 "User Comments"), when using Unicode encoding, the first 8 bytes should be "UNICODE\0" followed by text encoded in UTF-16BE.
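To make the layout described above concrete, here is a minimal sketch that builds the bytes for a Unicode UserComment: the 8-byte "UNICODE\0" character code followed by the text in UTF-16BE. The class and method names (`UserCommentSketch`, `BuildUnicodeUserComment`) are hypothetical and are not part of ImageSharp; only `System.Text.Encoding` is used.

```csharp
using System;
using System.Text;

public static class UserCommentSketch
{
    // 8-byte character code: the ASCII letters "UNICODE" plus one NUL byte.
    private static readonly byte[] UnicodeHeader = Encoding.ASCII.GetBytes("UNICODE\0");

    // Hypothetical helper: returns header + UTF-16BE text, the layout this issue expects.
    public static byte[] BuildUnicodeUserComment(string text)
    {
        // GetBytes does not emit a byte order mark, so the payload is the raw code units.
        byte[] payload = Encoding.BigEndianUnicode.GetBytes(text);

        byte[] result = new byte[UnicodeHeader.Length + payload.Length];
        UnicodeHeader.CopyTo(result, 0);
        payload.CopyTo(result, UnicodeHeader.Length);
        return result;
    }
}
```

For example, `BitConverter.ToString(UserCommentSketch.BuildUnicodeUserComment("H"))` gives `55-4E-49-43-4F-44-45-00-00-48`: the header bytes, then `00 48` for "H".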

However, ImageSharp is storing the text in UTF-16LE, which causes the UserComment to be displayed incorrectly in many image viewers and editors.

The root cause is in ExifEncodedStringHelpers.cs (lines 53-60), which uses Encoding.Unicode. In .NET, Encoding.Unicode is UTF-16LE, so the code should use Encoding.BigEndianUnicode instead.

Additionally, the same issue exists when reading UserComment values. When the TryGetEncodedStringValue method detects the "UNICODE" character code, it also decodes the value with Encoding.Unicode (UTF-16LE) instead of Encoding.BigEndianUnicode (UTF-16BE), so ImageSharp cannot correctly read UserComment values that are encoded according to the EXIF specification.
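The effect of decoding with the wrong byte order can be shown with plain .NET encodings, independent of ImageSharp. This is only an illustration of the mismatch; `ByteOrderMismatchDemo` and its methods are hypothetical names, and the input is the text payload after the 8-byte header.

```csharp
using System;
using System.Text;

public static class ByteOrderMismatchDemo
{
    // Decodes the payload as UTF-16LE, which is what Encoding.Unicode means in .NET.
    public static string DecodeAsLittleEndian(byte[] payload) =>
        Encoding.Unicode.GetString(payload);

    // Decodes the payload as UTF-16BE.
    public static string DecodeAsBigEndian(byte[] payload) =>
        Encoding.BigEndianUnicode.GetString(payload);
}
```

Given the UTF-16BE bytes `00 48 00 69` ("Hi"), `DecodeAsBigEndian` returns "Hi", while `DecodeAsLittleEndian` reads the same bytes as U+4800 and U+6900, two unrelated CJK characters.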

This issue requires careful consideration for backward compatibility. Users who have been writing UserComment tags with previous versions of ImageSharp have data encoded in UTF-16LE format. If the fix simply switches to UTF-16BE encoding for both reading and writing, those existing images would have their UserComment values read incorrectly. A potential solution might involve detecting the byte order or providing migration options to ensure both existing and new data can be handled correctly.
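As one possible direction for the byte order detection mentioned above, here is a rough heuristic sketch. It assumes that in typical text the high bytes of UTF-16 code units take far fewer distinct values than the low bytes, so the byte position with less variety is probably the high byte. This is my own assumption for illustration, not something ImageSharp does today, and `Utf16ByteOrderGuess` is a hypothetical name. It can be wrong or undecided for very short or unusual strings.

```csharp
using System;
using System.Collections.Generic;

public enum Utf16ByteOrder
{
    Unknown,
    BigEndian,
    LittleEndian,
}

public static class Utf16ByteOrderGuess
{
    // Hypothetical heuristic: compare how many distinct byte values appear
    // at even offsets versus odd offsets of the UTF-16 payload.
    public static Utf16ByteOrder Guess(byte[] payload)
    {
        var evenValues = new HashSet<byte>();
        var oddValues = new HashSet<byte>();

        // Walk complete 2-byte code units only; a trailing odd byte is ignored.
        for (int i = 0; i + 1 < payload.Length; i += 2)
        {
            evenValues.Add(payload[i]);
            oddValues.Add(payload[i + 1]);
        }

        // Fewer distinct values at even offsets suggests the high byte comes first.
        if (evenValues.Count < oddValues.Count)
        {
            return Utf16ByteOrder.BigEndian;
        }

        if (oddValues.Count < evenValues.Count)
        {
            return Utf16ByteOrder.LittleEndian;
        }

        // Tie (for example a single character): no decision.
        return Utf16ByteOrder.Unknown;
    }
}
```

A reader could try the specification's byte order first and fall back to this kind of guess only when the result looks implausible. Whether that trade-off is acceptable for the library is a design decision for the maintainers.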

Steps to Reproduce

You can reproduce this issue with the following code, or check the complete reproduction repository at: https://github.com/nirvash/ImageSharpExifUserCommentEncodingBug

using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Metadata.Profiles.Exif;

// Load an image
using var image = Image.Load("sample.jpg");

// Create EXIF profile if it doesn't exist
var exif = image.Metadata.ExifProfile ?? new ExifProfile();

// Set Unicode text in UserComment
exif.SetValue(ExifTag.UserComment, "Hello World! こんにちは世界");

// Apply EXIF profile to the image
image.Metadata.ExifProfile = exif;

// Save the image
image.Save("output.jpg");

When examining the UserComment value in the saved image file, you can see that the text is encoded in UTF-16LE instead of UTF-16BE.
When checking with a binary editor, after the "UNICODE\0" header, the letter "H" is stored as 48 00 (the UTF-16LE byte order) instead of 00 48.
According to the EXIF specification, it should be stored in UTF-16BE encoding with the byte order 00 48 for the letter "H".
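The binary editor check can also be done in code. The sketch below searches raw file bytes for the "UNICODE\0" header and returns the bytes that follow it as hex. `UserCommentHexInspector` is a hypothetical helper, and it assumes the first occurrence of the header in the file belongs to the UserComment tag, which may not hold for every image.

```csharp
using System;
using System.Text;

public static class UserCommentHexInspector
{
    private static readonly byte[] Header = Encoding.ASCII.GetBytes("UNICODE\0");

    // Finds the first "UNICODE\0" header and outputs up to `count` following bytes as hex.
    // Returns false when the header is not present.
    public static bool TryPeekAfterHeader(byte[] fileBytes, int count, out string hex)
    {
        hex = string.Empty;

        ReadOnlySpan<byte> haystack = fileBytes;
        ReadOnlySpan<byte> needle = Header;
        int index = haystack.IndexOf(needle);
        if (index < 0)
        {
            return false;
        }

        int start = index + Header.Length;
        int length = Math.Min(count, fileBytes.Length - start);
        if (length > 0)
        {
            // BitConverter.ToString formats bytes as "48-00-65-00".
            hex = BitConverter.ToString(fileBytes, start, length);
        }

        return true;
    }
}
```

Calling it with `File.ReadAllBytes("output.jpg")` and a count of 2 should, per the behavior reported here, yield `48-00` for the affected version, where `00-48` would be expected for UTF-16BE.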

Images

output.zip

nirvash added the needs triage label Mar 26, 2025
