Prerequisites
I have verified that I am running the latest version of ImageSharp
I have verified that the problem exists in both DEBUG and RELEASE mode
I have searched open and closed issues to ensure it has not already been reported
ImageSharp version
3.1.7
Other ImageSharp packages and versions
None (only the main package is installed)
Environment (Operating system, version and so on)
Windows 11
.NET Framework version
9.0.200 (.NET SDK version)
Description
When writing Unicode text to the EXIF UserComment tag, ImageSharp is using UTF-16LE encoding instead of UTF-16BE encoding as required by the EXIF specification.
According to the EXIF specification (JEITA CP-3451, section 4.6.5 "User Comments"), when using Unicode encoding, the first 8 bytes should be "UNICODE\0" followed by text encoded in UTF-16BE.
However, ImageSharp is storing the text in UTF-16LE, which causes the UserComment to be displayed incorrectly in many image viewers and editors.
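To make the expected layout concrete, the sketch below builds the raw UserComment bytes the way this report reads the specification: the 8-byte `UNICODE\0` character-code prefix followed by UTF-16BE text. `BuildUnicodeUserComment` is a hypothetical helper written for illustration, not an ImageSharp API, and it assumes no byte order mark and no trailing padding.

```csharp
using System;
using System.Text;

// Hypothetical helper (not an ImageSharp API): builds the raw UserComment payload
// as this report describes it: the 8-byte "UNICODE\0" prefix, then UTF-16BE text.
// Assumes no byte order mark and no trailing padding.
static byte[] BuildUnicodeUserComment(string text)
{
    byte[] prefix = Encoding.ASCII.GetBytes("UNICODE\0");   // 55 4E 49 43 4F 44 45 00
    byte[] body = Encoding.BigEndianUnicode.GetBytes(text); // UTF-16BE; GetBytes emits no BOM

    byte[] result = new byte[prefix.Length + body.Length];
    prefix.CopyTo(result, 0);
    body.CopyTo(result, prefix.Length);
    return result;
}

byte[] demo = BuildUnicodeUserComment("Hi");
Console.WriteLine(BitConverter.ToString(demo));
// prints 55-4E-49-43-4F-44-45-00-00-48-00-69
```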
The root cause is in ExifEncodedStringHelpers.cs (lines 53-60), which uses Encoding.Unicode, the UTF-16LE encoding. It should use Encoding.BigEndianUnicode instead.
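The difference between the two .NET encodings can be checked directly with plain .NET, no ImageSharp involved. This sketch prints the bytes each encoding produces for "H" and for the hiragana "こ" (U+3053), and shows what happens when UTF-16BE bytes are decoded as UTF-16LE: every code unit is byte-swapped, so "H" (U+0048) turns into U+4800, which would be consistent with the incorrect display described above. `Hex` is just a local helper for the demo.

```csharp
using System;
using System.Text;

// Hex dump of the bytes an encoding produces (GetBytes never emits a BOM).
static string Hex(Encoding encoding, string text) => BitConverter.ToString(encoding.GetBytes(text));

// Encoding.Unicode is UTF-16LE; Encoding.BigEndianUnicode is UTF-16BE.
Console.WriteLine(Hex(Encoding.Unicode, "H"));               // 48-00
Console.WriteLine(Hex(Encoding.BigEndianUnicode, "H"));      // 00-48
Console.WriteLine(Hex(Encoding.Unicode, "\u3053"));          // 53-30 (hiragana "ko")
Console.WriteLine(Hex(Encoding.BigEndianUnicode, "\u3053")); // 30-53

// Decoding UTF-16BE bytes with the UTF-16LE decoder byte-swaps every code unit.
string misread = Encoding.Unicode.GetString(Encoding.BigEndianUnicode.GetBytes("H"));
Console.WriteLine(((int)misread[0]).ToString("X4"));         // 4800 instead of 0048
```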
The same issue exists when reading UserComment values. When the TryGetEncodedStringValue method detects the "UNICODE" character code, it also decodes the value with Encoding.Unicode (UTF-16LE) instead of Encoding.BigEndianUnicode (UTF-16BE). As a result, ImageSharp cannot correctly read UserComment values that are encoded according to the EXIF specification.
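A reader that follows the specification as described in this report would check the 8-byte prefix and decode the remainder with Encoding.BigEndianUnicode. The sketch below is a hypothetical stand-alone decoder, not ImageSharp's TryGetEncodedStringValue; `TryDecodeUnicodeUserComment` is an illustrative name, and it only handles the Unicode case, returning false for any other prefix.

```csharp
using System;
using System.Text;

// Hypothetical decoder (not ImageSharp's TryGetEncodedStringValue): accepts only
// payloads tagged "UNICODE\0" and decodes the rest as UTF-16BE, per this report.
static bool TryDecodeUnicodeUserComment(byte[] raw, out string text)
{
    const int PrefixLength = 8;
    text = string.Empty;

    if (raw.Length < PrefixLength ||
        Encoding.ASCII.GetString(raw, 0, PrefixLength) != "UNICODE\0")
    {
        return false; // other character codes (ASCII, JIS, undefined) are out of scope here
    }

    text = Encoding.BigEndianUnicode.GetString(raw, PrefixLength, raw.Length - PrefixLength);
    return true;
}

// "UNICODE\0" followed by "Hi" in UTF-16BE (00 48 00 69).
byte[] sample = { 0x55, 0x4E, 0x49, 0x43, 0x4F, 0x44, 0x45, 0x00, 0x00, 0x48, 0x00, 0x69 };
Console.WriteLine(TryDecodeUnicodeUserComment(sample, out string decoded) ? decoded : "not Unicode"); // Hi
```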
This issue requires careful consideration for backward compatibility. Users who have been writing UserComment tags with previous versions of ImageSharp have data encoded in UTF-16LE format. If the fix simply switches to UTF-16BE encoding for both reading and writing, those existing images would have their UserComment values read incorrectly. A potential solution might involve detecting the byte order or providing migration options to ensure both existing and new data can be handled correctly.
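One way to detect the byte order, offered here only as an assumption and not as ImageSharp's behavior, is a heuristic: honor a byte order mark if present, otherwise compare how many zero bytes sit at even versus odd offsets, since ASCII characters encode as 00 xx in UTF-16BE and xx 00 in UTF-16LE. `DecodeUtf16Guessing` is a hypothetical name for the text that follows the `UNICODE\0` prefix.

```csharp
using System;
using System.Text;

// Hypothetical heuristic (not ImageSharp behavior) for the UTF-16 text that
// follows the "UNICODE\0" prefix when its byte order is unknown:
//  1. honor a byte order mark if present (FE FF = big-endian, FF FE = little-endian);
//  2. otherwise compare zero bytes at even vs odd offsets: ASCII characters encode
//     as 00 xx in UTF-16BE and xx 00 in UTF-16LE;
//  3. on a tie (e.g. no zero bytes at all), fall back to big-endian.
static string DecodeUtf16Guessing(ReadOnlySpan<byte> body)
{
    if (body.Length >= 2 && body[0] == 0xFE && body[1] == 0xFF)
        return Encoding.BigEndianUnicode.GetString(body.Slice(2));
    if (body.Length >= 2 && body[0] == 0xFF && body[1] == 0xFE)
        return Encoding.Unicode.GetString(body.Slice(2));

    int zerosAtEven = 0, zerosAtOdd = 0;
    for (int i = 0; i + 1 < body.Length; i += 2)
    {
        if (body[i] == 0) zerosAtEven++;
        if (body[i + 1] == 0) zerosAtOdd++;
    }

    Encoding encoding = zerosAtOdd > zerosAtEven ? Encoding.Unicode : Encoding.BigEndianUnicode;
    return encoding.GetString(body);
}

// Both byte orders of the same text decode to the same string.
Console.WriteLine(DecodeUtf16Guessing(Encoding.Unicode.GetBytes("Hello")));          // Hello
Console.WriteLine(DecodeUtf16Guessing(Encoding.BigEndianUnicode.GetBytes("Hello"))); // Hello
```

The heuristic has a clear limit: text with no zero bytes at all, such as a comment written entirely in Japanese, gives no signal, so the sketch falls back to big-endian and existing UTF-16LE data of that kind would still be misread.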
```csharp
using SixLabors.ImageSharp;
using SixLabors.ImageSharp.Metadata.Profiles.Exif;

// Load an image
using var image = Image.Load("sample.jpg");

// Create EXIF profile if it doesn't exist
var exif = image.Metadata.ExifProfile ?? new ExifProfile();

// Set Unicode text in UserComment
exif.SetValue(ExifTag.UserComment, "Hello World! こんにちは世界");

// Apply EXIF profile to the image
image.Metadata.ExifProfile = exif;

// Save the image
image.Save("output.jpg");
```
When examining the UserComment value in the saved image file, you can see that the text is encoded in UTF-16LE instead of UTF-16BE.
When checking with a binary editor, the letter "H" after the "UNICODE\0" header is stored as 48 00 (UTF-16LE byte order) instead of 00 48.
According to the EXIF specification, it should be stored in UTF-16BE, i.e. with the byte order 00 48 for the letter "H".
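The hex-editor check can also be scripted. This sketch scans a file for the first `UNICODE\0` sequence and prints the bytes that follow it; for "Hello World! ...", UTF-16BE output would start 00-48-00-65 and UTF-16LE output 48-00-65-00. It assumes the first match in the file is the UserComment payload, which is a simplification (a robust tool would walk the EXIF IFD structure instead). `BytesAfterUnicodePrefix` is a hypothetical helper, and the default path `output.jpg` matches the repro snippet.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: returns the first `count` bytes after the first "UNICODE\0"
// sequence in a file, as a hex string, or null if the sequence is not present.
// Simplification: assumes the first match is the UserComment payload; a robust
// tool would walk the EXIF IFDs instead of scanning raw bytes.
static string? BytesAfterUnicodePrefix(string file, int count = 4)
{
    byte[] data = File.ReadAllBytes(file);
    byte[] prefix = Encoding.ASCII.GetBytes("UNICODE\0");

    int index = data.AsSpan().IndexOf(prefix);
    if (index < 0) return null;

    int start = index + prefix.Length;
    int length = Math.Min(count, data.Length - start);
    return BitConverter.ToString(data, start, length);
}

// For "Hello World! ...": UTF-16BE starts 00-48-00-65, UTF-16LE starts 48-00-65-00.
string target = args.Length > 0 ? args[0] : "output.jpg";
Console.WriteLine(File.Exists(target)
    ? BytesAfterUnicodePrefix(target) ?? "no UNICODE prefix found"
    : $"{target} not found");
```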
Steps to Reproduce
You can reproduce this issue with the ImageSharp code snippet earlier in this report, or check the complete reproduction repository at: https://github.com/nirvash/ImageSharpExifUserCommentEncodingBug
Images
output.zip