Encoding::InvalidByteSequenceError with json 2.8.1 #697

Closed
jterapin opened this issue Nov 6, 2024 · 10 comments · Fixed by #702

@jterapin

jterapin commented Nov 6, 2024

Hi again!

When using json 2.8.1 with Ruby 3.2.0 (for additional context, we use CodeBuild instances), we have been running into the following error:

Encoding::InvalidByteSequenceError: "\xE2" on US-ASCII (Encoding::InvalidByteSequenceError)
/codebuild/output/src3972890485/src/.gems/ruby/3.2.0/gems/json-2.8.1/lib/json/common.rb:204:in `encode'
/codebuild/output/src3972890485/src/.gems/ruby/3.2.0/gems/json-2.8.1/lib/json/common.rb:204:in `parse'
/codebuild/output/src3972890485/src/.gems/ruby/3.2.0/gems/json-2.8.1/lib/json/common.rb:204:in `parse'
/codebuild/output/src3972890485/src/.gems/ruby/3.2.0/gems/json-2.8.1/lib/json/common.rb:710:in `load'
/codebuild/output/src3972890485/src/aws-sdk-ruby/build_tools/services.rb:93:in `load_docs'

The load_docs referred to above is here: https://github.com/aws/aws-sdk-ruby/blob/version-3/build_tools/services.rb#L93
Could any of the recent changes have affected this? Reverting to json 2.7.6 fixed it for us.

InvalidByteSequenceError: "\xE2" on US-ASCII feels like it's trying to read the file as ASCII instead of UTF-8.

Any guidance will be appreciated. Thanks much!!!

@byroot
Member

byroot commented Nov 7, 2024

Yes, I'm making the library progressively stricter about encoding. I think I should just note this one in the CHANGELOG.

InvalidByteSequenceError: "\xE2" on US-ASCII

This means you are passing a UTF-8 string that is tagged as ASCII. Looking at your code, it comes directly from File.read, so somehow you must have Encoding.default_external == Encoding::ASCII.

ruby -e 'Encoding.default_external = Encoding::ASCII; p File.read("/etc/passwd").encoding'
#<Encoding:US-ASCII>

I think you should check your $LANG environment variable. Either way, you are passing an invalid string to JSON, so it's expected that it raises. We only make a small exception for binary strings, but that's it.
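
To illustrate the diagnosis above, here is a minimal sketch (not taken from the json source) that builds a string whose bytes are UTF-8 but whose encoding tag is US-ASCII, which is the state File.read produces under an ASCII default_external. The helper name transcode_report is hypothetical; it only calls String#encode and reports what happens.

```ruby
# Hypothetical helper: attempt a transcode to UTF-8 and report the outcome.
def transcode_report(str)
  error = begin
    str.encode(Encoding::UTF_8)
    nil
  rescue Encoding::InvalidByteSequenceError => e
    e.message
  end
  { encoding: str.encoding, valid: str.valid_encoding?, error: error }
end

# UTF-8 bytes for a right single quote (E2 80 99), tagged as US-ASCII.
mislabeled = "it\xE2\x80\x99s".b.force_encoding(Encoding::US_ASCII)

# 0xE2 is not a 7-bit ASCII byte, so the string is invalid in its own
# encoding and cannot be transcoded.
p transcode_report(mislabeled)
```

The bytes themselves are fine; only the encoding tag is wrong, which is why the error names the first non-ASCII byte.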

@byroot
Member

byroot commented Nov 7, 2024

A quick workaround can also be:

JSON.load(File.read(model_path('docs-2.json', models_dir), encoding: Encoding::UTF_8))
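
A self-contained sketch of the same workaround, assuming the file on disk really is UTF-8. The helpers load_json_as_utf8 and with_default_external are hypothetical names used only for this illustration; the temporary file stands in for docs-2.json, and the US-ASCII default simulates an environment such as LANG=C.

```ruby
require "json"
require "tempfile"

# Hypothetical helper: read with an explicit encoding so the result does not
# depend on Encoding.default_external.
def load_json_as_utf8(path)
  JSON.parse(File.read(path, encoding: Encoding::UTF_8))
end

# Hypothetical helper: temporarily change the process-wide default.
def with_default_external(encoding)
  saved = Encoding.default_external
  Encoding.default_external = encoding
  yield
ensure
  Encoding.default_external = saved
end

Tempfile.create(["docs", ".json"]) do |file|
  file.close
  File.binwrite(file.path, %({"note":"it\u2019s fine"}))

  with_default_external(Encoding::US_ASCII) do
    p File.read(file.path).encoding  # the plain read is tagged US-ASCII
    p load_json_as_utf8(file.path)   # the explicit read parses normally
  end
end
```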

casperisfine pushed a commit to casperisfine/json that referenced this issue Nov 7, 2024
Fix: ruby#697

This way even if `Encoding.default_external` is set to a weird value
the document will be parsed just fine.
matzbot pushed a commit to ruby/ruby that referenced this issue Nov 11, 2024
Fix: ruby/json#697

This way even if `Encoding.default_external` is set to a weird value
the document will be parsed just fine.

ruby/json@3a8505a8fa
@jterapin
Author

Hi! Just wondering about the ETA for the gem update with this fix? TIA

@byroot
Member

byroot commented Nov 12, 2024

I'm just waiting a few days to see if some other issues are reported.

@byroot
Member

byroot commented Nov 14, 2024

2.8.2 was released.

@jterapin
Author

Hello @byroot - it's me again. We used the latest version of the json gem and still encountered the same error in our CodeBuild project (see below). I may have to change some configuration in the CodeBuild project settings (based on reading this post), but before I look into doing that, I wanted to double-check with you. Thanks again.

/codebuild/output/src2789127840/src/aws-sdk-ruby/build_tools/services.rb:188: warning: previous definition of Services was here
rake aborted!
Encoding::InvalidByteSequenceError: "\xE2" on US-ASCII (Encoding::InvalidByteSequenceError)
/codebuild/output/src2789127840/src/.gems/ruby/3.2.0/gems/json-2.8.2/lib/json/common.rb:205:in `encode'
/codebuild/output/src2789127840/src/.gems/ruby/3.2.0/gems/json-2.8.2/lib/json/common.rb:205:in `parse'
/codebuild/output/src2789127840/src/.gems/ruby/3.2.0/gems/json-2.8.2/lib/json/common.rb:205:in `parse'
/codebuild/output/src2789127840/src/.gems/ruby/3.2.0/gems/json-2.8.2/lib/json/common.rb:711:in `load'
/codebuild/output/src2789127840/src/aws-sdk-ruby/build_tools/services.rb:93:in `load_docs'
/codebuild/output/src2789127840/src/aws-sdk-ruby/build_tools/services.rb:70:in `build_service'
/codebuild/output/src2789127840/src/aws-sdk-ruby/build_tools/services.rb:51:in `block in services'
/codebuild/output/src2789127840/src/aws-sdk-ruby/build_tools/services.rb:50:in `each'
/codebuild/output/src2789127840/src/aws-sdk-ruby/build_tools/services.rb:50:in `inject'
/codebuild/output/src2789127840/src/aws-sdk-ruby/build_tools/services.rb:50:in `services'
/codebuild/output/src2789127840/src/aws-sdk-ruby/build_tools/services.rb:42:in `each'
/codebuild/output/src2789127840/src/aws-sdk-ruby/tasks/build.rake:7:in `block in <top (required)>'
tasks/sdk.rake:60:in `block in <top (required)>'
/codebuild/output/src2789127840/src/releaseTools/lib/rake_code_build_output.rb:8:in `top_level'
/codebuild/output/src2789127840/src/.gems/ruby/3.2.0/gems/rake-13.2.1/exe/rake:27:in `<top (required)>'
/usr/local/bin/bundle:25:in `load'
/usr/local/bin/bundle:25:in `<main>'
Tasks: TOP => build
(See full trace by running task with --trace)

@byroot
Member

byroot commented Nov 15, 2024

The backtrace clearly shows you are not using .load_file.

@jterapin
Author

Aaaah, thanks for catching that.

headius pushed a commit to headius/json that referenced this issue Nov 20, 2024
Fix: ruby#697

This way even if `Encoding.default_external` is set to a weird value
the document will be parsed just fine.
ojab added a commit to ojab/danger that referenced this issue Feb 6, 2025
`json` gem was made more strict wrt input string encoding [0] and if
default locale is non-unicode (for example, default self-hosted GHA
runner image in [1] does not have unicode locales) parsing could fail
with `Encoding::InvalidByteSequenceError`

[0] ruby/json#697
[1] https://github.com/actions/actions-runner-controller
@jterapin
Author

Hello from the AWS SDK for Ruby. Hope you are well, @byroot ~

We have a customer who is using the json gem with Ruby 3.4 (within the Ruby Lambda environment) and is running into a similar issue (same error). Further context: aws/aws-sdk-ruby#3231

I was able to reproduce the same error:

/var/lang/lib/ruby/3.4.0/json/common.rb:221:in 'String#encode': "\xE2" on US-ASCII (Encoding::InvalidByteSequenceError)
	from /var/lang/lib/ruby/3.4.0/json/common.rb:221:in 'JSON::Ext::Parser.parse'
	from /var/lang/lib/ruby/3.4.0/json/common.rb:221:in 'JSON.parse'
	from /var/lang/lib/ruby/gems/3.4.0/gems/aws-sdk-core-3.222.1/lib/aws-sdk-core/json/json_engine.rb:10:in 'Aws::Json::JsonEngine.load'
	from /var/lang/lib/ruby/gems/3.4.0/gems/aws-sdk-core-3.222.1/lib/aws-sdk-core/json.rb:43:in 'Aws::Json.load'

Here's where we use JSON.parse within the SDK: https://github.com/aws/aws-sdk-ruby/blob/version-3/gems/aws-sdk-core/lib/aws-sdk-core/json/json_engine.rb#L10

Thoughts? Thanks in advance!

@byroot
Member

byroot commented Apr 15, 2025

Your customer is passing you an invalid string. It's marked as ASCII, but contains UTF-8.

Can't tell you any more than that.
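
For callers that cannot control where such a string comes from, one possible defensive step is to relabel it before parsing. The sketch below is only an illustration under the assumption that the bytes really are UTF-8; parse_possibly_mislabeled and MISLABELED_ENCODINGS are hypothetical names, and this is not something the json gem or the AWS SDK is known to do.

```ruby
require "json"

# Encodings treated here as "probably a mislabeled UTF-8 payload".
# This choice is an assumption made for the sketch.
MISLABELED_ENCODINGS = [Encoding::US_ASCII, Encoding::ASCII_8BIT].freeze

# Hypothetical helper: relabel (not transcode) the bytes as UTF-8, then
# refuse to parse if they are still not valid in the resulting encoding.
def parse_possibly_mislabeled(str)
  if MISLABELED_ENCODINGS.include?(str.encoding)
    str = str.dup.force_encoding(Encoding::UTF_8)
  end
  raise ArgumentError, "input is not valid #{str.encoding}" unless str.valid_encoding?

  JSON.parse(str)
end

# UTF-8 bytes for "café" inside a JSON document, tagged as US-ASCII.
body = %({"k":"caf\xC3\xA9"}).b.force_encoding(Encoding::US_ASCII)
p parse_possibly_mislabeled(body)
```

Relabeling changes only the encoding tag, so it is appropriate only when the source is known to emit UTF-8; fixing the tag where the string is created remains the cleaner option.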
