Incorrect Content-Length header with StringIO body #6917
Comments
To quote myself from 2023
We cannot silently do a copy like this into a I'd be happy to include a warning that is emitted unconditionally when we see an |
Thanks for the quick reply. I understand the concerns, but I don't fully agree.
As I understand it, the problem is not that we can't guess the encoding. The problem is that we decide for the user to encode as And isn't this the same situation for
What would go wrong if you did something like the following?

    if isinstance(o, io.StringIO):
        current_position = o.tell()
        try:
            total_size = len(o.read().encode("utf-8"))
        finally:
            o.seek(current_position)

This would use more memory, but only during this call, so it seems unlikely to increase the high-water mark for the call to |
No, what I'm saying is that in order to translate this to bytes to get the correct encoding and what the server may be expecting, we have to guess at the correct length.
I don't recall if we try to pre-process it, if urllib3 does it, or if (this is my instinctual guess)
No. With file-like objects, it's passed down to |
requests always converts small Is my understanding wrong that Content-Length is the size, in bytes, of the payload?
It looks like urllib3 is doing it, but regardless it is always sent as utf-8, so it seems reasonable to use the utf-8 length.
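Until the header is computed from the encoded length, one way a caller could sidestep the question is to do the UTF-8 encoding up front and hand over a byte stream instead. The sketch below is only an illustration: `as_utf8_bytes_stream` is a hypothetical helper name, not something requests or urllib3 provides, and it deliberately makes the full copy that the maintainers want to avoid doing silently inside the library.

```python
import io


def as_utf8_bytes_stream(text_stream: io.StringIO) -> io.BytesIO:
    """Return a byte stream with the unread remainder of text_stream as UTF-8.

    Hypothetical helper for illustration; not part of requests.
    """
    # read() starts at the current position, so already-consumed text is skipped.
    remaining = text_stream.read()
    return io.BytesIO(remaining.encode("utf-8"))


body = as_utf8_bytes_stream(io.StringIO("💩"))
# The size of a byte stream is counted in bytes, the unit Content-Length uses.
print(len(body.getvalue()))
```

The resulting object could then be passed as data in place of the StringIO. Because the caller chooses the encoding explicitly, nothing has to be guessed on their behalf.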
I mean literal |
A very concise demonstration of the problem. Would anyone expect these two requests to be different?

    import io

    import requests

    value = "💩"
    print("REQUEST 1")
    print("=" * 20)
    r = requests.post(
        "http://example.com",
        data=value,
    )
    print("=" * 20)
    print("REQUEST 2")
    print("=" * 20)
    r = requests.post(
        "http://example.com",
        data=io.StringIO(value),
    )
    print("=" * 20)

In fact, they are different:
And I believe the second one is just wrong. Reading the RFC, I believe the current request violates this part of RFC 9110: |
Correctly calculate Content-Length for io.StringIO objects containing multi-byte characters by measuring the UTF-8 encoded byte length. Added test case with emoji character to verify the fix. Fixes: psf#6917
When requests is used with an io.StringIO as the data type, and the body contains characters whose UTF-8 encoding is multiple bytes, the Content-Length header is set incorrectly.

Looking at the implementation of super_len, it appears that io.StringIO has its length measured using seek and tell. It has been implemented that way since June 2016 (af7729f).

It looks like this was fixed for str inputs in #6586 in 2023 but was never fixed for io.StringIO.

I am happy to send a PR if the implementation is straightforward. Off the top of my head I don't know how to count the bytes in a UTF-8 encoded StringIO without copying, and previous PRs have tried to avoid a copy in super_len.
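A copy cannot be avoided entirely, but it can be bounded. The following is a minimal sketch, assuming it is acceptable to read the stream once in fixed-size chunks and then restore the position; `utf8_remaining_length` and `chunk_chars` are hypothetical names, and this is not the code in requests' super_len.

```python
import io


def utf8_remaining_length(stream: io.StringIO, chunk_chars: int = 64 * 1024) -> int:
    """Count the UTF-8 bytes from the current position to the end of stream.

    Hypothetical helper for illustration; not part of requests.
    """
    start = stream.tell()
    total = 0
    try:
        while True:
            chunk = stream.read(chunk_chars)
            if not chunk:
                break
            # Only one chunk is encoded at a time, so the temporary bytes
            # object stays at or below roughly 4 * chunk_chars bytes.
            total += len(chunk.encode("utf-8"))
    finally:
        # Put the position back so the body can still be read and sent.
        stream.seek(start)
    return total
```

The trade-off is time rather than memory: the whole body is still encoded once just to be measured, and then encoded again when it is actually sent.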
Expected Result
Content-Length should match the number of bytes sent when using io.StringIO
Actual Result
Content-Length is the length of the string in characters, not the number of bytes sent.
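The gap between the two numbers can be seen without sending a request. This small sketch uses only the standard library; it relies on CPython's StringIO reporting positions in characters, which matches the behavior described in this issue, although the io documentation treats text stream positions as opaque.

```python
import io

value = "💩"
stream = io.StringIO(value)

# Seeking to the end of a StringIO returns the position in characters.
chars = stream.seek(0, io.SEEK_END)
stream.seek(0)

# The number of bytes that go on the wire when the text is sent as UTF-8.
octets = len(value.encode("utf-8"))

print(chars, octets)
```

A length measured with seek and tell therefore gives the first number, while the body that is actually transmitted has the second.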
Reproduction Steps
Run the following script, which shows the problem in detail
Here is my output:
Notice that the request is always the same, except that the Content-Length header is 1 instead of 4 when io.StringIO is used.

System Information