Timeouts reading "large" files from object stores over "slow" connections #15067
Comments
Besides a bad connection, it can also happen across cloud regions, according to a user.
To trigger this error you need a "slow" internet connection and a parquet file where the row groups are "large" and the query is trying to read lots of data. The triggering condition is that the amount of data requested by DataFusion can not be retrieved in a single ObjectStore request before the timeout.

The https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet file is particularly bad in this regard: it has three row groups of 16MB, 161MB and 231MB.

> select distinct row_group_id, row_group_num_rows, row_group_bytes from parquet_metadata('hits_1.parquet');
+--------------+--------------------+-----------------+
| row_group_id | row_group_num_rows | row_group_bytes |
+--------------+--------------------+-----------------+
| 1 | 344064 | 161244232 |
| 0 | 62734 | 16679556 |
| 2 | 593202 | 231269159 |
+--------------+--------------------+-----------------+
3 row(s) fetched.
Elapsed 0.004 seconds.

Since the query is basically doing
I think the solution here is to make more requests, each for a smaller amount of data. For example, instead of a single request for 93MB, it could make 23 requests of 4MB each, or 100 requests of 1MB each.

The only real question in my mind is where to add this logic (in the parquet reader or as an ObjectStore wrapper). The easiest thing for now is probably to make an ObjectStore wrapper, like the sketch below. Longer term it might make sense to look into making the parquet reader more fine grained (as in, be able to start decoding pages from a row group before they are all fetched).
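To make the wrapper idea concrete, here is a minimal sketch (not DataFusion's actual implementation) of a helper that fans one large `get_range` out into smaller sub-requests against an inner store. The function name, chunk size, and concurrency level are illustrative, and the exact `get_range` signature (`Range<usize>` vs `Range<u64>`) depends on the `object_store` version; a full `ObjectStore` wrapper would additionally delegate the remaining trait methods to the inner store.

```rust
use std::ops::Range;
use std::sync::Arc;

use bytes::{Bytes, BytesMut};
use futures::stream::{self, StreamExt, TryStreamExt};
use object_store::{path::Path, ObjectStore};

/// Fetch `range` from `store` in pieces of at most `chunk_size` bytes,
/// so no single request has to move the whole range before the timeout.
async fn get_range_chunked(
    store: Arc<dyn ObjectStore>,
    location: &Path,
    range: Range<usize>,
    chunk_size: usize,
) -> object_store::Result<Bytes> {
    // Split the requested range into consecutive sub-ranges
    let sub_ranges: Vec<Range<usize>> = (range.start..range.end)
        .step_by(chunk_size)
        .map(|start| start..(start + chunk_size).min(range.end))
        .collect();

    // Issue the sub-requests with bounded concurrency; `buffered` preserves order
    let parts: Vec<Bytes> = stream::iter(sub_ranges)
        .map(|r| {
            let store = Arc::clone(&store);
            let location = location.clone();
            async move { store.get_range(&location, r).await }
        })
        .buffered(8)
        .try_collect()
        .await?;

    // Reassemble the pieces into a single contiguous buffer
    let mut buf = BytesMut::with_capacity(range.end - range.start);
    for part in parts {
        buf.extend_from_slice(&part);
    }
    Ok(buf.freeze())
}
```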
FWIW splitting large requests and performing them in parallel is something we could upstream into object_store's default get_ranges method; it already does the reverse.

Edit: That being said, 200MB row groups are probably a problem in and of themselves, and might suggest an issue with the writer's configuration.
I think @crepererum is also working on something similar ("Chunked Requests") for us internally at InfluxDB, as various people noticed that you can often get more bandwidth and lower latency from S3 by using multiple concurrent requests to the same object (though of course you pay Amazon per request, so the $$$ cost is higher).
That particular file came from ClickBench, which is not necessarily the best example of parquet files, so in general I agree smaller row groups might be better.

To be clear, I think the problem that "the single request that is made can not complete before the timeout is hit" is real, and unfortunately it isn't like there is only one possible fix. There are a bunch of potential fixes that come with different tradeoffs 🤔
Maybe it is time for a
I am convinced this issue would be solved with automatic retries |
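For reference, the object_store builders already expose retry and per-request timeout settings; a minimal sketch of loosening them (the bucket name and specific values are illustrative, and this is not a confirmed fix for the streaming timeout described above) could look like this, assuming the `aws` feature is enabled:

```rust
use std::time::Duration;

use object_store::aws::AmazonS3Builder;
use object_store::{ClientOptions, ObjectStore, RetryConfig};

fn build_store() -> object_store::Result<impl ObjectStore> {
    AmazonS3Builder::from_env()
        .with_bucket_name("my-bucket") // hypothetical bucket
        // Give each request more time before the client gives up
        .with_client_options(ClientOptions::new().with_timeout(Duration::from_secs(120)))
        // Retry failed requests, keeping the default exponential backoff
        .with_retry(RetryConfig {
            max_retries: 5,
            retry_timeout: Duration::from_secs(300),
            ..Default::default()
        })
        .build()
}
```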
Describe the bug
Related to: to_pyarrow_table() on a table in S3 kept getting "Generic S3 error: error decoding response body" delta-io/delta-rs#2595

Basically, when I just try to read one of the ClickBench parquet files directly from a remote object store (see example below) on a slow internet connection, I get the following error.
My example just reads the data back (it is not doing any CPU intensive processing).

This is very similar to the reports @ion-elgreco has fielded in delta-rs: to_pyarrow_table() on a table in S3 kept getting "Generic S3 error: error decoding response body" delta-io/delta-rs#2595

To Reproduce
Run this program that just tries to read the file (on a crappy internet connection)
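The reproducer itself did not survive the copy; a minimal sketch of an equivalent program, assuming the public ClickBench URL above and object_store's HTTP store (with the `http` feature enabled), might look like:

```rust
use std::sync::Arc;

use datafusion::error::Result;
use datafusion::prelude::*;
use object_store::http::HttpBuilder;
use url::Url;

#[tokio::main]
async fn main() -> Result<()> {
    // Register an HTTP object store for the ClickBench host
    let base = Url::parse("https://datasets.clickhouse.com").expect("valid url");
    let store = HttpBuilder::new().with_url(base.as_str()).build()?;

    let ctx = SessionContext::new();
    ctx.register_object_store(&base, Arc::new(store));

    // Read the partitioned ClickBench file directly over HTTP; on a slow
    // connection the single large range request for a row group can exceed
    // the default 30 second timeout
    let df = ctx
        .read_parquet(
            "https://datasets.clickhouse.com/hits_compatible/athena_partitioned/hits_1.parquet",
            ParquetReadOptions::default(),
        )
        .await?;

    // Force the data to actually be fetched and decoded
    let batches = df.collect().await?;
    println!("read {} batches", batches.len());
    Ok(())
}
```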
This results in the following output:
Expected behavior
I expect the query to complete without error
Additional context
When I added an ObjectStore wrapper that reported what requests were being made to the underlying storage system, I found that DataFusion makes a single "large" request for 93MB. Given the bandwidth of the coffee shop wifi, this request can not complete within the default 30 second connection timeout, so the request times out and the error is returned to the client.

I was able to make the query work by writing another ObjectStore wrapper that splits the single 93MB request into multiple smaller requests; with it, my program completes.
Click here to see the idea (horrible code, I am sorry)