Ability to chunk download from object store #11609

Closed
trungda opened this issue Jul 22, 2024 · 3 comments
Labels
enhancement New feature or request

Comments

@trungda
Contributor

trungda commented Jul 22, 2024

Is your feature request related to a problem or challenge?

When downloading large objects (> 300 MB) using the object_store crate, I often hit timeouts with the default configuration (30-second connection timeout). Interestingly, when I increase the timeout, the download speed is actually lower (not sure if it's the same for everyone?).

Describe the solution you'd like

I am wondering whether it makes sense to split a file into smaller ranges (say, 100 MB each), download each range in parallel over a separate connection, and reassemble the ranges behind the same interface.
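
A minimal sketch of the idea, using only the Rust standard library. The names `split_ranges`, `download_chunked`, and the `fetch` closure are hypothetical and not part of object_store; `fetch` stands in for whatever ranged read the store offers (for example, something like `ObjectStore::get_range`), and the object's total length is assumed to be known up front.

```rust
use std::ops::Range;
use std::thread;

/// Split `[0, total_len)` into consecutive ranges of at most `chunk_size` bytes.
fn split_ranges(total_len: u64, chunk_size: u64) -> Vec<Range<u64>> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total_len {
        let end = (start + chunk_size).min(total_len);
        ranges.push(start..end);
        start = end;
    }
    ranges
}

/// Fetch every range on its own thread and concatenate the parts in order.
/// `fetch` is a hypothetical stand-in for a ranged read against the store.
fn download_chunked<F>(total_len: u64, chunk_size: u64, fetch: F) -> Result<Vec<u8>, String>
where
    F: Fn(Range<u64>) -> Result<Vec<u8>, String> + Sync,
{
    let ranges = split_ranges(total_len, chunk_size);
    let fetch = &fetch;
    let parts: Vec<Result<Vec<u8>, String>> = thread::scope(|scope| {
        // One scoped thread per range; handles are kept in range order.
        let handles: Vec<_> = ranges
            .into_iter()
            .map(|range| scope.spawn(move || fetch(range)))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap_or_else(|_| Err("fetch thread panicked".to_string())))
            .collect()
    });
    // Reassemble in order; the first failed range aborts the whole download.
    let mut out = Vec::with_capacity(total_len as usize);
    for part in parts {
        out.extend(part?);
    }
    Ok(out)
}
```

This spawns one thread per range, which is fine for a handful of 100 MB chunks. A real implementation would more likely use async tasks with a concurrency limit and retry individual ranges that fail.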

Describe alternatives you've considered

Not sure if such a capability can be composed using the existing interfaces.

Additional context

No response

@trungda trungda added the enhancement New feature or request label Jul 22, 2024
@alamb
Contributor

alamb commented Jul 23, 2024

Thank you @trungda

I think it would be very interesting to build a "parallel downloader" ObjectStore implementation. I am not sure it necessarily belongs in the core object_store crate, though it could be added if there is enough interest.

There might also be some interesting ideas to explore around "racing reads" to avoid latency

There are many good ideas in this paper, BTW: https://dl.acm.org/doi/10.14778/3611479.3611486

I think you could compose this kind of smart client from the existing interfaces
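
One way to read the "racing reads" idea above is to issue the same read several times and keep whichever copy answers first. The following is only a sketch under that assumption, using standard-library threads; `race_reads` and the `fetch` closure are hypothetical names, and `fetch` stands in for a read against the store.

```rust
use std::sync::{mpsc, Arc};
use std::thread;

/// Issue the same read `copies` times and return the first successful result.
/// `fetch` is a hypothetical stand-in for a read against the store; it receives
/// the attempt index. Returns `None` if every attempt fails.
fn race_reads<F>(copies: usize, fetch: F) -> Option<Vec<u8>>
where
    F: Fn(usize) -> Result<Vec<u8>, String> + Send + Sync + 'static,
{
    let fetch = Arc::new(fetch);
    let (tx, rx) = mpsc::channel();
    for attempt in 0..copies {
        let tx = tx.clone();
        let fetch = Arc::clone(&fetch);
        thread::spawn(move || {
            // A send error only means the receiver is gone (a winner was chosen).
            let _ = tx.send(fetch(attempt));
        });
    }
    // Drop the original sender so `rx` ends once every attempt has reported.
    drop(tx);
    // Take results in arrival order, skipping failures.
    rx.iter().find_map(Result::ok)
}
```

The losing attempts keep running in the background until they finish. A production version would want to cancel them, and racing trades extra requests for lower latency, so it may not be worthwhile for every read.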

@trungda
Contributor Author

trungda commented Jul 23, 2024

Thanks @alamb for the pointer. Also, I just realized that I filed this issue in the wrong repo. It belongs in arrow-rs :) Let me move it there to get input from other community members.

@trungda
Contributor Author

trungda commented Jul 24, 2024

Closed in favor of apache/arrow-rs-object-store#274.

@trungda trungda closed this as completed Jul 24, 2024