[Feature request] Add support to Read Athena Results from S3 Using DuckDB #9267

Open · mariotaddeucci opened this issue on Feb 25, 2025 · 2 comments
Labels: driver:athena (Issues related to the AWS Athena driver), enhancement (New feature proposal)

Comments

@mariotaddeucci

Currently, the Athena driver relies solely on the AWS API, which is great for access control management. However, when large query results are streamed, performance can suffer significantly because the REST API returns at most 1,000 rows per request, so a large result has to be fetched through many paginated calls.
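
To make the cost of pagination concrete, here is a minimal TypeScript sketch, assuming the 1,000-row page limit described above. `ResultPage`, `FetchPage`, `requestsNeeded`, `fetchAll`, and `fakeFetcher` are hypothetical names; the fetcher is an in-memory stand-in, not a call to the real Athena `GetQueryResults` API.

```typescript
// Hypothetical shape of one page of results: rows plus an optional
// continuation token, loosely modeled on a paginated results API.
interface ResultPage {
  rows: string[][];
  nextToken?: string;
}

// Hypothetical page fetcher; a real driver would call the AWS API here.
type FetchPage = (nextToken?: string) => Promise<ResultPage>;

// Assumed page limit, as stated in the issue.
const MAX_ROWS_PER_PAGE = 1000;

// Number of sequential round trips needed to stream `totalRows` rows.
function requestsNeeded(totalRows: number, pageSize = MAX_ROWS_PER_PAGE): number {
  return Math.max(1, Math.ceil(totalRows / pageSize));
}

// Follows continuation tokens until the last page, counting requests.
async function fetchAll(fetchPage: FetchPage): Promise<{ rows: string[][]; requests: number }> {
  const rows: string[][] = [];
  let requests = 0;
  let token: string | undefined = undefined;
  do {
    const page = await fetchPage(token);
    requests += 1;
    rows.push(...page.rows);
    token = page.nextToken;
  } while (token !== undefined);
  return { rows, requests };
}

// In-memory stand-in that serves `total` rows in pages of `pageSize`,
// using the next start offset as the continuation token.
function fakeFetcher(total: number, pageSize = MAX_ROWS_PER_PAGE): FetchPage {
  return async (nextToken?: string) => {
    const start = nextToken === undefined ? 0 : Number(nextToken);
    const end = Math.min(start + pageSize, total);
    const rows: string[][] = [];
    for (let i = start; i < end; i++) rows.push([String(i)]);
    return { rows, nextToken: end < total ? String(end) : undefined };
  };
}

// Usage: stream a simulated 250,000-row result page by page.
fetchAll(fakeFetcher(250000)).then(({ requests }) => {
  console.log(`requests: ${requests}`);
});
```

With 250,000 rows this makes 250 sequential requests, which is the overhead the opt-in S3 path would try to avoid.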

It would be beneficial to have an optional mode that reads the generated CSV result directly from S3 using DuckDB, which is extremely fast and would fetch the entire result in a single request instead of iterating over multiple paginated responses.
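
A rough TypeScript sketch of what the opt-in path might generate, assuming the S3 URI of the result CSV is already known (Athena reports it as the query's output location) and that DuckDB's `httpfs` extension handles S3 access, with credentials configured elsewhere. `buildDuckDbStatements` and the bucket name are hypothetical; the helper only builds SQL strings and does not execute them.

```typescript
// Hypothetical helper: builds the DuckDB statements that would read one
// Athena result CSV straight from S3. It does not run them, and it assumes
// S3 credentials are configured separately.
function buildDuckDbStatements(outputLocation: string): string[] {
  if (!outputLocation.startsWith("s3://")) {
    throw new Error(`expected an s3:// URI, got: ${outputLocation}`);
  }
  // Assumption: the query is a SELECT, whose result Athena writes as a .csv file.
  if (!outputLocation.endsWith(".csv")) {
    throw new Error(`expected a .csv result file, got: ${outputLocation}`);
  }
  // Double any single quotes so the URI is a valid SQL string literal.
  const literal = outputLocation.replace(/'/g, "''");
  return [
    "INSTALL httpfs", // S3 support comes from DuckDB's httpfs extension
    "LOAD httpfs",
    `SELECT * FROM read_csv_auto('${literal}', header = true)`,
  ];
}

// Hypothetical bucket and query ID, for illustration only.
const statements = buildDuckDbStatements("s3://example-athena-results/1234-abcd.csv");
console.log(statements.join(";\n") + ";");
```

One design caveat: `read_csv_auto` infers column types from the CSV text, so the resulting types may not match the column types Athena reports for the query.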

This would not replace the default behavior but serve as an opt-in alternative for scenarios where performance is a concern.

If this makes sense, I'm happy to contribute a PR for this feature. Let me know your thoughts!

@mariotaddeucci mariotaddeucci changed the title Option to Read Athena Results from S3 Using DuckDB [FEATURE] Add support to Read Athena Results from S3 Using DuckDB Feb 25, 2025
@mariotaddeucci mariotaddeucci changed the title [FEATURE] Add support to Read Athena Results from S3 Using DuckDB [Feature request] Add support to Read Athena Results from S3 Using DuckDB Feb 25, 2025
@leopedrassoli

Up

@igorlukanin igorlukanin added enhancement New feature proposal driver:athena Issues related to the AWS Athena driver labels Mar 16, 2025
@igorlukanin (Member)

Hi @mariotaddeucci 👋

Thank you for the suggestion. Do you think reading the generated CSV with DuckDB is the only way to improve result download performance for Athena? Is there anything else that could be tweaked?

Also, did you consider using pre-aggregations? That way, direct access to Athena would not be required when serving queries.

I'm a little bit worried about the complexity this approach could bring: this separate DuckDB-based behavior would need to be maintained alongside the normal one, and DuckDB would become a dependency of the Athena driver. On the other hand, maybe this would prove to be a way to upgrade the export bucket download implementation for other drivers, not only Athena.

It would be great to check what @KSDaemon thinks about this.

@mariotaddeucci In any case, you should feel free to proceed with an implementation. Even if this does not get merged into the main distribution, you can still publish this version of the Athena driver as a separate npm package that can be used with Cube anyway.

Projects: None yet
Development: No branches or pull requests
3 participants