Currently, the Athena driver relies solely on the AWS API, which is great for access control management. However, when streaming large query results, performance can suffer significantly because the API (GetQueryResults) returns at most 1,000 rows per request.
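To make the cost of that limit concrete, here is a minimal sketch of draining a paginated result, assuming a fetcher shaped loosely like GetQueryResults (a page of rows plus an optional next token). `Page`, `FetchPage`, `readAllPages` and `fakeApi` are hypothetical names for illustration, not the driver's actual code, and the fake API stands in for the AWS SDK call.

```typescript
// One page of results, loosely modeled on Athena's GetQueryResults response:
// a batch of rows plus an optional token pointing at the next page.
interface Page<Row> {
  rows: Row[];
  nextToken?: string;
}

// Fetches the page identified by `nextToken` (undefined means the first page).
// In a real driver this would wrap the AWS SDK call; here it is abstract.
type FetchPage<Row> = (nextToken: string | undefined) => Promise<Page<Row>>;

// Athena accepts at most 1,000 for MaxResults on GetQueryResults.
const MAX_ROWS_PER_PAGE = 1000;

// Drains every page sequentially, counting round trips, until no token is returned.
async function readAllPages<Row>(
  fetchPage: FetchPage<Row>,
): Promise<{ rows: Row[]; requests: number }> {
  const rows: Row[] = [];
  let requests = 0;
  let token: string | undefined = undefined;
  do {
    const page: Page<Row> = await fetchPage(token);
    requests += 1;
    for (const row of page.rows) rows.push(row);
    token = page.nextToken;
  } while (token !== undefined);
  return { rows, requests };
}

// Hypothetical in-memory stand-in for the API that serves `total` numbered rows.
function fakeApi(total: number): FetchPage<number> {
  return async (token) => {
    const start = token === undefined ? 0 : Number(token);
    const end = Math.min(start + MAX_ROWS_PER_PAGE, total);
    const rows = Array.from({ length: end - start }, (_, i) => start + i);
    return { rows, nextToken: end < total ? String(end) : undefined };
  };
}
```

Under these assumptions a 2,500-row result takes three calls, and a 1,000,000-row result takes 1,000 sequential calls, each paying a full network round trip.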
It would be beneficial to have an optional mode that reads the generated CSV result directly from S3 using DuckDB, which is very fast and would fetch the entire result in one pass instead of iterating over many paginated responses.
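As a rough sketch of the DuckDB side, assuming the driver already has the S3 URI of the result CSV (Athena reports it as `ResultConfiguration.OutputLocation` in GetQueryExecution), the statements a DuckDB connection could run look like the ones below. `buildDuckDbReadSql` and `sqlStringLiteral` are hypothetical helpers; the sketch only builds SQL text, and S3 credential setup (for example a DuckDB S3 secret) is deliberately left out.

```typescript
// Escapes a value for use inside a single-quoted SQL string literal.
function sqlStringLiteral(value: string): string {
  return `'${value.replace(/'/g, "''")}'`;
}

// Hypothetical helper: builds the statements DuckDB would need to read the
// whole Athena result CSV straight from S3. Assumes the httpfs extension can
// be installed and that S3 credentials are configured separately.
function buildDuckDbReadSql(outputLocation: string): string[] {
  if (!outputLocation.startsWith("s3://")) {
    throw new Error(`Expected an s3:// URI, got: ${outputLocation}`);
  }
  return [
    // httpfs provides DuckDB's s3:// file system support.
    "INSTALL httpfs;",
    "LOAD httpfs;",
    // Athena's result CSV starts with a header row of column names.
    `SELECT * FROM read_csv(${sqlStringLiteral(outputLocation)}, header = true);`,
  ];
}
```

One design point worth checking in a real implementation: GetQueryResults returns column type metadata, while this path would rely on DuckDB's CSV type inference, so the two modes may not produce identical types.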
This would not replace the default behavior but serve as an opt-in alternative for scenarios where performance is a concern.
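The opt-in behavior could be as simple as a flag that defaults to the current API path. The option name `readResultsViaDuckDb` and the helpers below are hypothetical, not existing Cube or driver settings; the sketch only shows that the DuckDB path is chosen when explicitly enabled.

```typescript
// Hypothetical driver options; the field name is illustrative only.
interface AthenaDriverOptions {
  readResultsViaDuckDb?: boolean;
}

type ResultReader = "athena-api" | "s3-duckdb";

// The paginated API path stays the default; DuckDB is used only on explicit opt-in.
function chooseResultReader(options: AthenaDriverOptions = {}): ResultReader {
  return options.readResultsViaDuckDb === true ? "s3-duckdb" : "athena-api";
}

// Hypothetical parser for a string setting such as an environment variable.
// Anything other than an explicit "true" or "1" keeps the default behavior.
function parseOptInFlag(raw: string | undefined): boolean {
  if (raw === undefined) return false;
  const value = raw.trim().toLowerCase();
  return value === "true" || value === "1";
}
```

Defaulting to `false` keeps existing deployments unchanged, which matches the intent of an opt-in alternative.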
If this makes sense, I'm happy to contribute a PR for this feature. Let me know your thoughts!
mariotaddeucci changed the title from "Option to Read Athena Results from S3 Using DuckDB" to "[FEATURE] Add support to Read Athena Results from S3 Using DuckDB" on Feb 25, 2025.
mariotaddeucci changed the title from "[FEATURE] Add support to Read Athena Results from S3 Using DuckDB" to "[Feature request] Add support to Read Athena Results from S3 Using DuckDB" on Feb 25, 2025.
Thank you for the suggestion. Do you think reading the generated CSV with DuckDB is the only way to improve result download performance for Athena? Is there anything else that could be tweaked?
Also, did you consider using pre-aggregations? That way, direct access to Athena would not be required when serving queries.
I'm a little worried about the complexity this approach could bring: the separate DuckDB-based behavior would need to be maintained alongside the normal one, and DuckDB would become a dependency of the Athena driver. On the other hand, this could prove to be a way to upgrade the export bucket download implementation for other drivers, not only Athena.
It would be great to check what @KSDaemon thinks about this.
@mariotaddeucci In any case, you should feel free to proceed with an implementation. Even if this does not get merged into the main distribution, you can still publish this version of the Athena driver as a separate npm package that can be used with Cube anyway.