[Feature request] Add support to Read Athena Results from S3 Using DuckDB #9267

mariotaddeucci · 2025-02-25T02:54:08Z

Currently, the Athena driver relies solely on the AWS API, which is great for access control management. However, when handling large query results in streaming mode, performance can suffer significantly because the API returns at most 1,000 rows per request, so a large result has to be fetched through many sequential paginated calls.
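
To make the pagination cost concrete, here is a minimal sketch of draining a paginated result one page per request. The `Page` shape, `drainPages`, and `fakeSource` are hypothetical stand-ins loosely modeled on Athena's GetQueryResults (rows plus a continuation token, at most 1,000 rows per call); the real SDK call is asynchronous and returns a richer structure.

```typescript
// Hypothetical page shape, loosely modeled on Athena's GetQueryResults
// response: a batch of rows plus an optional continuation token.
interface Page {
  rows: string[][];
  nextToken?: string;
}

// Athena caps MaxResults at 1,000 rows per GetQueryResults call.
const MAX_ROWS_PER_PAGE = 1000;

// Drains a paginated source one page per request and counts round trips.
// `fetchPage` stands in for the real (asynchronous) SDK call.
function drainPages(
  fetchPage: (token: string | undefined) => Page,
): { rows: string[][]; requests: number } {
  const rows: string[][] = [];
  let token: string | undefined = undefined;
  let requests = 0;
  do {
    const page: Page = fetchPage(token);
    requests += 1;
    for (const row of page.rows) rows.push(row);
    token = page.nextToken;
  } while (token !== undefined);
  return { rows, requests };
}

// Fake in-memory source serving `total` rows in pages of MAX_ROWS_PER_PAGE,
// using the next row offset as the continuation token.
function fakeSource(total: number): (token: string | undefined) => Page {
  return (token) => {
    const start = token === undefined ? 0 : Number(token);
    const end = Math.min(start + MAX_ROWS_PER_PAGE, total);
    const rows: string[][] = [];
    for (let i = start; i < end; i++) rows.push([String(i)]);
    return { rows, nextToken: end < total ? String(end) : undefined };
  };
}

const result = drainPages(fakeSource(2500));
console.log(result.rows.length, result.requests); // 2500 3
```

At 1,000 rows per call, a 5,000,000-row result would need 5,000 sequential requests, which is the overhead this proposal aims to avoid.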

It would be beneficial to have an optional mode that reads the generated CSV result directly from S3 using DuckDB, which is extremely fast and would allow fetching the entire result in a single request instead of iterating over many paginated responses.
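
As a rough illustration of the DuckDB side, the sketch below builds the SQL statements that could read one result file straight from S3. It assumes the input is the path Athena reports in `QueryExecution.ResultConfiguration.OutputLocation` (the CSV written for a SELECT query), and that DuckDB's `httpfs` and `aws` extensions handle S3 access and credentials. The helper name and secret name are hypothetical, and a real setup may need extra secret settings such as `REGION`.

```typescript
// Hypothetical helper (not part of Cube or DuckDB): returns the DuckDB SQL
// statements that would read one Athena result CSV directly from S3.
// `outputLocation` is assumed to be the value Athena reports in
// QueryExecution.ResultConfiguration.OutputLocation for a SELECT query.
function buildDuckDbReadSql(outputLocation: string): string[] {
  // Expect a single object of the form s3://bucket/key.csv.
  if (!/^s3:\/\/[^/]+\/.+\.csv$/.test(outputLocation)) {
    throw new Error(`Unexpected Athena output location: ${outputLocation}`);
  }
  // Double any single quotes so the path is a valid SQL string literal.
  const literal = `'${outputLocation.replace(/'/g, "''")}'`;
  return [
    // httpfs provides s3:// reads; aws enables the credential_chain provider.
    "INSTALL httpfs;",
    "LOAD httpfs;",
    "INSTALL aws;",
    "LOAD aws;",
    // Reuse the standard AWS credential chain (env vars, profile, instance role).
    "CREATE SECRET athena_results (TYPE s3, PROVIDER credential_chain);",
    // Read every column as VARCHAR; types could then come from Athena's metadata.
    `SELECT * FROM read_csv(${literal}, header = true, all_varchar = true);`,
  ];
}

const statements = buildDuckDbReadSql("s3://my-bucket/results/abc-123.csv");
console.log(statements[statements.length - 1]);
```

Reading everything as `VARCHAR` is one possible design choice: it keeps DuckDB's type sniffing out of the picture so the driver could keep mapping types from Athena's own column metadata, as it does today.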

This would not replace the default behavior but serve as an opt-in alternative for scenarios where performance is a concern.
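
One way the opt-in could be wired, sketched under assumptions: a hypothetical `resultsReadMode` driver option (with an equally hypothetical environment variable such as `CUBEJS_AWS_ATHENA_RESULTS_READ_MODE` as a fallback) that defaults to the current API-based behavior. Neither name exists in Cube today.

```typescript
// Hypothetical setting; Cube's Athena driver has no such option today.
type ResultsReadMode = "api" | "duckdb-s3";

interface AthenaReadOptions {
  resultsReadMode?: ResultsReadMode;
}

// Resolves the effective mode. An explicit driver option wins over the
// environment value, and the existing API-based path stays the default.
function resolveReadMode(
  options: AthenaReadOptions,
  envValue: string | undefined, // e.g. from a hypothetical env variable
): ResultsReadMode {
  const candidate: string | undefined = options.resultsReadMode ?? envValue;
  if (candidate === undefined || candidate === "") return "api";
  if (candidate === "api") return "api";
  if (candidate === "duckdb-s3") return "duckdb-s3";
  throw new Error(`Unsupported results read mode: ${candidate}`);
}

console.log(resolveReadMode({}, undefined)); // api
console.log(resolveReadMode({}, "duckdb-s3")); // duckdb-s3
```

Failing loudly on an unknown value, rather than silently falling back, makes a mistyped setting visible instead of quietly keeping the slow path.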

If this makes sense, I'm happy to contribute a PR for this feature. Let me know your thoughts!

leopedrassoli · 2025-03-01T11:37:33Z

Up

igorlukanin · 2025-03-16T20:58:44Z

Hi @mariotaddeucci 👋

Thank you for the suggestion. Do you think that reading the generated CSV with DuckDB is the only way to improve results download performance for Athena? Is there anything else that could be tweaked?

Also, did you consider using pre-aggregations? That way, direct access to Athena would not be required when serving queries.

I'm a little worried about the complexity this approach could bring: this separate DuckDB-based behavior would need to be maintained alongside the normal one, and DuckDB would become a dependency of the Athena driver. On the other hand, this might prove to be a way to upgrade the export bucket downloading implementation for other drivers, not only Athena.

It would be great to check what @KSDaemon thinks about this.

@mariotaddeucci In any case, you should feel free to proceed with an implementation. Even if this does not get merged into the main distribution, you can still publish this version of the Athena driver as a separate npm package that can be used with Cube anyway.

mariotaddeucci changed the title ~~Option to Read Athena Results from S3 Using DuckDB~~ [FEATURE] Add support to Read Athena Results from S3 Using DuckDB Feb 25, 2025

mariotaddeucci changed the title ~~[FEATURE] Add support to Read Athena Results from S3 Using DuckDB~~ [Feature request] Add support to Read Athena Results from S3 Using DuckDB Feb 25, 2025

igorlukanin added enhancement New feature proposal driver:athena Issues related to the AWS Athena driver labels Mar 16, 2025
