Databricks Driver: Export Bucket On GCS #9393

Closed
qiao-x opened this issue Mar 26, 2025 · 7 comments
@qiao-x
Contributor

qiao-x commented Mar 26, 2025

Hi team,
I've tested the CREATE TABLE ... USING statement on Databricks on GCP, and it successfully exports CSV files to a GCS bucket. Would it be feasible to implement the GCS export feature based on this statement and the extract files function here?

Proof: [two screenshots, not reproduced here]

@qiao-x
Contributor Author

qiao-x commented Mar 26, 2025

Hi @KSDaemon, any suggestions?

@KSDaemon
Member

Hey!
You need to:

  • add support for GCS in unload() of the Databricks driver here
  • add support for GCS in getCsvFiles() of the Databricks driver here
  • Note that there is already a createExternalTableFromSql() implementation in the Databricks driver, which might require some alignment
  • You can look at the Snowflake driver as an example.
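
As a rough illustration of the two steps above, here is a minimal sketch, assuming the export is done with a CREATE TABLE ... USING CSV LOCATION statement pointed at a gs:// path and that the exported files are then picked out of a bucket listing. Every name in it (ExportBucketConfig, bucketLocation, buildUnloadStatement, selectCsvFiles) is hypothetical and not taken from the Cube codebase; the real unload() and getCsvFiles() may be structured quite differently.

```typescript
// Hypothetical sketch only: these types and functions are illustrative
// and do not mirror the actual Databricks driver in Cube.
interface ExportBucketConfig {
  bucketType: string; // e.g. "gcs"; other values are rejected in this sketch
  bucketName: string; // bare bucket name, without a URI scheme
}

// Build the storage location for an exported table.
// Assumption: GCS paths use the gs:// scheme.
function bucketLocation(cfg: ExportBucketConfig, tableName: string): string {
  if (cfg.bucketType === "gcs") {
    return `gs://${cfg.bucketName}/${tableName}`;
  }
  throw new Error(`Unsupported export bucket type: ${cfg.bucketType}`);
}

// Build a CREATE TABLE ... USING CSV statement that writes the result
// of selectSql to the bucket location as CSV files.
function buildUnloadStatement(
  cfg: ExportBucketConfig,
  tableName: string,
  selectSql: string
): string {
  const location = bucketLocation(cfg, tableName);
  return `CREATE TABLE ${tableName} USING CSV LOCATION '${location}' AS (${selectSql})`;
}

// From a flat listing of object names in the bucket, keep only the CSV
// part files under the table's prefix (skipping marker files such as _SUCCESS).
function selectCsvFiles(objectNames: string[], tableName: string): string[] {
  return objectNames.filter(
    (name) => name.startsWith(`${tableName}/`) && name.endsWith(".csv")
  );
}
```

Keeping the statement builder and the file filter as pure functions makes them easy to unit test without a live Databricks or GCS connection; listing the bucket and signing download URLs would sit around them.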

@qiao-x
Contributor Author

qiao-x commented Mar 28, 2025

@KSDaemon good news: I've completed the PoC of the export bucket feature in our GCP Databricks environment. I'll create a PR for it and request a review from you.

However, I ran into an issue while implementing this solution.
When I fork from v1.2.27, build the databricks-jdbc driver without any code modification, and put dist/src into the container, everything works fine. Things change when I fork from master (I believe an ODBC driver was introduced there). Even if I modify nothing and just replace dist/src with my locally built version, any query fails with this error:

Error: EISDIR: illegal operation on a directory, open '/cube/node_modules/@cubejs-backend/databricks-jdbc-driver/dist/download/META-INF/'
    at DatabricksDriver.testConnection (/cube/node_modules/@cubejs-backend/jdbc-driver/src/JDBCDriver.ts:221:13)
    at /cube/node_modules/@cubejs-backend/server-core/src/core/server.ts:621:15
    at Object.query (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryCache.ts:568:26)
    at QueryQueue.processQuery (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:839:25)
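
One possible reading of this error, not confirmed anywhere in this thread: the path ends in META-INF/, which looks like a directory entry from an archive being opened as if it were a regular file. EISDIR is what Node.js reports in that situation. The sketch below only illustrates that idea; planExtraction and ExtractionPlan are hypothetical names, and the actual cause in the driver may be something else.

```typescript
// Hypothetical helper, not taken from the Cube codebase.
// It separates archive entries into directories (to be created) and
// files (to be written), so a directory is never opened as a file.
interface ExtractionPlan {
  directories: string[];
  files: string[];
}

function planExtraction(entryNames: string[]): ExtractionPlan {
  const directories: string[] = [];
  const files: string[] = [];
  for (const name of entryNames) {
    // By convention, directory entries in a zip archive end with "/".
    if (name.endsWith("/")) {
      directories.push(name);
    } else {
      files.push(name);
    }
  }
  return { directories, files };
}
```

An extraction routine following this plan would create each entry in directories first and only open the entries in files for writing.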

@KSDaemon
Member

Cool!

Regarding the error: interesting... I need to take a look.

@qiao-x
Contributor Author

qiao-x commented Mar 31, 2025

Hi @KSDaemon,
I've created PR #9407. Could you review it and advise me on how to add unit tests?

@KSDaemon
Member

@qiao-x That's awesome! Sure, I'll have a look!

@igorlukanin
Member

igorlukanin commented Apr 14, 2025

Thanks for the contribution @qiao-x! This has been released in v1.3.0: https://cube.dev/blog/cube-core-v1-3-performance-improvements-and-upgrades#new-in-data-source-support
