[datafusion-spark] Example of using Spark compatible function library #15915

Open
Tracked by #15914
alamb opened this issue May 2, 2025 · 4 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Contributor

alamb commented May 2, 2025

Is your feature request related to a problem or challenge?

@shehabgamin added the datafusion-spark crate in #15168

The idea is that, using the functions in this crate, you can get a SessionContext that executes SQL using Spark semantics. However, there is no user-facing documentation that shows how to do this.

Describe the solution you'd like

Add an example somewhere showing how to configure and use the Spark functions in a SessionContext. I can help with this.

Describe alternatives you've considered

I personally suggest adding a new page to the website: https://datafusion.apache.org/

Specifically, I suggest

  1. Add a new page in the "Library User Guide" called "Spark Compatible Functions"
  2. Add a preamble explaining what the datafusion-spark crate is (it contains a list of Spark-compatible functions)
  3. Add examples

For example, we should show how to run SQL using a "Spark-compatible" SessionContext:

// the context must be mutable because register_all takes a mutable function registry
let mut ctx = SessionContext::new();
datafusion_spark::register_all(&mut ctx)?;

// TODO run an example SQL query here that uses a function from
// the datafusion spark crate
ctx.sql("select ... ").await?;

// also add an example for DataFrame API

In order to run the example code as part of CI, you will have to add an entry such as this:

#[cfg(doctest)]
doc_comment::doctest!(
"../../../docs/source/user-guide/introduction.md",
user_guide_introduction
);

to the datafusion-spark lib.rs file. It can't go in the datafusion/core/lib.rs because the core crate doesn't depend on datafusion-spark.

Additional context

No response

@alamb alamb added enhancement New feature or request documentation Improvements or additions to documentation labels May 2, 2025
@alamb alamb added the good first issue Good for newcomers label May 2, 2025
@alamb
Contributor Author

alamb commented May 2, 2025

I think this is a good first issue, as there is a clear description of what is desired and there are examples to follow.

@Adez017
Contributor

Adez017 commented May 2, 2025

Hi @alamb. This sounds interesting, and I would love to work on it.

@Adez017
Contributor

Adez017 commented May 2, 2025

take

@alamb
Contributor Author

alamb commented May 2, 2025

Thanks @Adez017

2 participants