Add a "Gentle Introduction to Arrow / Record Batches" #11336

Open
Tracked by #7013
alamb opened this issue Jul 8, 2024 · 2 comments
Labels: documentation, enhancement

Comments

@alamb
Contributor

alamb commented Jul 8, 2024

Part of #7013

Is your feature request related to a problem or challenge?

As @efredine notes on #11290 (comment):

The in-memory examples are concise and it's easy to get the gist of what's going on. But it also throws people into the deep end of the Arrow format, which lacks a gentle introduction IMO. The Arrow-rs documentation gets immediately into the weeds!

Describe the solution you'd like

It's likely that many users will never need to know about or access the Arrow format directly. They will just read and write CSV or Parquet.

I don't think this needs to change, but perhaps what's missing is a section on how and when to use the Arrow format? A gentler introduction to Record Batches.
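To make the idea concrete, here is a rough sketch of the mental model such an introduction could start from: a record batch is a set of named, equal-length columns, each holding values of one type. This is a toy model using only the Rust standard library, not the arrow-rs API; `Column`, `ToyBatch`, and `example_batch` are hypothetical names invented for illustration. In arrow-rs the corresponding pieces are `Schema`, the array types, and `RecordBatch::try_new`, which likewise rejects columns of differing lengths.

```rust
// Toy model only: none of these types are the arrow-rs API.

/// A single column: every value has the same type and is stored contiguously.
/// `None` stands in for a null; Arrow itself tracks nulls in a separate validity bitmap.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Int32(Vec<Option<i32>>),
    Utf8(Vec<Option<String>>),
}

impl Column {
    /// Number of rows (slots) in this column, nulls included.
    pub fn len(&self) -> usize {
        match self {
            Column::Int32(v) => v.len(),
            Column::Utf8(v) => v.len(),
        }
    }

    /// Number of null slots in this column.
    pub fn null_count(&self) -> usize {
        match self {
            Column::Int32(v) => v.iter().filter(|x| x.is_none()).count(),
            Column::Utf8(v) => v.iter().filter(|x| x.is_none()).count(),
        }
    }
}

/// A batch of rows stored column by column (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct ToyBatch {
    names: Vec<String>,
    columns: Vec<Column>,
    num_rows: usize,
}

impl ToyBatch {
    /// Build a batch, checking that names match columns and all columns share one length.
    pub fn try_new(names: Vec<&str>, columns: Vec<Column>) -> Result<Self, String> {
        if names.len() != columns.len() {
            return Err(format!("{} names but {} columns", names.len(), columns.len()));
        }
        let num_rows = columns.first().map_or(0, Column::len);
        if columns.iter().any(|c| c.len() != num_rows) {
            return Err("all columns must have the same number of rows".to_string());
        }
        Ok(Self {
            names: names.into_iter().map(String::from).collect(),
            columns,
            num_rows,
        })
    }

    pub fn num_rows(&self) -> usize {
        self.num_rows
    }

    pub fn num_columns(&self) -> usize {
        self.columns.len()
    }

    /// Look up a column by name.
    pub fn column(&self, name: &str) -> Option<&Column> {
        let idx = self.names.iter().position(|n| n.as_str() == name)?;
        self.columns.get(idx)
    }
}

/// Three rows, two columns; the second row has a null `name`.
pub fn example_batch() -> ToyBatch {
    ToyBatch::try_new(
        vec!["id", "name"],
        vec![
            Column::Int32(vec![Some(1), Some(2), Some(3)]),
            Column::Utf8(vec![Some("a".to_string()), None, Some("c".to_string())]),
        ],
    )
    .expect("columns have equal length")
}
```

The point of the columnar layout is that an operation over one field (for example, summing `id`) touches a single contiguous vector instead of walking every row.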

Describe alternatives you've considered

Add a section to the user guide on "a gentle introduction to arrow"

Additional context

Here is a ticket tracking this upstream: apache/arrow-rs#4071

I actually think the basic content and structure could be copied from https://jorgecarleitao.github.io/arrow2/main/guide/, with the examples updated to use arrow-rs.

@alamb added the documentation and enhancement labels on Jul 8, 2024
@Adez017
Contributor

Adez017 commented Apr 18, 2025

take

@alamb
Contributor Author

alamb commented Apr 19, 2025

Thank you @Adez017 -- I would personally suggest starting by porting some of the contents of https://jorgecarleitao.github.io/arrow2/main/guide/ into DataFusion's docs

In terms of mechanism, I personally suggest making a PR to the arrow-rs repo (https://github.com/apache/arrow-rs) and writing the documentation as doc comments in the arrow crate

This would mean the docs show up here:

And it will ensure the examples continue to run

I suggest starting with the sections

  • Arrow format
  • high level API

And see how that goes
