[PATHFINDING] Parse json as variant #7403
Attn @alamb
See also the related PR for variant here:
Thank you for this PR @scovich
In my mind this functionality feels like a "computation kernel" (i.e., similar to the functions in https://docs.rs/arrow/latest/arrow/compute/index.html). The signature seems like it would roughly be something like:
    /// Convert text stored as JSON in an input `StringArray`, `LargeStringArray` or `StringViewArray` into
    /// a single "Variant" array (`StructArray` with an extension type)
    fn json_to_variant(input: &ArrayRef) -> ArrayRef {
        ...
    }
Since the arrow-json crate is currently for converting
I think we will sort this out as part of implementing variant in #6736. TLDR is via a
I agree something like arrow-compute makes a lot of sense. Unfortunately, the tape decoder machinery is private to the arrow-json crate, so I had to do the initial pathfinding here. Is there a better way forward?
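To make the "kernel" shape discussed above a bit more concrete, here is a minimal std-only sketch. Everything in it is an assumption for illustration: `VariantSketch`, `parse_scalar`, and `json_to_variant_sketch` are hypothetical names, plain slices and `Vec`s stand in for Arrow arrays, and only JSON scalars are handled. It does not use the arrow-json tape decoder and does not produce Parquet's variant encoding; it only shows the row-wise contract (one output slot per input slot, nulls passed through, first parse error aborts).

```rust
/// Hypothetical stand-in for a variant value (NOT Parquet's binary variant encoding).
#[derive(Debug, PartialEq)]
pub enum VariantSketch {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    Str(String),
}

/// Parse a single JSON scalar. Objects, arrays, and string escapes are
/// deliberately out of scope, and number parsing is looser than the JSON grammar.
pub fn parse_scalar(text: &str) -> Result<VariantSketch, String> {
    let t = text.trim();
    match t {
        "null" => return Ok(VariantSketch::Null),
        "true" => return Ok(VariantSketch::Bool(true)),
        "false" => return Ok(VariantSketch::Bool(false)),
        _ => {}
    }
    // Quoted string without escapes or embedded quotes.
    if t.len() >= 2 && t.starts_with('"') && t.ends_with('"') {
        let inner = &t[1..t.len() - 1];
        if inner.contains('"') || inner.contains('\\') {
            return Err(format!("unsupported string: {}", t));
        }
        return Ok(VariantSketch::Str(inner.to_string()));
    }
    // Numbers: require a leading '-' or digit so that "inf"/"nan" are rejected.
    if t.starts_with(|c: char| c == '-' || c.is_ascii_digit()) {
        if let Ok(i) = t.parse::<i64>() {
            return Ok(VariantSketch::Int(i));
        }
        if let Ok(f) = t.parse::<f64>() {
            if f.is_finite() {
                return Ok(VariantSketch::Float(f));
            }
        }
    }
    Err(format!("unsupported JSON value: {}", t))
}

/// Kernel-shaped entry point: `None` rows (SQL nulls) stay `None`,
/// and the first row that fails to parse aborts the whole call.
pub fn json_to_variant_sketch(
    input: &[Option<&str>],
) -> Result<Vec<Option<VariantSketch>>, String> {
    input
        .iter()
        .copied()
        .map(|row| row.map(parse_scalar).transpose())
        .collect()
}
```

Calling `json_to_variant_sketch(&[Some("42"), None])` returns `Ok` with an `Int(42)` row followed by a null row. A real kernel would presumably accept an `ArrayRef` and build the output with array builders, which is where access to the tape decoder would matter.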
Some other options might be (not sure which one we should go with):
I have been thinking a lot about how we should introduce variant. What do you think about a structure like this (crates)
I think depending on how arrow-variant is implemented, maybe it depends directly on
I filed #7423 to track this item
This is a pathfinding exercise to see how easy or hard it might be to parse JSON text into Parquet's new variant type using the tape decoder. It is not intended to merge; it is more of a conversation starter.
In particular: