Substrait: Handle inner map fields in schema renaming #15869

Open
wants to merge 5 commits into
base: main
from

Conversation

cht42
Contributor

@cht42 cht42 commented Apr 26, 2025

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the substrait Changes to the substrait crate label Apr 26, 2025
Contributor

@gabotechs gabotechs left a comment

Thanks for this fix! Would it be possible to add a test that reproduces the issue on main and shows that this PR solves it?

@cht42 cht42 requested a review from gabotechs April 29, 2025 14:51
Comment on lines 73 to 74
Projection: DATA.a, DATA.b
Projection: DATA.a, DATA.b, DATA.c
TableScan: DATA
Contributor

Thanks for the tests! I think it would be worth expanding them a bit, as they only partially test the new functionality. For example, I could replace the code committed in datafusion/substrait/src/logical_plan/consumer.rs with just the following:

        ...
        DataType::Map(inner, _) => match inner.data_type() {
            DataType::Struct(key_and_value) if key_and_value.len() == 2 => {
                *name_idx += 1;
                Ok(field.clone())
            }
            _ => substrait_err!("Map fields must contain a Struct with exactly 2 fields"),
        },
        ...

The tests would still pass, even though this code is wrong: it clones the map field without renaming anything inside it.
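To illustrate the kind of test that would catch a shallow clone, here is a minimal, self-contained sketch. It does not use the real arrow-rs or DataFusion types: `DataType`, `Field`, `rename_type`, `rename_children`, and `collect_names` are all hypothetical stand-ins written for this example. It assumes that only struct fields consume names from the depth-first name list, and that a map's key and value slots keep their own names while any structs nested inside them are renamed.

```rust
// Simplified stand-ins for illustration only; these are NOT the arrow-rs types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
    Struct(Vec<Field>),
    // A map wraps one "entries" field whose type is a struct of exactly [key, value].
    Map(Box<Field>),
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

impl Field {
    fn new(name: &str, data_type: DataType) -> Self {
        Field { name: name.to_string(), data_type }
    }
}

/// Hypothetical helper: renames struct children depth-first,
/// consuming one entry of `names` per struct field.
fn rename_children(fields: &[Field], names: &[&str], idx: &mut usize) -> Result<Vec<Field>, String> {
    let mut out = Vec::with_capacity(fields.len());
    for f in fields {
        let name = *names
            .get(*idx)
            .ok_or_else(|| "ran out of names".to_string())?;
        *idx += 1;
        out.push(Field::new(name, rename_type(&f.data_type, names, idx)?));
    }
    Ok(out)
}

/// Hypothetical helper: recurses into structs and maps; other types are returned unchanged.
fn rename_type(dt: &DataType, names: &[&str], idx: &mut usize) -> Result<DataType, String> {
    match dt {
        DataType::Struct(children) => Ok(DataType::Struct(rename_children(children, names, idx)?)),
        DataType::Map(entries) => match &entries.data_type {
            DataType::Struct(kv) if kv.len() == 2 => {
                // Key and value slots keep their own names (assumption);
                // only structs nested inside them consume names.
                let mut renamed = Vec::with_capacity(2);
                for f in kv {
                    renamed.push(Field::new(&f.name, rename_type(&f.data_type, names, idx)?));
                }
                Ok(DataType::Map(Box::new(Field::new(
                    &entries.name,
                    DataType::Struct(renamed),
                ))))
            }
            _ => Err("Map fields must contain a Struct with exactly 2 fields".to_string()),
        },
        other => Ok(other.clone()),
    }
}

/// Collects struct field names depth-first, in the same order `rename_type` consumes them.
fn collect_names(dt: &DataType, out: &mut Vec<String>) {
    match dt {
        DataType::Struct(children) => {
            for f in children {
                out.push(f.name.clone());
                collect_names(&f.data_type, out);
            }
        }
        DataType::Map(entries) => {
            if let DataType::Struct(kv) = &entries.data_type {
                for f in kv {
                    collect_names(&f.data_type, out);
                }
            }
        }
        _ => {}
    }
}
```

A test built on this would use a map whose key and value are both structs with placeholder field names, rename it, and assert that the names collected from inside the map match the requested ones. A variant that merely clones the map field would leave the placeholder names in place and fail that assertion.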

Maybe @Blizzara or @westonpace can give a bit more insight into how best to test this.

Contributor Author

Updated with unit tests.

@cht42 cht42 force-pushed the chuet/rename-map-inner-substrait branch from 5d5ac4c to 5a61718 Compare May 4, 2025 10:52
Arc::new(Field::new_map(
"7",
"entries",
Arc::new(Field::new("keys", DataType::Int32, false)),
Contributor

Could you make the keys a struct as well?

Contributor Author

done

Contributor

@Blizzara Blizzara left a comment

Thanks! (I work with @cht42)

Development

Successfully merging this pull request may close these issues.

Substrait: Handle inner map fields in schema renaming
3 participants