Skip to content

Move back schema not matching check and workaround #15580

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

LiaCastaneda
Copy link
Contributor

@LiaCastaneda LiaCastaneda commented Apr 4, 2025

Which issue does this PR close?

related to #15439

Rationale for this change

We want to keep the names mismatch check but also don't fail when having multiple joins with the same join key ( wider explanation here)

What changes are included in this PR?

Bring the check back, an align the physical column expression to the physical input field name for this cases.

Are these changes tested?

Yes, current tests work for this change too. (consumer integration test is passing)

Are there any user-facing changes?

no


I could only think of this workaround without actually skipping the check, but I'm open to suggestions :)

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates core Core DataFusion crate labels Apr 4, 2025
Comment on lines +2069 to +2077
// Handle the case where the name of a physical column expression does not match the corresponding physical input fields names.
// Physical column names are derived from the physical schema, whereas physical column expressions are derived from the logical column names.
//
// This is a special case that applies only to column expressions. Logical plans may slightly modify column names by appending a suffix (e.g., using ':'),
// to avoid duplicates—since DFSchemas do not allow duplicate names. For example: `count(Int64(1)):1`.
fn maybe_fix_physical_column_name(
expr: Result<Arc<dyn PhysicalExpr>>,
input_physical_schema: &SchemaRef,
) -> Result<Arc<dyn PhysicalExpr>> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rename happens here in the logical plan builder to avoid errors while building the logical plan.

@LiaCastaneda
Copy link
Contributor Author

LiaCastaneda commented Apr 4, 2025

Related PRs:

@LiaCastaneda LiaCastaneda force-pushed the lia/move-back-schema-not-matching-check-plus-woraround branch from e147069 to 318bfa9 Compare April 4, 2025 18:23
@LiaCastaneda LiaCastaneda marked this pull request as ready for review April 4, 2025 18:28
@alamb alamb requested a review from berkaysynnada April 6, 2025 11:56
Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @LiaCastaneda quickly taking this on and finding a great way. I think this is a quite neat fix.

Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @LiaCastaneda for the quick response. This is LGTM now. Can you just add an .slt test with a column including a name with ":", and projected across the plan, so this will not be broken in the future.

@alamb alamb changed the title Move back schema not matching check and woraround Move back schema not matching check and workaround Apr 7, 2025
@LiaCastaneda
Copy link
Contributor Author

Thank you @LiaCastaneda for the quick response. This is LGTM now. Can you just add an .slt test with a column including a name with ":", and projected across the plan, so this will not be broken in the future.

I couldn't manage to find a reproducer with a column with the original name containing ":" on the top level projection that at the same time suffixed the column name with ":" because of name conflicts. I tried also with substrait, but in my substrait reproducer the ":" is added on the aggregate of the top level projection so its a bit complicated to modify the name. I added a unit test for this function though :/

Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @LiaCastaneda, I'm happily merging this now

@berkaysynnada berkaysynnada merged commit 784df33 into apache:main Apr 8, 2025
27 checks passed
LiaCastaneda added a commit to DataDog/datafusion that referenced this pull request Apr 8, 2025
* Add expression & input schema name consistency check back

* Workaround

* Handle edge case where original name contains colon

* Add unit test
LiaCastaneda added a commit to DataDog/datafusion that referenced this pull request Apr 8, 2025
* Add expression & input schema name consistency check back

* Workaround

* Handle edge case where original name contains colon

* Add unit test
nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025
* Add expression & input schema name consistency check back

* Workaround

* Handle edge case where original name contains colon

* Add unit test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate physical-expr Changes to the physical-expr crates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants