-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Move back schema not matching check and workaround #15580
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Move back schema not matching check and workaround #15580
Conversation
// Handle the case where the name of a physical column expression does not match the corresponding physical input fields names. | ||
// Physical column names are derived from the physical schema, whereas physical column expressions are derived from the logical column names. | ||
// | ||
// This is a special case that applies only to column expressions. Logical plans may slightly modify column names by appending a suffix (e.g., using ':'), | ||
// to avoid duplicates—since DFSchemas do not allow duplicate names. For example: `count(Int64(1)):1`. | ||
fn maybe_fix_physical_column_name( | ||
expr: Result<Arc<dyn PhysicalExpr>>, | ||
input_physical_schema: &SchemaRef, | ||
) -> Result<Arc<dyn PhysicalExpr>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the rename happens here in the logical plan builder to avoid errors while building the logical plan.
Related PRs:
|
e147069
to
318bfa9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @LiaCastaneda quickly taking this on and finding a great way. I think this is a quite neat fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @LiaCastaneda for the quick response. This is LGTM now. Can you just add an .slt test with a column including a name with ":", and projected across the plan, so this will not be broken in the future.
I couldn't manage to find a reproducer with a column with the original name containing ":" on the top level projection that at the same time suffixed the column name with ":" because of name conflicts. I tried also with substrait, but in my substrait reproducer the ":" is added on the aggregate of the top level projection so its a bit complicated to modify the name. I added a unit test for this function though :/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @LiaCastaneda, I'm happily merging this now
* Add expression & input schema name consistency check back * Workaround * Handle edge case where original name contains colon * Add unit test
* Add expression & input schema name consistency check back * Workaround * Handle edge case where original name contains colon * Add unit test
* Add expression & input schema name consistency check back * Workaround * Handle edge case where original name contains colon * Add unit test
Which issue does this PR close?
related to #15439
Rationale for this change
We want to keep the names mismatch check but also don't fail when having multiple joins with the same join key ( wider explanation here)
What changes are included in this PR?
Bring the check back, an align the physical column expression to the physical input field name for this cases.
Are these changes tested?
Yes, current tests work for this change too. (consumer integration test is passing)
Are there any user-facing changes?
no
I could only think of this workaround without actually skipping the check, but I'm open to suggestions :)