[SPARK-51732][SQL] Apply rpad on attributes with same ExprId if they need to be deduplicated #50527

Open
wants to merge 1 commit into
base: master

@mihailotim-db mihailotim-db commented Apr 7, 2025

What changes were proposed in this pull request?

This PR fixes a case where rpad is not applied on attributes that have the same ExprId even though those attributes should be deduplicated.

Why are the changes needed?

For example, consider the following query:

CREATE OR REPLACE TABLE t(a CHAR(50));
SELECT t1.a FROM t t1
WHERE (SELECT count(*) AS item_cnt FROM t t2 WHERE (t1.a = t2.a)) > 0

In the above case, ApplyCharTypePadding first runs on the subquery, where t1.a and t2.a reference the same ExprId, so rpad is not applied. Later, when DeduplicateRelations runs for the outer query, t1.a and t2.a get different ExprIds and therefore need rpad. The padding is never added, though, because ApplyCharTypePadding for the outer query does not recurse into the subquery.

On the other hand, for a query:

SELECT t1.a
FROM t t1, t t2
WHERE t1.a = t2.a 

ApplyCharTypePadding will correctly add rpad to both t1.a and t2.a because the attributes are deduplicated first.

In particular, this fixes the code path where readSideCharPadding is off and LEGACY_NO_CHAR_PADDING_IN_PREDICATE is also false.
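
The ordering problem described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Spark code: `Attr`, `Comparison`, `apply_char_padding`, and `deduplicate` are hypothetical stand-ins for Catalyst attributes, the predicate `t1.a = t2.a`, the pre-fix behavior of ApplyCharTypePadding, and DeduplicateRelations. It shows only why the order in which the two rules see the predicate matters; it does not model the fix itself.

```python
from dataclasses import dataclass
from itertools import count

# Toy stand-ins for Catalyst concepts. None of these names are Spark APIs.
_ids = count(1)


@dataclass(frozen=True)
class Attr:
    name: str     # qualified column name, e.g. "t1.a"
    expr_id: int  # stand-in for Catalyst's ExprId


@dataclass
class Comparison:
    left: Attr
    right: Attr
    padded: bool = False  # True once rpad would have been applied to both sides


def apply_char_padding(cmp: Comparison) -> None:
    """Assumed pre-fix behavior: skip padding when both sides share an ExprId."""
    if cmp.left.expr_id != cmp.right.expr_id:
        cmp.padded = True


def deduplicate(cmp: Comparison) -> None:
    """Give the right side a fresh id if it collides with the left side."""
    if cmp.left.expr_id == cmp.right.expr_id:
        cmp.right = Attr(cmp.right.name, next(_ids))


def subquery_case() -> Comparison:
    shared = next(_ids)
    cmp = Comparison(Attr("t1.a", shared), Attr("t2.a", shared))
    apply_char_padding(cmp)  # runs on the subquery: ids are equal, so padding is skipped
    deduplicate(cmp)         # runs later for the outer query: ids now differ
    # The outer-query padding pass does not recurse into the subquery,
    # so apply_char_padding is never called again on this comparison.
    return cmp


def join_case() -> Comparison:
    shared = next(_ids)
    cmp = Comparison(Attr("t1.a", shared), Attr("t2.a", shared))
    deduplicate(cmp)         # attributes are deduplicated first
    apply_char_padding(cmp)  # ids differ, so padding is applied
    return cmp


sub = subquery_case()
join = join_case()
# Both comparisons end up with distinct ids, but only the join is padded.
assert sub.left.expr_id != sub.right.expr_id and not sub.padded
assert join.left.expr_id != join.right.expr_id and join.padded
```

In the model, both comparisons end with distinct ids on each side, yet only the join case is marked as padded, which mirrors the mismatch this PR describes for the correlated subquery.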

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Apr 7, 2025
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/apply_char_type_padding_subqueries branch from 2b978d8 to 58fd919 Compare April 7, 2025 08:48
@mihailotim-db mihailotim-db changed the title from "fix" to "[SPARK-51732][SQL] Apply rpad on attributes with same ExprId if they need to be deduplicated" Apr 7, 2025
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/apply_char_type_padding_subqueries branch 5 times, most recently from d31421c to 0594863 Compare April 7, 2025 17:49
@mihailotim-db mihailotim-db force-pushed the mihailotim-db/apply_char_type_padding_subqueries branch from 0594863 to 93304e1 Compare April 8, 2025 14:34