Skip to content

Allow default value on lag/lead if they are coerceable to value #8308

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

simonvandel
Copy link
Contributor

Which issue does this PR close?

Closes #8307.

Rationale for this change

Allow other default values than Int64 on lag/lead window functions.

What changes are included in this PR?

If a default value is specified, it is now coerced if possible.
Before this PR only Int64 was allowed.

Are these changes tested?

Yes, added some slt tests.

Are there any user-facing changes?

Yes, more default values are now accepted.

@github-actions github-actions bot added physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels Nov 22, 2023
|| (dtype.is_integer() && default_value_type.is_integer())
{

if comparison_coercion(&default_value_type, dtype).is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that you just need to change integer, since I'm not sure having pub fn comparison_binary_numeric_coercion is a good idea or not. At least, we can add the comment for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't understand what you mean by "change integer". Could you rephrase, or provide a suggestion for a change?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I mean coerce, not change.

Copy link
Contributor

@jayzhan211 jayzhan211 Nov 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code from your previous commit

if can_coerce_from(dtype, &default_value_type)
            || (dtype.is_integer() && default_value_type.is_integer())

Comment similar to
// We only check coercion between integer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So just to be clear, are you suggesting that the comparison should be reverted to

if can_coerce_from(dtype, &default_value_type)
            // We explicitely allow a lossy conversion between signed and unsigned integers
            || (dtype.is_integer() && default_value_type.is_integer())

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest comparison_coercion. I read the doc, it does not seem lag/lead should only be integer.

Copy link
Contributor

@jayzhan211 jayzhan211 Dec 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that coerced_type might not be the same of dtype or default_value_type. So we need to cast both of them to coerced_type

if let(coerced_type) = comparison_coercion(dtype, &default_value_type) {}

For example, if we have (dtype: UInt64, default_value_type: Int32), coerced_type is Int64

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @alamb to confirm my suggestion.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think coercing during physical evaluation is too late in the process -- most other coercion happens in the analyze phase so that subsequent transformations / optimizations use the same rules.

Would it be possible to move this coercion earlier into

https://github.com/apache/arrow-datafusion/blob/513fd052bdbf5c7a73de544a876961f780b90a92/datafusion/optimizer/src/analyzer/type_coercion.rs#L64-L63

?

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this @simonvandel and your help @jayzhan211

|| (dtype.is_integer() && default_value_type.is_integer())
{

if comparison_coercion(&default_value_type, dtype).is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think coercing during physical evaluation is too late in the process -- most other coercion happens in the analyze phase so that subsequent transformations / optimizations use the same rules.

Would it be possible to move this coercion earlier into

https://github.com/apache/arrow-datafusion/blob/513fd052bdbf5c7a73de544a876961f780b90a92/datafusion/optimizer/src/analyzer/type_coercion.rs#L64-L63

?

@alamb
Copy link
Contributor

alamb commented Dec 1, 2023

(p.s sorry for the late review)

@alamb alamb marked this pull request as draft December 15, 2023 19:19
@alamb
Copy link
Contributor

alamb commented Dec 15, 2023

Marking as draft as i think this is no longer waiting on feedback. Please mark it ready for review if/when it is ready for another look

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

lag and lead window functions does not accept compatible default value
3 participants