Add input_nullable for UDAF args StateField and Accumulator #11063
Closed

I think we only need `nullable` for `state_field` 🤔

Yes, this is not used in the PR. I just thought it made sense to make the API more similar, but I will revert it.

But I also noticed that this is not enough to resolve the `array_agg` regression. There are two more limitations in the current UDAF API. First, here https://github.com/eejbyfeldt/datafusion/blob/18042fd69138e19613844580408a71a200ea6caa/datafusion/physical-expr-common/src/aggregate/mod.rs#L287-L289 the nullability of the returned field is hardcoded to `true` and is not controllable by `AggregateUDFImpl`. What is the desired way to fix this? Should the API be changed to implement a `fn field` method instead? Or should we add a `return_nullable` method with a default `false` implementation?

I also noticed that the current implementation of `array_agg` does not propagate the nullability of the input to the field in the returned array. This is probably because the `return_type` method does not have access to nullability, but it is probably something we want to be able to resolve in the long run.
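
To make the `return_nullable` option concrete, here is a minimal, std-only Rust sketch. `AggregateUdfSketch`, `FieldSketch`, `output_field`, `Sum` and `ArrayAgg` are hypothetical stand-ins, not DataFusion's real `AggregateUDFImpl` or Arrow's `Field`; the sketch only shows the hardcoded `true` becoming a per-aggregate decision with an overridable default (here `false`, as proposed above).

```rust
/// Hypothetical, simplified output field: only a name and a nullability flag.
#[derive(Debug, Clone, PartialEq)]
struct FieldSketch {
    name: String,
    nullable: bool,
}

/// Hypothetical stand-in for an aggregate UDF trait (not DataFusion's API).
trait AggregateUdfSketch {
    fn name(&self) -> &str;

    /// Proposed hook: whether the aggregate's output may be NULL.
    /// The default is `false`, as suggested in the comment.
    fn return_nullable(&self, _input_nullable: bool) -> bool {
        false
    }
}

struct Sum;
struct ArrayAgg;

impl AggregateUdfSketch for Sum {
    fn name(&self) -> &str {
        "sum"
    }

    // SUM over a group with no non-NULL values yields NULL, so it overrides the default.
    fn return_nullable(&self, _input_nullable: bool) -> bool {
        true
    }
}

impl AggregateUdfSketch for ArrayAgg {
    fn name(&self) -> &str {
        "array_agg"
    }
    // Keeps the default, assuming an empty group yields an empty list rather than NULL.
}

/// Builds the output field from the hook instead of hardcoding `nullable: true`.
fn output_field(udf: &dyn AggregateUdfSketch, input_nullable: bool) -> FieldSketch {
    FieldSketch {
        name: udf.name().to_string(),
        nullable: udf.return_nullable(input_nullable),
    }
}
```

The `fn field` alternative would instead let each aggregate return its whole output field, which is more flexible (it could also describe nested nullability) at the cost of touching every implementation.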

I think we can use `nullable` for `field()`. You can get `input_nullable` in `create_aggregate_expr`.

I think `nullable` is set in both `state_field` and `field`, so the returned array should match their schema. 🤔

Looking closer at the code, I see that this will maintain the behavior of the old code. But it seems wrong to me that we generally assume the aggregate maintains the nullability of the input type. Consider the aggregate `array_agg`: there are two "nullable" fields in the return value, the "top level" value and the "field inside" the returned array. I think our `array_agg` (or at least a possible `array_agg`) will return an empty array when there are no values. This means the nullability of the "top level" field should always be `false` regardless of input nullability, and the nullability that depends on the input is that of the "field inside" the array. Note that I think the existing code also does not implement this correctly.

I tried out the suggested fix and it breaks existing code, probably because it is wrong for some existing aggregates like `sum`, which might return null even if the input is not nullable. That is further indication that this is not the correct way to go.
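
A small sketch of why input nullability is the wrong signal for the top-level output: a SUM over input that contains no NULLs can still produce NULL (for an empty group), while an `array_agg` that returns an empty array never produces a top-level NULL, even for nullable input. The functions below are hypothetical, std-only models using `Option<i64>` for nullable values; they assume, as the comment does, that `array_agg` returns an empty array for an empty group.

```rust
/// Hypothetical SUM: skips NULLs and returns None (SQL NULL)
/// when there are no non-NULL values to add.
fn sum_sketch(values: &[Option<i64>]) -> Option<i64> {
    let mut acc: Option<i64> = None;
    for v in values.iter().flatten() {
        acc = Some(acc.unwrap_or(0) + v);
    }
    acc
}

/// Hypothetical ARRAY_AGG: collects every input, keeping NULL elements.
/// The result is never None at the top level; an empty group yields an empty Vec.
fn array_agg_sketch(values: &[Option<i64>]) -> Vec<Option<i64>> {
    values.to_vec()
}
```

Here an empty group makes `sum_sketch` return `None` although no input value is NULL, which is the case that breaks when the output field copies the input's nullability.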

The nullability was introduced in #8055.
There might be another way to fix #8055 🤔

I tested the code in #8032, and there is no error after I change the "top level" nullability back to `false` 🤔

Ideally, in the field for `array_agg`, `nullable` should be the nullability of the element, not the nullability of the `List`.
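
The field shape described here can be sketched with simplified, hypothetical stand-ins for Arrow's `Field` and `DataType` (`FieldSketch`, `DataTypeSketch`, `array_agg_return_field`); the element name `item` follows Arrow's usual convention for list fields but is an assumption in this sketch. The outer list is non-nullable, and the input's nullability is carried by the element field.

```rust
/// Hypothetical, simplified data type with one nested variant.
#[derive(Debug, Clone, PartialEq)]
enum DataTypeSketch {
    Int64,
    List(Box<FieldSketch>),
}

/// Hypothetical, simplified field: name, type and nullability.
#[derive(Debug, Clone, PartialEq)]
struct FieldSketch {
    name: String,
    data_type: DataTypeSketch,
    nullable: bool,
}

/// Builds the `array_agg` output field from the input field:
/// element nullability follows the input, while the list itself is never NULL
/// (assuming an empty group produces an empty list).
fn array_agg_return_field(input: &FieldSketch) -> FieldSketch {
    let element = FieldSketch {
        name: "item".to_string(), // assumed element name
        data_type: input.data_type.clone(),
        nullable: input.nullable,
    };
    FieldSketch {
        name: "array_agg".to_string(),
        data_type: DataTypeSketch::List(Box::new(element)),
        nullable: false,
    }
}
```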

Yes, that is what I was trying to explain. I created #11093, which fixes that; it required some other changes to make that change possible.