-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Move conversion of FIRST/LAST Aggregate function to independent physical optimizer rule #10061
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
&input_order_mode, | ||
); | ||
|
||
let aggr_exec = aggr_exec.new_with_aggr_expr_and_ordering_info( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keep the logic similar to AggregateExec::try_new_with_schema
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok(AggregateExec {
mode,
group_by,
aggr_expr,
filter_expr,
input,
schema,
input_schema,
metrics: ExecutionPlanMetricsSet::new(),
required_input_ordering,
limit: None,
input_order_mode,
cache,
})
We will review and comment on this next week. @mustafasrepo |
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
/// | ||
/// Similar to the one in datafusion/physical-plan/src/aggregates/mod.rs, but this | ||
/// function care only the possible conversion between FIRST_VALUE and LAST_VALUE | ||
fn get_aggregate_exprs_requirement( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only first/last rule is moved to here.
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: jayzhan211 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jayzhan211 -- this is a very nice contribution towards extracting aggregate functions out of the core. I think we should fix the double recursion but otherwise this code looks (really) nice to me.
cc @mustafasrepo and @ozankabak FYI
/// so we can convert the aggregate expression to FirstValue(c1 order by asc), | ||
/// since the current ordering is already satisfied, it saves our time! | ||
#[derive(Default)] | ||
pub struct ConvertFirstLast {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could call this something more general, like OptimizeAggregateOrder
so it could potentially be used for aggregates other than FIRST_VALUE
and LAST_VALUE
🤔
} | ||
|
||
fn name(&self) -> &str { | ||
"SimpleOrdering" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this name should match the name of the structure -- that is "ConvertFirstLast"
in this case
fn get_common_requirement_of_aggregate_input( | ||
plan: Arc<dyn ExecutionPlan>, | ||
) -> Result<Transformed<Arc<dyn ExecutionPlan>>> { | ||
// Optimize children |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this rule already calls transform_up
which handles the recursion up the tree of ExecutionPlan
and managine the transformed
flag, I don't think you also need to recursively walk down the children here again. I think you can probably just call optimize_internal
directly
Recursing back down the tree is also like N^2 (or worse) in the number of plan nodes so I think we should avoid it for performance reasons (in addition to making the code simpler)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Amazing, it works! I think I have no idea what is going on in transform_up
😞
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// Convenience utility for writing optimizer rules: Recursively apply the
/// given function `f` to all children of a node, and then to the node itself
/// (post-order traversal). When `f` does not apply to a given node, it is
/// left unchanged.
I didn't notice that the children is updated to parent too, so I do it manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$SELF.map_children($F_CHILD)?.transform_parent(|n| $F_UP(n))
I think transform_parent
here is actually doing transform_self
🤔
} | ||
} | ||
|
||
/// In `create_initial_plan` for LogicalPlan::Aggregate, we have a nested AggregateExec where the first layer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for this comment. It makes things much clearer
let reverse_aggr_req = | ||
PhysicalSortRequirement::from_sort_exprs(&reverse_aggr_req); | ||
|
||
if let Some(first_value) = aggr_expr.as_any().downcast_ref::<FirstValue>() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually (some other PR) it would be amazing if we can move this code into FirstValue
somehow. As it is now, there is a coupling between the optimizer rule and the actual PhysicalExpr
-- which means among other things this same optimization can't be used by user defined aggregates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should be similar to FunctionRewrite, registerable optimize rule.
Signed-off-by: jayzhan211 <[email protected]>
We will review this Monday |
Sounds good -- thank you @ozankabak -- let's wait for that review prior to merging this PR |
@@ -89,6 +90,8 @@ impl PhysicalOptimizer { | |||
// as that rule may inject other operations in between the different AggregateExecs. | |||
// Applying the rule early means only directly-connected AggregateExecs must be examined. | |||
Arc::new(LimitedDistinctAggregation::new()), | |||
// Run once before PartialFinalAggregation is rewritten to ensure the rule is applied correctly | |||
Arc::new(OptimizeAggregateOrder::new()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove this rule from here. Using it only in below place should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jayzhan211 for this PR. I have left some minor comment. Please note that, applying my suggestion changes couple of tests. However, I have verified that those changes are both valid, not harmful for the execution. We can merge this PR as is also.
Signed-off-by: jayzhan211 <[email protected]>
Let me apply your suggestion! Thanks for your review @mustafasrepo and @alamb |
Which issue does this PR close?
Closes #9972.
Rationale for this change
We plan to make FIRST / LAST UDAF. This rule does the conversion between FIRST/LAST, it will eventually be moved to
aggregate-functions
crate. The first step is to move it out to an independent rule.What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?