Enable definition of aggregation functions in the Catalog #1760

Merged (2 commits) on May 28, 2025

jpschorr
Contributor

Relevant Issues

  • N/A

Description

Previously, the list of aggregation functions known to planning was hard-coded. This PR looks up aggregation functions in the catalog, allowing planning to find both built-ins and catalog-defined functions.
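
As a rough illustration of the idea (not the code in this PR), the sketch below resolves aggregation functions by asking catalogs for overloads by name instead of consulting a hard-coded list in the planner. Every name here (`AggSignature`, `Catalog`, `MapCatalog`, `builtins`, `resolveAggregation`) is hypothetical and is not part of the PartiQL API; the case-insensitive name matching is also an assumption.

```kotlin
// Hypothetical types for illustration only; these are NOT the PartiQL catalog API.
data class AggSignature(val name: String, val arity: Int)

interface Catalog {
    // Returns every aggregate overload registered under `name`, or an empty list.
    fun getAggregations(name: String): List<AggSignature>
}

// A catalog backed by a map; names are matched case-insensitively (an assumption).
class MapCatalog(aggs: List<AggSignature>) : Catalog {
    private val byName: Map<String, List<AggSignature>> =
        aggs.groupBy { it.name.uppercase() }

    override fun getAggregations(name: String): List<AggSignature> =
        byName[name.uppercase()] ?: emptyList()
}

// Built-in aggregates are exposed through the same interface as user-defined ones,
// so the planner no longer needs its own hard-coded list.
val builtins: Catalog = MapCatalog(
    listOf("COUNT", "SUM", "AVG", "MIN", "MAX").map { AggSignature(it, 1) }
)

// Planner-side lookup: collect candidate overloads from the user catalog and the built-ins.
fun resolveAggregation(name: String, userCatalog: Catalog): List<AggSignature> =
    userCatalog.getAggregations(name) + builtins.getAggregations(name)
```

With a user catalog that defines a hypothetical `MY_MEDIAN` aggregate, `resolveAggregation("my_median", userCatalog)` returns that overload, and `resolveAggregation("count", userCatalog)` still finds the built-in.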

Other Information

  • Updated Unreleased Section in CHANGELOG: YES
  • Any backward-incompatible changes? NO
  • Any new external dependencies? NO
  • Do your changes comply with the contributing and code style guidelines? YES

License Information

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@jpschorr jpschorr requested a review from alancai98 May 20, 2025 21:48

github-actions bot commented May 20, 2025

CROSS-ENGINE-REPORT ❌

| Metric | BASE (LEGACY-V0.14.8) | TARGET (EVAL-D506C7C) | +/- |
| --- | --- | --- | --- |
| % Passing | 89.67% | 93.93% | 4.26% ✅ |
| Passing | 5287 | 5619 | 332 ✅ |
| Failing | 609 | 175 | -434 ✅ |
| Ignored | 0 | 188 | 188 🔶 |
| Total Tests | 5896 | 5982 | 86 ✅ |

Testing Details

  • Base Commit: v0.14.8
  • Base Engine: LEGACY
  • Target Commit: d506c7c
  • Target Engine: EVAL

Result Details

  • ❌ REGRESSION DETECTED. See Now Failing/Ignored Tests. ❌
  • Passing in both: 2612
  • Failing in both: 18
  • Ignored in both: 0
  • PASSING in BASE but now FAILING in TARGET: 27
  • PASSING in BASE but now IGNORED in TARGET: 84
  • FAILING in BASE but now PASSING in TARGET: 179
  • IGNORED in BASE but now PASSING in TARGET: 0

Now FAILING Tests ❌

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

Now IGNORED Tests ❌

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

Now Passing Tests

179 test(s) were previously failing in BASE (LEGACY-V0.14.8) but now pass in TARGET (EVAL-D506C7C). Before merging, confirm they are intended to pass.

The complete list can be found in GitHub CI summary, either from Step Summary or in the Artifact.

CROSS-COMMIT-REPORT ❌

| Metric | BASE (EVAL-08B84FA) | TARGET (EVAL-D506C7C) | +/- |
| --- | --- | --- | --- |
| % Passing | 93.93% | 93.93% | 0.00% ✅ |
| Passing | 5619 | 5619 | 0 ✅ |
| Failing | 175 | 175 | 0 ✅ |
| Ignored | 188 | 188 | 0 ✅ |
| Total Tests | 5982 | 5982 | 0 ✅ |

Testing Details

  • Base Commit: 08b84fa
  • Base Engine: EVAL
  • Target Commit: d506c7c
  • Target Engine: EVAL

Result Details

  • ❌ REGRESSION DETECTED. See Now Failing/Ignored Tests. ❌
  • Passing in both: 5619
  • Failing in both: 175
  • Ignored in both: 188
  • PASSING in BASE but now FAILING in TARGET: 8
  • PASSING in BASE but now IGNORED in TARGET: 0
  • FAILING in BASE but now PASSING in TARGET: 8
  • IGNORED in BASE but now PASSING in TARGET: 0

Now FAILING Tests ❌

The following 8 test(s) were previously PASSING in BASE but are now FAILING in TARGET:

  1. structs should be ordered by data types (DESC) (nulls first as default for desc), compileOption: PERMISSIVE
  2. structs should be ordered by data types (DESC) (nulls first as default for desc), compileOption: STRICT
  3. group by with where, compileOption: PERMISSIVE
  4. group by with where, compileOption: STRICT
  5. group by with group as and where, compileOption: PERMISSIVE
  6. group by with group as and where, compileOption: STRICT
  7. repeated field on struct is ambiguous{identifier:"REPEATED",cn:9,bn:"REPEATED"}, compileOption: STRICT
  8. repeated field on struct is ambiguous{identifier:" "repeated" ",cn:10,bn:"repeated"}, compileOption: STRICT

Now Passing Tests

The following 8 test(s) were previously FAILING in BASE but are now PASSING in TARGET. Before merging, confirm they are intended to pass:

  1. repeated field on struct is ambiguous{identifier:"REPEATED",cn:9,bn:"REPEATED"}, compileOption: STRICT
  2. repeated field on struct is ambiguous{identifier:" "repeated" ",cn:10,bn:"repeated"}, compileOption: STRICT
  3. structs should be ordered by data types (DESC) (nulls first as default for desc), compileOption: PERMISSIVE
  4. structs should be ordered by data types (DESC) (nulls first as default for desc), compileOption: STRICT
  5. group by with where, compileOption: PERMISSIVE
  6. group by with where, compileOption: STRICT
  7. group by with group as and where, compileOption: PERMISSIVE
  8. group by with group as and where, compileOption: STRICT

@jpschorr jpschorr marked this pull request as ready for review May 20, 2025 22:00
Member

@alancai98 alancai98 left a comment

Implementation looks good. Glad that we could finally get rid of the hard-coded aggregates. Left a couple of minor comments.

),
)
assertEquals(0, Datum.comparator().compare(expected, datum))
}
Member

wonder if we could also test a few other scenarios:

  1. aggregate UDF has same name as a builtin scalar (e.g. UPPER)
  2. scalar UDF has same name as a builtin aggregate (e.g. COUNT)
  3. scalar UDF and aggregate UDF have same name

Contributor Author

I added the above tests. I also had to add logic to planning to assert on name collisions between the scalar and aggregate overloads.

The current catalog interface, unfortunately, makes it difficult to introspect the functions/aggregates at build time. Otherwise, I'd have made the check during construction.
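
For readers following along, a planning-time collision check of the kind described above could look roughly like the sketch below. It is only an illustration under assumptions: `FnCatalog`, `SetCatalog`, `FnKind`, `NameCollisionException`, and `classify` are hypothetical names, not the PartiQL API, and the real check in this PR may be structured differently. The check runs when a name is resolved, since the catalog here can only be queried by name and cannot be enumerated up front.

```kotlin
// Hypothetical catalog interface for illustration; NOT the PartiQL API.
interface FnCatalog {
    fun hasScalar(name: String): Boolean
    fun hasAggregate(name: String): Boolean
}

// Raised when one name resolves to both a scalar and an aggregate function.
class NameCollisionException(name: String) :
    IllegalStateException("'$name' is defined as both a scalar and an aggregate function")

enum class FnKind { SCALAR, AGGREGATE, UNKNOWN }

// Invoked when the planner resolves a call by name. The collision is detected here,
// at lookup time, rather than when the catalog is constructed.
fun classify(name: String, catalog: FnCatalog): FnKind {
    val scalar = catalog.hasScalar(name)
    val aggregate = catalog.hasAggregate(name)
    if (scalar && aggregate) throw NameCollisionException(name)
    return when {
        scalar -> FnKind.SCALAR
        aggregate -> FnKind.AGGREGATE
        else -> FnKind.UNKNOWN
    }
}

// Set-backed catalog for exercising the scenarios; names compare case-insensitively (an assumption).
class SetCatalog(scalars: Set<String>, aggregates: Set<String>) : FnCatalog {
    private val scalarNames = scalars.map { it.uppercase() }.toSet()
    private val aggregateNames = aggregates.map { it.uppercase() }.toSet()
    override fun hasScalar(name: String) = name.uppercase() in scalarNames
    override fun hasAggregate(name: String) = name.uppercase() in aggregateNames
}
```

Registering the same name in both sets, for example an aggregate UDF named `UPPER` alongside the scalar `UPPER`, makes `classify` throw `NameCollisionException`, which mirrors the three scenarios listed in the review comment.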

@jpschorr jpschorr force-pushed the feat-enable-catalog-aggfns branch from 6350bd5 to 9f3ec3d Compare May 27, 2025 21:47
@jpschorr jpschorr requested a review from alancai98 May 27, 2025 22:03

@jpschorr jpschorr merged commit 927fa4c into main May 28, 2025
14 checks passed
@jpschorr jpschorr deleted the feat-enable-catalog-aggfns branch May 28, 2025 23:21