Support predicate pruning for basic transformations such as upper
#14054
Labels
enhancement
New feature or request
Follow up to #507.
Predicate pruning is a powerful technique to speed up queries by skipping entire files / pieces of work based on summary statistics of the data.
This issue proposes implementing predicate pruning for expressions such as `lower(col) = 'abc'`. The idea is that if we have a min stat such as `AbC` we should be able to transform it to `'abc'` and push down the predicate (in this case it might match). Or given the min/max `YYY`/`ZZZ` then `lower(col) = 'abc'` could never match so the file can be skipped.

To implement this you'll need to make a PR similar to #12978 and add fuzz tests (see #13253).
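
To make the min/max reasoning above concrete, here is a minimal Python sketch, not DataFusion code. The function names `case_variants` and `can_skip_lower_eq` are hypothetical, and the sketch assumes ASCII-only data and byte-wise string ordering for the statistics. Instead of lowercasing the stats directly, it asks whether any case variant of the literal falls inside `[min, max]`, because `lower` does not preserve ordering (for example `'Z' < 'a'` in ASCII). Enumerating variants is exponential in the number of letters, so this is only meant to illustrate the check.

```python
from itertools import product


def case_variants(literal: str):
    """Yield every ASCII-case variant of `literal` (2**k strings for k letters)."""
    choices = [
        (ch.lower(), ch.upper()) if ch.isascii() and ch.isalpha() else (ch,)
        for ch in literal
    ]
    for combo in product(*choices):
        yield "".join(combo)


def can_skip_lower_eq(min_stat: str, max_stat: str, literal: str) -> bool:
    """Return True if no value in [min_stat, max_stat] can satisfy lower(col) = literal.

    Assumption: column values and the literal are ASCII, and min/max use
    byte-wise ordering. Returning False only means "cannot prove a skip".
    """
    if not literal.isascii():
        return False  # outside the assumptions of this sketch: do not prune
    if literal != literal.lower():
        return True  # lower(col) never produces an uppercase letter
    # The file may contain a match only if some case variant lies in the range.
    return not any(min_stat <= v <= max_stat for v in case_variants(literal))


if __name__ == "__main__":
    print(can_skip_lower_eq("YYY", "ZZZ", "abc"))  # the min/max example from the issue
    print(can_skip_lower_eq("AbC", "zzz", "abc"))  # min stat is itself a case variant
    print(can_skip_lower_eq("A", "b", "z"))        # 'Z' lies between 'A' and 'b'
```

The last call shows why lowercasing min and max independently would be unsound: `lower('A')`..`lower('b')` is `'a'`..`'b'`, which excludes `'z'`, yet the value `'Z'` is inside the original range and does satisfy the predicate.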
One thing to think about is how we can make this work in concert with other predicate push down. That is, it would be ideal if something like `lower(col) like 'abc%'` could be pushed down. That may require a lot of refactoring and might need to be done in a series of PRs; an initial PR that just implements the `=` case would be a good start to prove that it's possible. But it may also be worth exploring a generalization, e.g. `lower(col) like 'abc%'` becomes `col ilike 'abc%'`, which we then push down? A discussion of pros and cons is warranted.
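
As one input to that discussion, the `lower(col) like ...` to `col ilike ...` rewrite can be sketched on a toy expression tree. This is a hypothetical Python model, not DataFusion's expression types: `Column`, `Lower`, `Like`, and `rewrite_lower_like` are invented for illustration, and the sketch assumes ASCII-only data. It highlights one constraint: the rewrite is only equivalent when the pattern contains no uppercase letters, since `lower(col) like 'Abc%'` can never match while `col ilike 'Abc%'` can.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Lower:
    arg: Column


@dataclass(frozen=True)
class Like:
    expr: Union[Column, Lower]
    pattern: str
    case_insensitive: bool = False  # True models ILIKE


def rewrite_lower_like(pred: Like) -> Optional[Like]:
    """Rewrite lower(col) LIKE p to col ILIKE p when equivalent, else return None."""
    if pred.case_insensitive or not isinstance(pred.expr, Lower):
        return None  # not the shape this sketch handles
    if not pred.pattern.isascii() or pred.pattern != pred.pattern.lower():
        # An uppercase letter in the pattern can never match lower(col),
        # but ILIKE would ignore case, so the rewrite would change results.
        return None
    return Like(pred.expr.arg, pred.pattern, case_insensitive=True)


if __name__ == "__main__":
    print(rewrite_lower_like(Like(Lower(Column("col")), "abc%")))
    print(rewrite_lower_like(Like(Lower(Column("col")), "Abc%")))
```

A point in favor of this route is that pruning logic would only need to understand `ilike`; a point against is that the rewrite has preconditions like the one above that each transformation would have to state.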