Support bounds evaluation for temporal data types #14523

ch-sc · 2025-02-06T11:36:31Z

Which issue does this PR close?

As discussed in #14237, temporal data types should be supported in bounds evaluation.

Closes Add support for filter selectivity for more expression types #14237.

Rationale for this change

We want to extend the number of expressions that can evaluate bounds to calculate statistics and enable better/more query plan optimisations.

What changes are included in this PR?

Additional data types are supported: Timestamp, Date, Time, Duration, Interval
Additional binary operators are supported: ~~NotEq~~, IsDistinctFrom, IsNotDistinctFrom
Whether an expression supports bounds evaluation or not is now part of the PhysicalExpr trait and therefore decided by every expression type itself. This will also enable UDFs to implement bounds evaluation.
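
As a rough illustration of what bounds evaluation means for a temporal expression such as `ts + duration`: Arrow stores timestamps and durations as 64-bit integers, so the bounds of a sum follow from adding the operands' bounds. The `Bounds` type and `add_bounds` function below are hypothetical stand-ins for this sketch, not the DataFusion API, and the values are assumed to share one time unit (for example microseconds).

```rust
/// Closed range of i64 values, e.g. timestamps or durations in microseconds.
/// Hypothetical stand-in type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Bounds {
    lower: i64,
    upper: i64,
}

/// Bounds of `ts + dur`: add the lower bounds and the upper bounds.
/// Returns `None` if either addition overflows.
fn add_bounds(ts: Bounds, dur: Bounds) -> Option<Bounds> {
    Some(Bounds {
        lower: ts.lower.checked_add(dur.lower)?,
        upper: ts.upper.checked_add(dur.upper)?,
    })
}
```

With such bounds, a filter like `ts + duration < some_timestamp` can be judged always true, always false, or undecided without scanning the data.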

Are these changes tested?

yes

Are there any user-facing changes?

The PhysicalExpr and ScalarUDFImpl traits change, but both provide default implementations for the added functions.

berkaysynnada · 2025-02-07T07:50:05Z

Thank you @ch-sc for working on this. When you need a review, I can do that if you ping me.

ch-sc · 2025-02-07T12:54:46Z

Thanks @berkaysynnada! Yeah I'd like to get a review.

There is still a bug, though, that I don't understand yet. Sometimes the sort operator gets removed as can be seen in the repartition_scan.slt test.

berkaysynnada · 2025-02-07T12:59:12Z

> Sometimes the sort operator gets removed as can be seen in the repartition_scan.slt test.

Seems like you're losing the order requirement somehow

ch-sc · 2025-02-12T13:53:55Z

@berkaysynnada do you have time to take another look? :)

NotEq leads to the removal of the sort operator.

I debugged this and noticed that the EnforceSorting optimiser removes the SortExec (enforce_sorting/mod.rs:415).
However, the order requirement is somehow lost before that, and I don't understand why that happens.

I removed NotEq again from the supported binary operators to move this forward. We might want to look into this in a follow-up PR. WDYT?

berkaysynnada · 2025-02-12T14:35:11Z

> @berkaysynnada do you have time to take another look? :)

Thanks again for driving this forward :) I'm a bit busy these days, but I'd love to go over it. If it's not urgent for you, would you mind waiting until the weekend?

> I removed NotEq again from the supported binary operators to move this forward. We might want to look into this in a follow-up PR. WDYT?

Thanks for the heads-up. I'll try to add support for NotEq. If I find a solution, I can fix it within this PR itself, if that's okay with you.

I took a quick look, and you can use the union logic there:
https://github.com/Fly-Style/datafusion/blob/fee3023a63227f0a22ac2da1d040d373cc028c34/datafusion/expr-common/src/interval_arithmetic.rs#L667
It seems to follow a better style, IMO.

ch-sc · 2025-02-13T12:00:34Z

> I'll try to add support for NotEq. If I find a solution, I can fix it within this PR itself, if that's okay with you.

Sure, feel free to make any adjustment as you see fit.

ch-sc · 2025-02-14T19:48:31Z

> I took a quick look, and you can use the union logic there

I thought a bit more about the union logic and came up with another solution.

Sometimes intervals do not overlap, and sometimes they have no bound in one direction (NULL). Typically the union of non-overlapping intervals would result in a set of intervals, but there is no support for this.

Here are some examples of what I think should be correct, without interval sets support:

(1, 2)    ∪ (3,4)        = (1,4)
(1, NULL) ∪ (NULL, NULL) = (1, NULL)
(NULL, 1) ∪ (NULL, NULL) = (NULL, 1)
(3, NULL) ∪ (NULL, 1)    = (NULL, NULL)
(5, NULL) ∪ (1, 2)       = (1, NULL)

I adapted the union logic accordingly and added tests. Happy for your feedback!

ch-sc · 2025-02-18T14:10:05Z

Looks like I confused the meaning of nulls in intervals. I switched to the union logic @berkaysynnada suggested.
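
For reference, a union that cannot return a set of intervals can fall back to the smallest single interval that covers both inputs. The sketch below assumes a NULL bound means "unbounded on that side" (the reading suggested by the "treat null as unbounded" commit). `Interval` and `union_hull` are simplified, hypothetical stand-ins over `i64`, not the DataFusion types. Under that reading, `(1, NULL) ∪ (NULL, NULL)` widens to `(NULL, NULL)` rather than `(1, NULL)`.

```rust
/// Closed interval over i64; `None` means unbounded on that side.
/// Hypothetical stand-in type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    lower: Option<i64>,
    upper: Option<i64>,
}

/// Smallest single interval covering both inputs.
/// If either side is unbounded in a direction, the result is unbounded there too.
fn union_hull(a: Interval, b: Interval) -> Interval {
    let lower = match (a.lower, b.lower) {
        (Some(x), Some(y)) => Some(x.min(y)),
        _ => None,
    };
    let upper = match (a.upper, b.upper) {
        (Some(x), Some(y)) => Some(x.max(y)),
        _ => None,
    };
    Interval { lower, upper }
}
```

Note that the hull of two disjoint intervals also covers the gap between them, which is sound for bounds (it never excludes a possible value) but less precise than a true set union.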

berkaysynnada

Hi @ch-sc. Both your implementation and the clean-ups to the existing code look very good to me. I suggested some further improvements, and have one point to discuss: rather than introducing new supports_() APIs, should we have users infer the support from the evaluate API? IMO that will make things simpler, WDYT?

and sorry for the late response 😞

berkaysynnada · 2025-02-23T18:25:06Z

datafusion/common/src/scalar/mod.rs

@@ -1583,6 +1583,17 @@ impl ScalarValue {
        }
    }

+    /// Returns negation for a boolean scalar value
+    pub fn boolean_negate(&self) -> Result<Self> {


Maybe we can extend arithmetic_negate() with booleans, wdyt?

We can do that, but I think we should then rename arithmetic_negate to just negate to cause no ambiguity. WDYT?

datafusion/expr-common/src/interval_arithmetic.rs

berkaysynnada · 2025-02-23T18:41:39Z

datafusion/expr-common/src/interval_arithmetic.rs

+        Operator::IsDistinctFrom | Operator::IsNotDistinctFrom => {
+            NullableInterval::from(lhs)
+                .apply_operator(op, &rhs.into())
+                .and_then(|x| {


We can actually avoid cloning here by introducing a new taker API on NullableInterval.

Not sure how you would do that.
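
One possible reading of "taker API" is a consuming accessor: a method that takes `self` by value and moves the inner interval out, so the caller does not need to clone a borrowed one. The sketch below uses simplified, hypothetical versions of `Interval` and `NullableInterval`, and the method name `into_values` is an assumption, not an existing DataFusion method.

```rust
/// Simplified stand-in for an interval (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Interval {
    lower: i64,
    upper: i64,
}

/// Simplified stand-in for a nullable interval (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum NullableInterval {
    Null,
    MaybeNull { values: Interval },
    NotNull { values: Interval },
}

impl NullableInterval {
    /// Borrowing accessor: callers that need ownership must clone the result.
    fn values(&self) -> Option<&Interval> {
        match self {
            Self::Null => None,
            Self::MaybeNull { values } | Self::NotNull { values } => Some(values),
        }
    }

    /// Consuming ("taker") accessor: moves the interval out, no clone needed.
    fn into_values(self) -> Option<Interval> {
        match self {
            Self::Null => None,
            Self::MaybeNull { values } | Self::NotNull { values } => Some(values),
        }
    }
}
```

This only helps where the `NullableInterval` is a temporary that is not used again, as with the result of a chained `apply_operator` call.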

datafusion/expr-common/src/operator.rs

datafusion/expr/src/udf.rs

datafusion/physical-expr/src/expressions/in_list.rs

ch-sc · 2025-02-25T13:49:26Z

@berkaysynnada thank you for your review 🙂

> Rather than introducing new supports_() APIs, should we force the users to infer the support by the evaluate API?

I agree, something like that would be ideal. I'll look into it today

berkaysynnada · 2025-03-16T14:07:11Z

Hi @ch-sc. Do you plan to complete the work here? (It's almost done.)

ch-sc · 2025-03-17T07:53:54Z

Hi @berkaysynnada, I should be able to spend some time on this at the end of this week.

ch-sc · 2025-04-09T11:21:37Z

Hi @berkaysynnada, can you take another look to move this forward? :)

berkaysynnada · 2025-04-10T11:42:37Z

> Hi @berkaysynnada, can you take another look to move this forward? :)

I will, but some tests are failing. I'll probably have some time in the evening or tomorrow morning.

berkaysynnada · 2025-04-11T08:52:20Z

Hi @ch-sc. Reviewing while tests are failing is not very efficient. Are you dealing with them?

ch-sc · 2025-04-11T14:36:39Z

Sorry @berkaysynnada, I got side-tracked from this yesterday. The test is fixed.

berkaysynnada · 2025-04-17T07:43:18Z

Thank you @ch-sc. I will review it today

berkaysynnada

Thank you again @ch-sc. I've looked at the changes, except for the cp_solver file. Your changes make a lot of sense overall, but I see some points to discuss. Let's continue iterating.

berkaysynnada · 2025-04-17T13:59:29Z

datafusion/physical-expr/src/analysis.rs

        col_stats: &ColumnStatistics,
        col_index: usize,
    ) -> Result<Self> {
-        let field = schema.fields().get(col_index).ok_or_else(|| {


Why do we spread this check across all the call sites?

berkaysynnada · 2025-04-17T14:06:27Z

datafusion/physical-expr-common/src/physical_expr.rs

@@ -126,6 +126,25 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {
        not_impl_err!("Not implemented for {self}")
    }

+    /// Checks support of bounds evaluation for this expression, before evaluating bounds.
+    /// Returns None if bounds evaluation is not supported.
+    fn evaluate_bounds_checked(


We already have supports_bounds_evaluation() there. I don't think we need to introduce another API in PhysicalExpr. If we see a lot of repetition of the "first supports_bounds_evaluation, then evaluate_bounds" pattern, then we can make a standalone function, IMO.
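
A standalone helper for the "first supports_bounds_evaluation, then evaluate_bounds" pattern could look roughly like the sketch below. The `BoundsExpr` trait is a reduced, hypothetical stand-in for the two relevant `PhysicalExpr` methods, and `evaluate_bounds_if_supported`, `AddExpr`, and `OpaqueExpr` are illustrative names, not part of the PR.

```rust
/// Simplified stand-in for an interval (illustration only).
#[derive(Debug, Clone, PartialEq)]
struct Interval {
    lower: i64,
    upper: i64,
}

/// Reduced stand-in for the two relevant trait methods.
trait BoundsExpr {
    /// Defaults to "not supported", so implementors opt in explicitly.
    fn supports_bounds_evaluation(&self) -> bool {
        false
    }
    fn evaluate_bounds(&self, _children: &[&Interval]) -> Result<Interval, String> {
        Err("bounds evaluation not implemented".to_string())
    }
}

/// Free function wrapping the check-then-evaluate pattern.
/// Returns `Ok(None)` when the expression does not support bounds evaluation.
fn evaluate_bounds_if_supported(
    expr: &dyn BoundsExpr,
    children: &[&Interval],
) -> Result<Option<Interval>, String> {
    if expr.supports_bounds_evaluation() {
        expr.evaluate_bounds(children).map(Some)
    } else {
        Ok(None)
    }
}

/// Example expression that supports bounds: the sum of its two children.
struct AddExpr;
impl BoundsExpr for AddExpr {
    fn supports_bounds_evaluation(&self) -> bool {
        true
    }
    fn evaluate_bounds(&self, children: &[&Interval]) -> Result<Interval, String> {
        match children {
            [l, r] => Ok(Interval {
                lower: l.lower + r.lower,
                upper: l.upper + r.upper,
            }),
            _ => Err(format!("expected 2 children, got {}", children.len())),
        }
    }
}

/// Example expression that relies on the defaults (no bounds support).
struct OpaqueExpr;
impl BoundsExpr for OpaqueExpr {}
```

Keeping the helper outside the trait leaves the trait surface unchanged while still removing the repetition at call sites.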

berkaysynnada · 2025-04-17T14:15:31Z

datafusion/expr-common/src/interval_arithmetic.rs

-                rhs.data_type()
-            );
-        };
+            BinaryTypeCoercer::new(&self.data_type(), &Operator::Plus, &rhs.data_type()).get_result_type()


Should we also check that the result type is supported by evaluate_bounds? WDYT?

berkaysynnada · 2025-04-17T14:19:53Z

datafusion/expr-common/src/interval_arithmetic.rs

@@ -963,6 +961,23 @@ pub fn apply_operator(op: &Operator, lhs: &Interval, rhs: &Interval) -> Result<I
        Operator::Minus => lhs.sub(rhs),
        Operator::Multiply => lhs.mul(rhs),
        Operator::Divide => lhs.div(rhs),
+        Operator::IsDistinctFrom | Operator::IsNotDistinctFrom => {


Can apply_operator() return NullableInterval::Null or NullableInterval::MaybeNull here? Isn't that strange for IsDistinctFrom and IsNotDistinctFrom?
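
For context on why a nullable result would look odd here: in SQL, IS DISTINCT FROM and IS NOT DISTINCT FROM treat NULL as a comparable value, so they always yield true or false and never NULL, unlike `<>`. A minimal sketch, using `Option<i64>` as a stand-in for a nullable value (the function names are illustrative):

```rust
/// `lhs IS DISTINCT FROM rhs`: NULL compares equal to NULL, result is never NULL.
fn is_distinct_from(lhs: Option<i64>, rhs: Option<i64>) -> bool {
    lhs != rhs
}

/// `lhs IS NOT DISTINCT FROM rhs`: the negation, also never NULL.
fn is_not_distinct_from(lhs: Option<i64>, rhs: Option<i64>) -> bool {
    lhs == rhs
}

/// `lhs <> rhs` with SQL three-valued logic: NULL if either side is NULL.
fn sql_not_eq(lhs: Option<i64>, rhs: Option<i64>) -> Option<bool> {
    match (lhs, rhs) {
        (Some(a), Some(b)) => Some(a != b),
        _ => None,
    }
}
```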

berkaysynnada · 2025-04-17T14:22:44Z

datafusion/expr-common/src/interval_arithmetic.rs

@@ -1690,6 +1705,24 @@ impl Display for NullableInterval {
    }
 }

+impl From<&Interval> for NullableInterval {


This conversion is dangerous. Null means different things in Interval and NullableInterval. In the current behavior, Intervals can only be converted to NotNull types. Where do we need such a conversion, and why? Can you elaborate a bit?

berkaysynnada · 2025-04-17T14:32:13Z

datafusion/physical-expr/src/expressions/in_list.rs

+    ///
+    /// output interval:    [`true`, `true`]
+    /// ```
+    fn evaluate_bounds(&self, children: &[&Interval]) -> Result<Interval> {


My previous concern still applies. I think there should be 3 different paths in this logic:

1. expr_bounds is a singleton:
check the inList expressions separately. If any of them is also a singleton and has the same value as expr_bounds, then the result is CertainlyTrue.

2. expr_bounds is not a singleton => intersect expr_bounds with all inList expressions.
a. If any of the inList expressions intersects with expr_bounds, then the result is Uncertain.
b. If none of the inList expressions intersects with expr_bounds, then the result is CertainlyFalse.
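
The case split above can be sketched as follows. This is a simplified illustration over closed `i64` intervals, not the PR's `evaluate_bounds` implementation. `Interval`, `Truth`, and `in_list_bounds` are hypothetical names, and letting a singleton without an exact match fall through to the intersection check is an assumption of this sketch.

```rust
/// Closed interval over i64 (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    lower: i64,
    upper: i64,
}

impl Interval {
    fn is_singleton(&self) -> bool {
        self.lower == self.upper
    }
    fn intersects(&self, other: &Interval) -> bool {
        self.lower <= other.upper && other.lower <= self.upper
    }
}

/// Three possible outcomes of a boolean bounds evaluation.
#[derive(Debug, PartialEq)]
enum Truth {
    CertainlyTrue,
    CertainlyFalse,
    Uncertain,
}

/// Bounds of `expr IN (list...)` given the bounds of each operand.
fn in_list_bounds(expr: &Interval, list: &[Interval]) -> Truth {
    // Path 1: a singleton expr matching a singleton list item is certainly in the list.
    if expr.is_singleton()
        && list
            .iter()
            .any(|item| item.is_singleton() && item.lower == expr.lower)
    {
        return Truth::CertainlyTrue;
    }
    // Path 2a: some overlap means the predicate may or may not hold.
    // Path 2b: no overlap with any item means it can never hold.
    if list.iter().any(|item| item.intersects(expr)) {
        Truth::Uncertain
    } else {
        Truth::CertainlyFalse
    }
}
```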

Support temporal data types in interval arithmetics

a5f5729

github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) common Related to common crate labels Feb 6, 2025

ch-sc added 3 commits February 6, 2025 13:42

fix test code

68784cc

cargo fmt

bc9eb4d

clean up

5960170

github-actions bot removed the sqllogictest SQL Logic Tests (.slt) label Feb 6, 2025

fix interval type coercion

fb1d681

ch-sc added 4 commits February 11, 2025 18:17

remove NotEq from supported operators

4ea39eb

Merge branch 'main' of github.com:apache/arrow-datafusion into suppor…

b873f50

…t-temporal-types-in-interval-arithmetics

fix imports

b1b8aec

cargo fmt

cb11459

revisit union interval logic

b4bd851

revisit union interval logic

aaf116b

ch-sc mentioned this pull request Feb 17, 2025

StatisticsV2: initial statistics framework redesign #14699

Merged

ch-sc added 2 commits February 18, 2025 13:38

treat null as unbounded

5d65ba1

clean up

4642bfa

csv source yields too many column stats

12f7b3c

berkaysynnada reviewed Feb 23, 2025

ch-sc added 5 commits February 24, 2025 18:11

addressing comments

d14747d

omit clone

d3810de

Merge branch 'main' of github.com:apache/arrow-datafusion into suppor…

0147f66

…t-temporal-types-in-interval-arithmetics

remove check

8813adb

UDF evaluate bounds default impl

f521a6a

check support before evaluating bounds

28803d6

github-actions bot added the functions Changes to functions implementation label Mar 26, 2025

ch-sc added 4 commits March 26, 2025 13:44

fmt

4d81d6c

Merge branch 'main' of github.com:apache/arrow-datafusion into suppor…

777d58e

…t-temporal-types-in-interval-arithmetics

fix after merge

d1994cc

fix doc test

318a35d

github-actions bot added the documentation Improvements or additions to documentation label Mar 26, 2025

ch-sc added 4 commits March 26, 2025 15:32

fix doc test

665d0c6

fix example code

a63dd05

clippy

0d73839

remove println

c232884

ch-sc added 2 commits April 9, 2025 13:23

move boolean_negate & rename arithmetic_negate

6f7fe72

Merge branch 'main' of github.com:apache/arrow-datafusion into suppor…

431c327

…t-temporal-types-in-interval-arithmetics

github-actions bot added the sql SQL Planner label Apr 9, 2025

fix negate test

ae257d1

berkaysynnada reviewed Apr 17, 2025
