fix!: incorrect coercion when comparing with string literals #15482

alan910127 · 2025-03-28T20:28:49Z

Which issue does this PR close?

Closes #15161 ("Incorrect cast of integer columns to utf8 when comparing with utf8 constant").

Rationale for this change

Currently, DataFusion handles comparisons between numbers and string literals differently from a number of databases. It coerces the number to a string, whereas other databases cast the literal to the column type and emit an error if the cast fails. This behavior can be unintuitive.

What changes are included in this PR?

Updated TypeCoercionRewriter::coerce_binary_op to cast string literals to the column type if one is present on either side of a comparison expression.

Are these changes tested?

Updated existing tests to reflect the new type coercion behavior.
In push_down_filter.slt, some explain tests now produce no output when queries fail due to invalid casts. For now, I have updated these tests to expect empty output, but further adjustments may be needed.

Are there any user-facing changes?

Yes. Queries that previously coerced numbers into strings will now fail if the string literal cannot be cast to the column type.

Example

Before this change (success)

> CREATE TABLE t AS SELECT CAST(123 AS int) a;
> SELECT * FROM t WHERE a = '2147483648'; -- Not a valid i32
+---+
| a |
+---+
+---+

After this change (error)

> CREATE TABLE t AS SELECT CAST(123 AS int) a;
> SELECT * FROM t WHERE a = '2147483648'; -- Not a valid i32
type_coercion
caused by
Error during planning: Cannot coerce '2147483648' to type 'Int32'
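
To make the behavior change concrete, here is a minimal, self-contained Rust sketch of the rule described above. It does not use DataFusion: `Expr`, `cast_literal_to_int32`, `coerce_comparison`, and the Int32-only handling are hypothetical stand-ins for illustration, not the code in `TypeCoercionRewriter::coerce_binary_op`. Only the error text mirrors the message shown in the example.

```rust
/// Hypothetical, simplified expression type (not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// An Int32 column, identified by name.
    Int32Column(String),
    /// A string literal such as '123'.
    Utf8Literal(String),
    /// An Int32 literal produced by casting a string literal.
    Int32Literal(i32),
}

/// Cast a string literal to Int32, or fail with a planning-style error.
fn cast_literal_to_int32(value: &str) -> Result<Expr, String> {
    value
        .parse::<i32>()
        .map(Expr::Int32Literal)
        .map_err(|_| format!("Cannot coerce '{value}' to type 'Int32'"))
}

/// If one side is an Int32 column and the other a string literal,
/// cast the literal to the column type; otherwise leave both sides alone.
fn coerce_comparison(left: Expr, right: Expr) -> Result<(Expr, Expr), String> {
    match (left, right) {
        (col @ Expr::Int32Column(_), Expr::Utf8Literal(s)) => {
            Ok((col, cast_literal_to_int32(&s)?))
        }
        (Expr::Utf8Literal(s), col @ Expr::Int32Column(_)) => {
            Ok((cast_literal_to_int32(&s)?, col))
        }
        (l, r) => Ok((l, r)),
    }
}

fn main() {
    let a = || Expr::Int32Column("a".to_string());
    // '123' fits in an i32, so the literal is cast and the column is untouched.
    println!("{:?}", coerce_comparison(a(), Expr::Utf8Literal("123".into())));
    // '2147483648' is i32::MAX + 1, so the cast fails and an error is returned.
    println!("{:?}", coerce_comparison(a(), Expr::Utf8Literal("2147483648".into())));
}
```

The sketch keeps the column side unchanged on purpose: the point of the change is that the literal, not the column, is what gets cast, so a literal that cannot be represented in the column type becomes an error instead of an always-false string comparison.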

alan910127 · 2025-03-28T20:30:11Z

datafusion/core/tests/expr_api/mod.rs

    // compare int col to string literal `i = '202410'`
    // Note this casts the column (not the field)
-    create_expr_test(col("i").eq(lit("202410")), "CAST(i@1 AS Utf8) = 202410");
-    create_expr_test(lit("202410").eq(col("i")), "202410 = CAST(i@1 AS Utf8)");
+    create_expr_test(col("i").eq(lit("202410")), "i@1 = 202410");
+    create_expr_test(lit("202410").eq(col("i")), "202410 = i@1");
    // however, when simplified the casts on i should removed
    // https://github.com/apache/datafusion/issues/14944
-    create_simplified_expr_test(col("i").eq(lit("202410")), "CAST(i@1 AS Utf8) = 202410");
-    create_simplified_expr_test(lit("202410").eq(col("i")), "CAST(i@1 AS Utf8) = 202410");
+    create_simplified_expr_test(col("i").eq(lit("202410")), "i@1 = 202410");
+    create_simplified_expr_test(lit("202410").eq(col("i")), "i@1 = 202410");


Not sure if this test is still needed, since the literal casting behavior is not considered an "optimization".

jayzhan211 · 2025-03-29T01:54:03Z

datafusion/sqllogictest/test_files/push_down_filter.slt

@@ -230,19 +230,19 @@ logical_plan TableScan: t projection=[a], full_filters=[t.a != Int32(100)]
 query TT
 explain select a from t where a = '99999999999';
 ----
-logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("99999999999")]
+


why is there no plan?

I thought it's because it returns a plan_err!() when the cast fails, but I'm not quite sure about that.


gabotechs · 2025-04-02T14:12:59Z

datafusion/sqllogictest/test_files/push_down_filter.slt

@@ -230,19 +230,19 @@ logical_plan TableScan: t projection=[a], full_filters=[t.a != Int32(100)]
 query TT
 explain select a from t where a = '99999999999';
 ----
-logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("99999999999")]
+


I can confirm that Postgres is also not able to plan for this query for the same reason: link to Postgres fiddle

gabotechs · 2025-04-02T14:22:10Z

datafusion/sqllogictest/test_files/push_down_filter.slt

 query TT
 explain select a from t where a = '99999999999';
 ----


I imagine that instead of returning an empty plan, we should be expecting a runtime error here, something like:

Suggested change

-query TT
-explain select a from t where a = '99999999999';
-----
+statement error Cannot coerce '...' to type '...'
+explain select a from t where a = '99999999999';

I do not see any instances in the SQL logic tests that actually expect an empty plan upon an error.

I think this is likely an EXPLAIN-related issue, but returning an error when no plan could be generated seems like a reasonable approach. I'm not entirely sure why it's implemented this way, maybe we should call for help 😆


EDIT: @alamb EXPLAIN works (only outputs physical plan) in the latest main branch, but I'm not sure if this behavior is expected 🤔

> create table t as select CAST(123 AS int) a;
0 row(s) fetched.
Elapsed 0.021 seconds.

> select * from t where a = '9999999999';
type_coercion
caused by
Error during planning: Cannot coerce '9999999999' to type 'Int32'

> explain select * from t where a = '9999999999';
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │    CoalesceBatchesExec    │ |
|               | │    --------------------   │ |
|               | │     target_batch_size:    │ |
|               | │            8192           │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │         FilterExec        │ |
|               | │    --------------------   │ |
|               | │         predicate:        │ |
|               | │       a = 9999999999      │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │         bytes: 160        │ |
|               | │       format: memory      │ |
|               | │          rows: 1          │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+

Perhaps we can file a ticket to track it

~~Issue created: #15598~~

(In case anyone else finds this: you can get the full plan via EXPLAIN FORMAT INDENT ...)

alamb

Thank you @alan910127 and @gabotechs and @jayzhan211

I am not sure about the explicit checking for literals in type coercion logic -- I think coercion is supposed to be done based on types alone

Can we please add some more explicit tests of this new behavior
https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/type_coercion.slt

I am thinking queries like

select int_col = '123'
-- constant expressions
select int_col = '12'||'3'

alamb · 2025-04-03T13:22:14Z

datafusion/sqllogictest/test_files/push_down_filter.slt

 query TT
 explain select a from t where a = '99999999999';
 ----


Yeah, it is kind of weird -- it would be nice to have at least some message in the explain plan if there was an error

Perhaps we can file a ticket to track it

I think we should update the test to just run the query (and expect an error) rather than EXPLAIN it

alamb · 2025-04-03T13:23:02Z

datafusion/sqllogictest/test_files/push_down_filter.slt

 query TT
 explain select a from t where a = '99999999999';
 ----


BTW this is a new test added last week:
https://github.com/apache/datafusion/blame/190634bee1093d9d71786aa9c98ec207be05ea72/datafusion/sqllogictest/test_files/push_down_filter.slt#L231

in perf: unwrap cast for comparing ints =/!= strings #15110

alamb · 2025-04-03T13:24:02Z

datafusion/sqllogictest/test_files/push_down_filter.slt

@@ -230,19 +230,19 @@ logical_plan TableScan: t projection=[a], full_filters=[t.a != Int32(100)]
 query TT
 explain select a from t where a = '99999999999';
 ----
-logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("99999999999")]
+


Let's file a ticket about the explain plan being missing

alamb · 2025-04-03T13:25:18Z

datafusion/sqllogictest/test_files/push_down_filter.slt


 # The predicate should still have the column cast when the value is a NOT valid i32
 query TT
 explain select a from t where a = '99.99';
 ----
-logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("99.99")]


I think the comments in this file are now out of date -- so perhaps we can update them

alamb · 2025-04-03T13:28:40Z

datafusion/optimizer/src/analyzer/type_coercion.rs

+        let left_type = left.get_type(left_schema)?;
+        let right_type = right.get_type(right_schema)?;
+
+        match (&left, &right) {


I am surprised this code is only triggered for literals -- I think coercion is supposed to happen based on data types, not on expressions. Among other things, this code won't handle expressions (date_col = '2025'||'-'||'02' for example)

I think we should change the base coercion rules for types

I think both are valid -- duckdb supports something like select int_col = '12'||'3', while postgres returns an error since it only treats string literals as unknown type (thus the filter was int = text => error). However, I think supporting expressions makes more sense semantically (I believe we don't have a concept like the unknown type?).
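
The difference between the two approaches discussed above can be sketched in a few lines of self-contained Rust. This is only an illustration under stated assumptions: `DataType`, `Expr`, `coerce_by_literal`, and `coerce_by_type` are hypothetical names, not DataFusion APIs, and only the case with the Int32 column on the left is handled. The literal-based rule looks at the shape of the expression, so a constant expression such as `'12'||'3'` slips through; the type-based rule looks only at the data types of both sides.

```rust
/// Hypothetical, simplified types (not DataFusion's).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String, DataType),
    Utf8Literal(String),
    /// String concatenation `a || b`; always Utf8 in this sketch.
    Concat(Box<Expr>, Box<Expr>),
    Cast(Box<Expr>, DataType),
}

impl Expr {
    /// The data type this expression evaluates to.
    fn data_type(&self) -> DataType {
        match self {
            Expr::Column(_, t) | Expr::Cast(_, t) => t.clone(),
            Expr::Utf8Literal(_) | Expr::Concat(_, _) => DataType::Utf8,
        }
    }
}

/// Literal-based rule: rewrite only when the right side is a string literal.
fn coerce_by_literal(left: Expr, right: Expr) -> (Expr, Expr) {
    let applies =
        left.data_type() == DataType::Int32 && matches!(&right, Expr::Utf8Literal(_));
    if applies {
        (left, Expr::Cast(Box::new(right), DataType::Int32))
    } else {
        (left, right)
    }
}

/// Type-based rule: rewrite whenever the right side has type Utf8.
fn coerce_by_type(left: Expr, right: Expr) -> (Expr, Expr) {
    let applies = left.data_type() == DataType::Int32 && right.data_type() == DataType::Utf8;
    if applies {
        (left, Expr::Cast(Box::new(right), DataType::Int32))
    } else {
        (left, right)
    }
}

fn main() {
    let col = Expr::Column("int_col".to_string(), DataType::Int32);
    // Models the right-hand side of `int_col = '12'||'3'`.
    let concat = Expr::Concat(
        Box::new(Expr::Utf8Literal("12".to_string())),
        Box::new(Expr::Utf8Literal("3".to_string())),
    );
    // The literal-based rule leaves the concatenation uncast.
    println!("{:?}", coerce_by_literal(col.clone(), concat.clone()));
    // The type-based rule wraps the whole expression in a cast to Int32.
    println!("{:?}", coerce_by_type(col, concat));
}
```

Whether the type-based rewrite is desirable is exactly the open question in this thread: it matches the DuckDB behavior mentioned above, while Postgres rejects the same comparison.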

fix!: incorrect coercion when comparing with string literals

d369a9d

github-actions bot added the optimizer (Optimizer rules), core (Core DataFusion crate), and sqllogictest (SQL Logic Tests (.slt)) labels Mar 28, 2025

alan910127 commented Mar 28, 2025

jayzhan211 reviewed Mar 29, 2025

gabotechs reviewed Apr 2, 2025

alamb added the performance (Make DataFusion faster) label Apr 3, 2025

alamb reviewed Apr 3, 2025

alan910127 mentioned this pull request Apr 5, 2025

EXPLAIN only outputs an empty plan where there's a plan_err #15598

Closed

alamb mentioned this pull request Apr 7, 2025

Weekly Plan (Andrew Lamb) April 7, 2025 #15616

Closed

12 tasks

github-actions bot added the documentation (Improvements or additions to documentation) label Apr 14, 2025

alan910127 force-pushed the fix/int-utf8-cmp-coercion branch from 5c0ea0d to d369a9d April 14, 2025 06:56

github-actions bot removed the documentation (Improvements or additions to documentation) label Apr 14, 2025
