Skip to content

fix!: incorrect coercion when comparing with string literals #15482

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

alan910127
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

Currently, DataFusion handles comparisons between numbers and string literals differently from a number of databases. It coerces the number to a string, whereas other databases cast the literal to the column type and emit an error if the cast fails. This behavior can be unintuitive.

What changes are included in this PR?

Updated TypeCoercionRewriter::coerce_binary_op to cast string literals to the column type if one is present on either side of a comparison expression.

Are these changes tested?

  • Updated existing tests to reflect the new type coercion behavior.
  • In push_down_filter.slt, some explain tests now produce no output when queries fail due to invalid casts. For now, I have updated these tests to expect empty output, but further adjustments may be needed.

Are there any user-facing changes?

Yes. Queries that previously coerced numbers into strings will now fail if the string literal cannot be cast to the column type.

Example

Before this change (success)

> CREATE TABLE t AS SELECT CAST(123 AS int) a;
> SELECT * FROM t WHERE a = '2147483648'; -- Not a valid i32
+---+
| a |
+---+
+---+

After this change (error)

> CREATE TABLE t AS SELECT CAST(123 AS int) a;
> SELECT * FROM t WHERE a = '2147483648'; -- Not a valid i32
type_coercion
caused by
Error during planning: Cannot coerce '2147483648' to type 'Int32'

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Mar 28, 2025
Comment on lines 331 to +338
// compare int col to string literal `i = '202410'`
// Note this casts the column (not the field)
create_expr_test(col("i").eq(lit("202410")), "CAST(i@1 AS Utf8) = 202410");
create_expr_test(lit("202410").eq(col("i")), "202410 = CAST(i@1 AS Utf8)");
create_expr_test(col("i").eq(lit("202410")), "i@1 = 202410");
create_expr_test(lit("202410").eq(col("i")), "202410 = i@1");
// however, when simplified the casts on i should removed
// https://github.com/apache/datafusion/issues/14944
create_simplified_expr_test(col("i").eq(lit("202410")), "CAST(i@1 AS Utf8) = 202410");
create_simplified_expr_test(lit("202410").eq(col("i")), "CAST(i@1 AS Utf8) = 202410");
create_simplified_expr_test(col("i").eq(lit("202410")), "i@1 = 202410");
create_simplified_expr_test(lit("202410").eq(col("i")), "i@1 = 202410");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if this test is still needed since the literal casting behavior is not considered an "optimization"

@@ -230,19 +230,19 @@ logical_plan TableScan: t projection=[a], full_filters=[t.a != Int32(100)]
query TT
explain select a from t where a = '99999999999';
----
logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("99999999999")]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is there no plan?

Copy link
Contributor Author

@alan910127 alan910127 Mar 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it's because it returns a plan_err!() when the cast fails, but I'm not quite sure about that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that Postgres is also not able to plan for this query for the same reason: link to Postgres fiddle

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's file a ticket about the explain plan being missing

@@ -230,19 +230,19 @@ logical_plan TableScan: t projection=[a], full_filters=[t.a != Int32(100)]
query TT
explain select a from t where a = '99999999999';
----
logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("99999999999")]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can confirm that Postgres is also not able to plan for this query for the same reason: link to Postgres fiddle

Comment on lines 230 to 232
query TT
explain select a from t where a = '99999999999';
----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine that instead of returning an empty plan, we should be expecting a runtime error here, something like

Suggested change
query TT
explain select a from t where a = '99999999999';
----
statement error Cannot coerce '...' to type '...'
explain select a from t where a = '99999999999';

I do not see any instance in the sql logic tests that are actually expecting an empty plan upon an error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is likely an EXPLAIN-related issue, but returning an error when no plan could be generated seems like a reasonable approach. I'm not entirely sure why it's implemented this way, maybe we should call for help 😆

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is kind of wierd -- it would be nice to have at least some message in the explain plan if there was an error

Perhaps we can file a ticket to track it

I think we should update the test to just run the query (and expect an error) rather than EXPLAIN it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@alan910127 alan910127 Apr 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: @alamb EXPLAIN works (only outputs physical plan) in the latest main branch, but I'm not sure if this behavior is expected 🤔

> create table t as select CAST(123 AS int) a;
0 row(s) fetched.
Elapsed 0.021 seconds.

> select * from t where a = '9999999999';
type_coercion
caused by
Error during planning: Cannot coerce '9999999999' to type 'Int32'
> explain select * from t where a = '9999999999';
+---------------+-------------------------------+
| plan_type     | plan                          |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
|               | │    CoalesceBatchesExec    │ |
|               | │    --------------------   │ |
|               | │     target_batch_size:    │ |
|               | │            8192           │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │         FilterExec        │ |
|               | │    --------------------   │ |
|               | │         predicate:        │ |
|               | │       a = 9999999999      │ |
|               | └─────────────┬─────────────┘ |
|               | ┌─────────────┴─────────────┐ |
|               | │       DataSourceExec      │ |
|               | │    --------------------   │ |
|               | │         bytes: 160        │ |
|               | │       format: memory      │ |
|               | │          rows: 1          │ |
|               | └───────────────────────────┘ |
|               |                               |
+---------------+-------------------------------+

Perhaps we can file a ticket to track it

Issue created: #15598

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(in case anyone else find this, you can get the full plan via EXPLAIN FORMAT INDENT ...

@alamb alamb added the performance Make DataFusion faster label Apr 3, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @alan910127 and @gabotechs and @jayzhan211

I am not sure about the explicit checking for literals in type coercion logic -- I think coercion is supposed to be done based on types alone

Can we please add some more explicit tests of this new behavior
https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/type_coercion.slt

I am thinking queries like

select int_col = '123'
-- constant expressions
select int_col = '12'||'3'

Comment on lines 230 to 232
query TT
explain select a from t where a = '99999999999';
----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is kind of wierd -- it would be nice to have at least some message in the explain plan if there was an error

Perhaps we can file a ticket to track it

I think we should update the test to just run the query (and expect an error) rather than EXPLAIN it

Comment on lines 230 to 232
query TT
explain select a from t where a = '99999999999';
----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -230,19 +230,19 @@ logical_plan TableScan: t projection=[a], full_filters=[t.a != Int32(100)]
query TT
explain select a from t where a = '99999999999';
----
logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("99999999999")]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's file a ticket about the explain plan being missing


# The predicate should still have the column cast when the value is a NOT valid i32
query TT
explain select a from t where a = '99.99';
----
logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("99.99")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comments in this file are now out of date -- so perhaps we can update them

let left_type = left.get_type(left_schema)?;
let right_type = right.get_type(right_schema)?;

match (&left, &right) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am surprise this code is only triggered for literals -- I think coercion is supposed to happen based on data types not on expressions. Among other things, this code won't handle expressions (date_col = '2025'||'-'||'02' for example)

I think we should change the base coercion rules for types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think both are valid -- duckdb supports something like select int_col = '12'||'3', while postgres returns an error since it only treats string literals as unknown type (thus the filter was int = text => error). However, I think supporting expressions makes more sense semantically (I believe we don't have a concept like the unknown type?).

@alamb alamb mentioned this pull request Apr 7, 2025
12 tasks
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 14, 2025
@alan910127 alan910127 force-pushed the fix/int-utf8-cmp-coercion branch from 5c0ea0d to d369a9d Compare April 14, 2025 06:56
@github-actions github-actions bot removed the documentation Improvements or additions to documentation label Apr 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules performance Make DataFusion faster sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect cast of integer columns to utf8 when comparing with utf8 constant
4 participants