-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix!: incorrect coercion when comparing with string literals #15482
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
// compare int col to string literal `i = '202410'` | ||
// Note this casts the column (not the field) | ||
create_expr_test(col("i").eq(lit("202410")), "CAST(i@1 AS Utf8) = 202410"); | ||
create_expr_test(lit("202410").eq(col("i")), "202410 = CAST(i@1 AS Utf8)"); | ||
create_expr_test(col("i").eq(lit("202410")), "i@1 = 202410"); | ||
create_expr_test(lit("202410").eq(col("i")), "202410 = i@1"); | ||
// however, when simplified the casts on i should removed | ||
// https://github.com/apache/datafusion/issues/14944 | ||
create_simplified_expr_test(col("i").eq(lit("202410")), "CAST(i@1 AS Utf8) = 202410"); | ||
create_simplified_expr_test(lit("202410").eq(col("i")), "CAST(i@1 AS Utf8) = 202410"); | ||
create_simplified_expr_test(col("i").eq(lit("202410")), "i@1 = 202410"); | ||
create_simplified_expr_test(lit("202410").eq(col("i")), "i@1 = 202410"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if this test is still needed since the literal casting behavior is not considered an "optimization"
@@ -230,19 +230,19 @@ logical_plan TableScan: t projection=[a], full_filters=[t.a != Int32(100)] | |||
query TT | |||
explain select a from t where a = '99999999999'; | |||
---- | |||
logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("99999999999")] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is there no plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought it's because it returns a plan_err!()
when the cast fails, but I'm not quite sure about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that Postgres is also not able to plan for this query for the same reason: link to Postgres fiddle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's file a ticket about the explain plan being missing
@@ -230,19 +230,19 @@ logical_plan TableScan: t projection=[a], full_filters=[t.a != Int32(100)] | |||
query TT | |||
explain select a from t where a = '99999999999'; | |||
---- | |||
logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("99999999999")] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can confirm that Postgres is also not able to plan for this query for the same reason: link to Postgres fiddle
query TT | ||
explain select a from t where a = '99999999999'; | ||
---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I imagine that instead of returning an empty plan, we should be expecting a runtime error here, something like
query TT | |
explain select a from t where a = '99999999999'; | |
---- | |
statement error Cannot coerce '...' to type '...' | |
explain select a from t where a = '99999999999'; |
I do not see any instance in the sql logic tests that are actually expecting an empty plan upon an error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is likely an EXPLAIN-related issue, but returning an error when no plan could be generated seems like a reasonable approach. I'm not entirely sure why it's implemented this way, maybe we should call for help 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is kind of wierd -- it would be nice to have at least some message in the explain plan if there was an error
Perhaps we can file a ticket to track it
I think we should update the test to just run the query (and expect an error) rather than EXPLAIN
it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: @alamb EXPLAIN
works (only outputs physical plan) in the latest main branch, but I'm not sure if this behavior is expected 🤔
> create table t as select CAST(123 AS int) a;
0 row(s) fetched.
Elapsed 0.021 seconds.
> select * from t where a = '9999999999';
type_coercion
caused by
Error during planning: Cannot coerce '9999999999' to type 'Int32'
> explain select * from t where a = '9999999999';
+---------------+-------------------------------+
| plan_type | plan |
+---------------+-------------------------------+
| physical_plan | ┌───────────────────────────┐ |
| | │ CoalesceBatchesExec │ |
| | │ -------------------- │ |
| | │ target_batch_size: │ |
| | │ 8192 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ FilterExec │ |
| | │ -------------------- │ |
| | │ predicate: │ |
| | │ a = 9999999999 │ |
| | └─────────────┬─────────────┘ |
| | ┌─────────────┴─────────────┐ |
| | │ DataSourceExec │ |
| | │ -------------------- │ |
| | │ bytes: 160 │ |
| | │ format: memory │ |
| | │ rows: 1 │ |
| | └───────────────────────────┘ |
| | |
+---------------+-------------------------------+
Perhaps we can file a ticket to track it
Issue created: #15598
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(in case anyone else find this, you can get the full plan via EXPLAIN FORMAT INDENT ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @alan910127 and @gabotechs and @jayzhan211
I am not sure about the explicit checking for literals in type coercion logic -- I think coercion is supposed to be done based on types alone
Can we please add some more explicit tests of this new behavior
https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/type_coercion.slt
I am thinking queries like
select int_col = '123'
-- constant expressions
select int_col = '12'||'3'
query TT | ||
explain select a from t where a = '99999999999'; | ||
---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it is kind of wierd -- it would be nice to have at least some message in the explain plan if there was an error
Perhaps we can file a ticket to track it
I think we should update the test to just run the query (and expect an error) rather than EXPLAIN
it
query TT | ||
explain select a from t where a = '99999999999'; | ||
---- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -230,19 +230,19 @@ logical_plan TableScan: t projection=[a], full_filters=[t.a != Int32(100)] | |||
query TT | |||
explain select a from t where a = '99999999999'; | |||
---- | |||
logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("99999999999")] | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's file a ticket about the explain plan being missing
|
||
# The predicate should still have the column cast when the value is a NOT valid i32 | ||
query TT | ||
explain select a from t where a = '99.99'; | ||
---- | ||
logical_plan TableScan: t projection=[a], full_filters=[CAST(t.a AS Utf8) = Utf8("99.99")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comments in this file are now out of date -- so perhaps we can update them
let left_type = left.get_type(left_schema)?; | ||
let right_type = right.get_type(right_schema)?; | ||
|
||
match (&left, &right) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprise this code is only triggered for literals -- I think coercion is supposed to happen based on data types not on expressions. Among other things, this code won't handle expressions (date_col = '2025'||'-'||'02'
for example)
I think we should change the base coercion rules for types
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think both are valid -- duckdb supports something like select int_col = '12'||'3'
, while postgres returns an error since it only treats string literals as unknown
type (thus the filter was int = text
=> error). However, I think supporting expressions makes more sense semantically (I believe we don't have a concept like the unknown
type?).
5c0ea0d
to
d369a9d
Compare
Which issue does this PR close?
Rationale for this change
Currently, DataFusion handles comparisons between numbers and string literals differently from a number of databases. It coerces the number to a string, whereas other databases cast the literal to the column type and emit an error if the cast fails. This behavior can be unintuitive.
What changes are included in this PR?
Updated
TypeCoercionRewriter::coerce_binary_op
to cast string literals to the column type if one is present on either side of a comparison expression.Are these changes tested?
push_down_filter.slt
, someexplain
tests now produce no output when queries fail due to invalid casts. For now, I have updated these tests to expect empty output, but further adjustments may be needed.Are there any user-facing changes?
Yes. Queries that previously coerced numbers into strings will now fail if the string literal cannot be cast to the column type.
Example
Before this change (success)
After this change (error)