-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix: fold cast null to substrait typed null #15854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Facing this same issue, +1 to this solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me -- thank you @discord9 and @gabotechs
@@ -1590,6 +1590,21 @@ pub fn from_cast( | |||
schema: &DFSchemaRef, | |||
) -> Result<Expression> { | |||
let Cast { expr, data_type } = cast; | |||
// since substrait Null must be typed, so if we see a cast(null, dt), we make it a typed null | |||
if let Expr::Literal(lit) = expr.as_ref() { | |||
if lit.is_null() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The one thing I'm not sure about is the usage of is_null()
here. I think the logic inside that works for non-compound types here:
pub fn is_null(&self) -> bool {
match self {
ScalarValue::Boolean(v) => v.is_none(),
ScalarValue::Null => true,
ScalarValue::Float16(v) => v.is_none(),
But for compound types I'm not sure if it does the right thing for the fold you're introducing:
ScalarValue::List(arr) => arr.len() == arr.null_count(),
ScalarValue::LargeList(arr) => arr.len() == arr.null_count(),
ScalarValue::FixedSizeList(arr) => arr.len() == arr.null_count(),
ScalarValue::Struct(arr) => arr.len() == arr.null_count(),
ScalarValue::Map(arr) => arr.len() == arr.null_count(),
because I don't think that it make sense to flatten empty compound types. If I understand how this would work, something like
Cast(List[], List<Int32>)
would get flattened into a null literal of type List<i32>
which is subtly wrong, because it should be an empty List<i32>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To further clarify, at the SQL/Relational level
SELECT null::text[], array[]::text[]
text | array
---- | --
null | {}
we can distinguish between casting a null literal to a compound type, and the empty version of that compound type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To further clarify, at the SQL/Relational level
SELECT null::text[], array[]::text[] text | array ---- | -- null | {}we can distinguish between casting a null literal to a compound type, and the empty version of that compound type.
Oh I didn't realize is_null
do extra things other than matches with ScalarValue::Null
, I shall only match ScalarValue::Null
then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vbarua re
But for compound types I'm not sure if it does the right thing for the fold you're introducing:
because I don't think that it make sense to flatten empty compound types. If I understand how this would work, something like
There is a comment in the code of "is_null" before those:
// arr.len() should be 1 for a list scalar, but we don't seem to // enforce that anywhere, so we still check against array length.
I haven't validated if that is correct, but I guess the idea is that a SV:List is null if the outer, "one-row" list itself is null. If that list is not null, then the inner list can be empty. So an empty compound type would have first the arr
with arr.len() = 1, and
arr.null_count() = 0`, passing the check; then arr.values(0) would be another list which can have its own length and null count (where e.g. both would then be zero for an empty list).
So unless I misunderstood what you meant, or there's some bug that I'm not aware of / or a difference in the behavior to what I think it is, I do believe the earlier code was also correct. However the change to only deal with SV:Null seems good to me as well.
(I do find it annoying how complicated the compound types for SV are to understand. Why is SV::List a single-row ListArray instead of just a Vec ? But maybe there's a reason, and in any case changing that is probably too much of a hassle now, and totally outside the scope of this PR 😄 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really grok the is_null
logic for compound types super well.
I did test locally with
let sv = ScalarValue::List(Arc::new(GenericListArray::new_null(
Field::new_list_field(DataType::Float32, true).into(),
1,
)));
let target_type = DataType::List(
Field::new_list_field(DataType::Int32, true).into()
);
let expr = Expr::Literal(sv)
.cast_to(&target_type, &empty_schema)
.unwrap();
and was able to trigger the folding behaviour, which is undesirable... I think?
@discord9 it might be good to capture that we shouldn't fold this as a test.
@vbarua might want to take a look again whether this fix is right |
I think the CI tests will pass once we merge up from main |
d6634e3
to
b9d1b90
Compare
Realized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Matching ScalarValue::null explicitly works
Which issue does this PR close?
NULL::<Data type>
can't be encode to substrait #15855Rationale for this change
fix a error where
SELECT NULL::DOUBLE
can't be encode to substraitWhat changes are included in this PR?
replace
cast(NULL, <data type>)
to typed null in substraitAre these changes tested?
unit tests are added, also downstream code also have test them
Are there any user-facing changes?
shouldn't be any, simply a constant folding shouldn't affect downstream system