-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Deprecate ScalarUDFImpl::return_type #13717
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -173,7 +173,9 @@ impl ScalarUDF { | |
/// its [`ScalarUDFImpl::return_type`] should raise an error. | ||
/// | ||
/// See [`ScalarUDFImpl::return_type`] for more details. | ||
#[deprecated(since = "44.0.0", note = "Use return_type_from_exprs() instead")] | ||
pub fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> { | ||
#[allow(deprecated)] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't we also deprecate this ( There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes |
||
self.inner.return_type(arg_types) | ||
} | ||
|
||
|
@@ -450,6 +452,7 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { | |
/// is recommended to return [`DataFusionError::Internal`]. | ||
/// | ||
/// [`DataFusionError::Internal`]: datafusion_common::DataFusionError::Internal | ||
#[deprecated(since = "44.0.0", note = "Use `return_type_from_exprs` instead")] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. if we are going to deprecate this API I think we should add an example to I can help if we proceed |
||
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType>; | ||
|
||
/// What [`DataType`] will be returned by this function, given the | ||
|
@@ -483,6 +486,7 @@ pub trait ScalarUDFImpl: Debug + Send + Sync { | |
_schema: &dyn ExprSchema, | ||
arg_types: &[DataType], | ||
) -> Result<DataType> { | ||
#[allow(deprecated)] | ||
self.return_type(arg_types) | ||
} | ||
|
||
|
@@ -756,6 +760,7 @@ impl ScalarUDFImpl for AliasedScalarUDFImpl { | |
} | ||
|
||
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> { | ||
#[allow(deprecated)] | ||
self.inner.return_type(arg_types) | ||
} | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cc @goldmedal for ideas how (if) we will be able to resolve this deprecation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reminding this. I filed #13735 to trace it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While reading #13735 and thinking about the churn we had recently with
invoke_with_batch
andinvoke_with_args
, etcWhat if we made a new API that could accomodate both usecases. Something like
🤔
This would also let us add other fields to the
ReturnTypeArgs
structure over time with less API churnThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we change the signature to
return_type(ReturnTypeArgs)
and removereturn_type_with_args
at allI was thinking about this too but not sure this huge breaking change is acceptable or not, but I think this will be a better change in long term
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree
return_type(ReturnTypeArgs)
would be the best in the long termHowever, in the short term (next 6 months) I think it would be nicer to downstream crates / users to add a new function
return_type_with_args
and leave#deprecation
notices onreturn_type
andreturn_type_from_exprs
directing people toreturn_type_with_args
Then once the deprecation period has passed we could rename / introduce a new function called
return_type
🤔There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think one of the challenge of #13825 and #12604 is that the properties of
Expr
including data type is determined based onschema
. Unless theschema
is the same all the way of the processing otherwise we need to recompute the properties based on givenschema
so we can't compute once and store the information withinExpr
. IntroduceMap
for schema -> properties mapping seems not practically.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, if we have column
c
, we needschema
to know it's data type isInt32
. With anotherschema
, it may becomesInt64
. How do we ensure the schema is not changed and we can happily reuse the result we had before?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point.
The other way to view this is -- what does an Expr represent?
is it a syntactical expression (doesn't know types, can be applied to multiple different inputs), or a semantical expression (anchored in the evaluation context, knows types).
The SQL handling process goes from syntax to semantics, and expressions (Expr) built in dataframe layer are definitely syntactical, not semantical.
This may be a challenge for #13825, but less so for #12604. If we have new IR with separate Expr types, it won't be used in contexts where we need syntactical expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I review about this part again and my conclusion is that
Expr
should be the same as it is now, andschema
is separated concept other thanExpr
, with them we compute the corresponding metadata like data type and nullability. Therefore, I still think we needreturn_type_with_args
for solving this issue.The issue mentioned in #12604 should be solved in another way. I think we can have such information stored in
LogicalPlan
instead.Expr
has no type info until the schema is determined and it is when we create correspondingLogicalPlan
fromExpr
andSchema
.LogicalPlan
can be considered as the Container ofExpr
+Schema
, whenever the schema is updated orExpr
is rewritten, we recompute the properties ofExpr
. If nothing changed, we can reuse such computed properties.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that we don't even need
args: Vec<Expr>
, what we need areVec<String>
.