feat: Add array_min function support #14417
First of all, I'm not sure whether this function should be in datafusion core or datafusion-functions-extra. It does not seem to be a "core" function that is supported in either Postgres or DuckDB. Since we are going to support Spark functions (#5600), maybe we should move this function there.
DuckDB has list_max, and our array semantics are supposed to model Duck's list semantics, thus it makes sense to add to DataFusion core.
Review-wise, let's do array_max well in #14470 and then return to this PR. It doesn't make sense to review the two in parallel, since most of the comments will be the same. For example, this PR still uses sort to get the minimal element.
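To illustrate the point about sorting, here is a minimal standalone sketch, not the PR's code, comparing a sort-based minimum with a single linear scan over plain `i64` values. The function names `min_by_sort` and `min_by_scan` are hypothetical and chosen only for this example.

```rust
/// Sort-based minimum: copies the input and sorts it.
/// Costs O(n log n) time and O(n) extra space just to read one element.
fn min_by_sort(values: &[i64]) -> Option<i64> {
    let mut sorted = values.to_vec();
    sorted.sort_unstable();
    sorted.first().copied()
}

/// Single-pass minimum: O(n) time, no allocation.
/// Returns None for an empty slice.
fn min_by_scan(values: &[i64]) -> Option<i64> {
    values.iter().copied().min()
}
```

Both functions agree on every input; the scan simply avoids the copy and the sort.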
I would actually recommend closing this PR and creating a new one afresh once array_max gets in, to avoid using an old copy of the code. For
DuckDB offers many array functions, but that doesn’t mean we need to port all of them to DataFusion Core. Our focus should be on functions that are already supported in PostgreSQL (which are a must-have) or those with significant user interest that justify ongoing maintenance in DataFusion Core.
Thanks @jayzhan211 and @findepi for the reviews.
I have applied the incoming review feedback from
Hey @erenavsarogullari, are you still tracking this for merging?
Yes, this PR is ready to be merged if there are no other concerns.
fn return_type(&self, arg_types: &[DataType]) -> datafusion_common::Result<DataType> {
    match &arg_types[0] {
        List(field) => Ok(field.data_type().clone()),
        _ => exec_err!("Not reachable, data_type should be List"),
    }
}
Please implement return_type_from_args instead of this, so that the function can report a non-nullable result when the input is non-nullable.
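The idea behind this request can be sketched without DataFusion at all. The example below is an illustration only: `SimpleType`, `FieldInfo`, and `array_min_return_info` are hypothetical stand-ins invented for this sketch and do not reproduce DataFusion's actual types or the signature of return_type_from_args. It assumes the result is nullable whenever the list itself or its elements are nullable.

```rust
// Hypothetical, simplified model of a typed field with a nullability flag.
#[derive(Debug, Clone, PartialEq)]
enum SimpleType {
    Int64,
    Utf8,
    List(Box<FieldInfo>),
}

#[derive(Debug, Clone, PartialEq)]
struct FieldInfo {
    data_type: SimpleType,
    nullable: bool,
}

/// Derives both the result type and its nullability from the argument,
/// instead of returning only the element type.
/// Assumption: the result is nullable if the list or its elements are nullable.
fn array_min_return_info(arg: &FieldInfo) -> Result<FieldInfo, String> {
    match &arg.data_type {
        SimpleType::List(element) => Ok(FieldInfo {
            data_type: element.data_type.clone(),
            nullable: arg.nullable || element.nullable,
        }),
        other => Err(format!("array_min expects a List argument, got {other:?}")),
    }
}
```

One open design question the sketch does not settle: if an empty array yields NULL, the result may have to stay nullable even for non-nullable input.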
Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look.
Which issue does this PR close?
Closes #14416.
What changes are included in this PR?
Currently, Spark, Snowflake and Presto support the array_min function. This can also be useful for DataFusion.
Spark: https://docs.databricks.com/en/sql/language-manual/functions/array_min.html
Snowflake: https://docs.snowflake.com/en/sql-reference/functions/array_min
Presto: https://prestodb.io/docs/current/functions/array.html#array_min-x-x
All potential use cases have been covered, such as different data_types, empty array, NULL, etc.
Also, planning to add the array_max function as a follow-up.
Are these changes tested?
Added new UT cases to verify the array_min function against different source arrays.
Are there any user-facing changes?
Yes, a new SQL function is supported and the documentation has also been updated.
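For readers unfamiliar with the empty-array and NULL cases mentioned in the description, the following is a rough standalone sketch of the intended behavior. It assumes Spark-style semantics (NULL elements are skipped; a NULL or empty array yields NULL) and models SQL NULL with Rust's `Option`. The function below is a hypothetical illustration, not the implementation from this PR.

```rust
/// Minimum of a nullable array of nullable i64 values.
/// Assumed semantics (Spark-style):
///   - NULL array            -> NULL (None)
///   - empty array           -> NULL (None)
///   - NULL elements         -> skipped
///   - all elements are NULL -> NULL (None)
fn array_min(array: Option<&[Option<i64>]>) -> Option<i64> {
    // `?` propagates a NULL array; `flatten` drops NULL elements.
    array?.iter().flatten().copied().min()
}
```

Under these assumptions, `array_min(Some(&[Some(3), None, Some(1)][..]))` evaluates to `Some(1)`, and both `array_min(None)` and an empty slice evaluate to `None`.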