feat: Add array_min function support #14417

Draft
erenavsarogullari wants to merge 1 commit into main from array_min_function

Conversation

erenavsarogullari
Member

@erenavsarogullari erenavsarogullari commented Feb 3, 2025

Which issue does this PR close?

Closes #14416.

What changes are included in this PR?

Spark, Snowflake and Presto currently support an array_min function, which would also be useful for DataFusion.

array_min(make_array(3,1,4,2)) => 1
array_min(make_array('h','e','l','l',NULL,'o')) => e
array_min(make_array(NULL,NULL)) => NULL
select input, array_min(input) from 
   (select make_array(d - 1, d, d + 1) input from (values (10), (NULL)) t(d)) =>
----
[9, 10, 11] 9
[NULL, NULL, NULL] NULL
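
The examples above can be modelled with a short, self-contained Rust sketch. This is not the PR's implementation: it represents SQL NULL as `None`, and the name `array_min` is reused here only for illustration. It also assumes that an empty array yields NULL, which the description does not state explicitly.

```rust
/// Smallest non-NULL element of a list, with SQL NULL modelled as `None`.
/// Returns `None` when the list is empty or holds only NULLs (the empty-list
/// behavior is an assumption of this sketch).
fn array_min<T: Ord + Copy>(values: &[Option<T>]) -> Option<T> {
    // `flatten` drops the `None` (NULL) entries; `min` yields `None`
    // when no element is left to compare.
    values.iter().copied().flatten().min()
}
```

With this model, `array_min(&[Some(3), Some(1), Some(4), Some(2)])` evaluates to `Some(1)`, and a list containing only `None` values evaluates to `None`, matching the first and third examples above.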

Spark: https://docs.databricks.com/en/sql/language-manual/functions/array_min.html
Snowflake: https://docs.snowflake.com/en/sql-reference/functions/array_min
Presto: https://prestodb.io/docs/current/functions/array.html#array_min-x-x

The relevant use cases are covered, including different data types, empty arrays and NULL inputs.

I also plan to add an array_max function as a follow-up.

Are these changes tested?

Added new unit test cases to verify the array_min function against different source arrays.

Are there any user-facing changes?

Yes, a new SQL function is supported and the documentation has also been updated.

@github-actions github-actions bot added documentation Improvements or additions to documentation sqllogictest SQL Logic Tests (.slt) labels Feb 3, 2025
@erenavsarogullari erenavsarogullari force-pushed the array_min_function branch 3 times, most recently from 5640e5d to 639f8ce Compare February 4, 2025 02:37
@erenavsarogullari erenavsarogullari changed the title feat: Add array_min function feat: Add array_min function support Feb 4, 2025
@erenavsarogullari erenavsarogullari force-pushed the array_min_function branch 2 times, most recently from c696cbb to 9beb8b7 Compare February 6, 2025 05:00
@jayzhan211
Contributor

jayzhan211 commented Feb 7, 2025

First of all, I'm not sure whether this function should be in datafusion core or datafusion-functions-extra. It does not seem to be a "core" function that is supported in either Postgres or DuckDB.

Since we are going to support Spark functions, maybe we should move this function there (#5600).

Member

@findepi findepi left a comment

DuckDB has list_max, and our array semantics are supposed to model Duck's list semantics, thus it makes sense to add to DataFusion core.

Review-wise, let's do array_max well in #14470
and then return to this PR. It doesn't make sense to review the two in parallel, since most of the comments will be the same. For example, this PR still uses sort to get minimal element.
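
To illustrate the point about sorting, here is a minimal sketch contrasting the two approaches on a plain slice of integers. The function names are hypothetical and neither function is taken from the PR; the sketch only shows why a single pass is preferable to a full sort when only the smallest element is needed.

```rust
/// Sort-based minimum: O(n log n) time plus a temporary copy of the input.
fn min_by_sort(values: &[i64]) -> Option<i64> {
    let mut sorted = values.to_vec();
    sorted.sort_unstable();
    // After sorting, the smallest element (if any) is at the front.
    sorted.first().copied()
}

/// Single-pass minimum: O(n) time and no extra allocation.
fn min_by_scan(values: &[i64]) -> Option<i64> {
    values.iter().copied().min()
}
```

Both functions return the same result for any input, so the single-pass version can replace the sort-based one without changing behavior.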

I would actually recommend closing this PR and creating a new afresh once array_max gets in, to avoid using old copy of the code. For

@jayzhan211
Contributor

jayzhan211 commented Feb 7, 2025

DuckDB has list_max, and our array semantics are supposed to model Duck's list semantics, thus it makes sense to add to DataFusion core.

Review-wise, let's do array_max well in #14470 and then return to this PR. It doesn't make sense to review the two in parallel, since most of the comments will be the same. For example, this PR still uses sort to get minimal element.

I would actually recommend closing this PR and creating a new afresh once array_max gets in, to avoid using old copy of the code. For

DuckDB offers many array functions, but that doesn’t mean we need to port all of them to DataFusion Core. Our focus should be on functions that are already supported in PostgreSQL (which are a must-have) or those with significant user interest that justify ongoing maintenance in DataFusion Core.

@github-actions github-actions bot added the functions Changes to functions implementation label Feb 8, 2025
@erenavsarogullari erenavsarogullari force-pushed the array_min_function branch 3 times, most recently from 8b04dc2 to 8c45a60 Compare February 8, 2025 17:51
@erenavsarogullari
Member Author

Thanks @jayzhan211 and @findepi for the reviews.
I have updated this PR to incorporate the earlier feedback from the array_max PR: #14470
Please also see my comment on module selection for both functions: #14470 (comment)

@erenavsarogullari
Member Author

erenavsarogullari commented Mar 2, 2025

FYI: I have also applied the latest review feedback from the array_max PR (#14470) here.

@erenavsarogullari erenavsarogullari force-pushed the array_min_function branch 4 times, most recently from 50dfce5 to b4421d4 Compare March 9, 2025 09:45
@cht42
Contributor

cht42 commented Apr 9, 2025

hey @erenavsarogullari, are you still tracking merging this ?

@erenavsarogullari erenavsarogullari force-pushed the array_min_function branch 2 times, most recently from a02e19f to 6870816 Compare April 9, 2025 16:18
@erenavsarogullari
Member Author

erenavsarogullari commented Apr 9, 2025

hey @erenavsarogullari, are you still tracking merging this ?

Yes, this PR is ready to be merged if there are no other concerns. The array_max function (#14470) has been merged, and array_min is its counterpart.

Comment on lines +95 to +100
fn return_type(&self, arg_types: &[DataType]) -> datafusion_common::Result<DataType> {
    match &arg_types[0] {
        List(field) => Ok(field.data_type().clone()),
        _ => exec_err!("Not reachable, data_type should be List"),
    }
}
Contributor

Please implement return_type_from_args instead of this, so that the result can be reported as non-nullable when the input is non-nullable.
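
The idea behind this suggestion can be sketched without DataFusion's real types. The block below deliberately does not reproduce the actual `return_type_from_args` signature: `SketchType`, `SketchField` and `array_min_return_field` are hypothetical stand-ins. It only shows nullability being derived from the input instead of always being assumed, and it ignores empty lists, which could still produce NULL.

```rust
/// Hypothetical, simplified stand-in for an Arrow data type.
#[derive(Debug, Clone, PartialEq)]
enum SketchType {
    Int64,
    Utf8,
    List(Box<SketchField>),
}

/// Hypothetical stand-in for a field: a data type plus a nullability flag.
#[derive(Debug, Clone, PartialEq)]
struct SketchField {
    data_type: SketchType,
    nullable: bool,
}

/// Derives the result field of an array_min-like function from its argument.
/// The result is marked nullable only if the list itself or its elements can
/// be NULL (an assumption of this sketch; empty lists are not considered).
fn array_min_return_field(arg: &SketchField) -> Result<SketchField, String> {
    match &arg.data_type {
        SketchType::List(element) => Ok(SketchField {
            data_type: element.data_type.clone(),
            nullable: arg.nullable || element.nullable,
        }),
        other => Err(format!("array_min expects a List argument, got {other:?}")),
    }
}
```

A non-nullable list of non-nullable Int64 elements then yields a non-nullable Int64 result, while any other argument type produces an error instead of a panic.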

@alamb
Contributor

alamb commented May 7, 2025

Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is ready for another look

@alamb alamb marked this pull request as draft May 7, 2025 20:52
Labels
documentation Improvements or additions to documentation functions Changes to functions implementation sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

Add array_min function support
6 participants