User Defined Table Function (udtf) support #2177

gandronchik · 2022-04-08T12:06:56Z

UDTF support (User-defined functions returning table)

As I understand it, a table function returns multiple rows. For now, we only have UDFs, which return a scalar value.

I don't think it should return multiple columns; structures are usually used for that.

We have the following cases:

1. select table_fun(1, 5);

 generate_series(Int64(1),Int64(5))
------------------------------------
                                  1
                                  2
                                  3
                                  4
                                  5
(5 rows)

 Projection: #generate_series(Int64(1),Int64(5)) +
   TableUDFs: generate_series(Int64(1), Int64(5))+
     EmptyRelation

This is the easiest scenario. The function just returns a vec of values.

2. select table_fun(1, col) from (select 2 col union all select 3 col) t;

 generate_series(Int64(1),t.col)
---------------------------------
                               1
                               2
                               3
                               1
                               2
(5 rows)

Projection: #generate_series(Int64(1),t.col)  +
   TableUDFs: generate_series(Int64(1), #t.col)+
     Projection: #t.col, alias=t               +
       Union                                   +
         Projection: Int64(2) AS col           +
           EmptyRelation                       +
         Projection: Int64(3) AS col           +
           EmptyRelation

The function returns a batch.

3. select col, table_fun(1, col) from (select 2 col union all select 3 col) t;

 col | generate_series(Int64(1),t.col)
-----+---------------------------------
   3 |                               1
   3 |                               2
   3 |                               3
   2 |                               1
   2 |                               2
(5 rows)

Projection: #t.col, #generate_series(Int64(1),t.col)+
   TableUDFs: generate_series(Int64(1), #t.col)      +
     Projection: #t.col, alias=t                     +
       Union                                         +
         Projection: Int64(2) AS col                 +
           EmptyRelation                             +
         Projection: Int64(3) AS col                 +
           EmptyRelation

This is the most difficult case. Here we have to transform the data flow because, as the result shows, col must be duplicated for each row of the table_fun result.

4. select * from table_fun(1, 5);

 generate_series(Int64(1),Int64(5))
------------------------------------
                                  1
                                  2
                                  3
                                  4
                                  5
(5 rows)

Projection: #generate_series(Int64(1),Int64(5)) +
   TableUDFs: generate_series(Int64(1), Int64(5))+
     EmptyRelation

In this case, the result is the same as in the first case. However, we have a different plan structure here.

5. select * from table_fun(1, 5) t(n);

 n
---
 1
 2
 3
 4
 5
(5 rows)

Projection: #t.n                                               +
   Projection: #generate_series(Int64(1),Int64(5)) AS n, alias=t+
     TableUDFs: generate_series(Int64(1), Int64(5))             +
       EmptyRelation

It looks the same as the previous case; however, the plan is slightly different in order to support the alias (the table_fun node does not support aliases, so we have to add a projection).

Regarding the signature, I decided to use a single vector plus a vector of section sizes instead of a vec of vecs for better performance. A vec of vecs would require a lot of memory when a request produces millions of rows.
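To make the flat layout concrete, here is a minimal std-only sketch of the idea described above. It is not the PR's code: `FlatResult`, `series_up_to`, and `repeat_by_sections` are hypothetical names, and plain `Vec<i64>` stands in for Arrow arrays. It shows how one flat vector plus per-row section sizes carries the same information as a vec of vecs, and how the section sizes let the planner duplicate `col` in case 3.

```rust
/// Output of a set-returning function for one input batch: all produced
/// values in one flat vector, plus how many values each input row produced.
/// (Hypothetical type, standing in for the `(ArrayRef, Vec<usize>)` pair.)
struct FlatResult {
    values: Vec<i64>,
    section_sizes: Vec<usize>,
}

/// Stand-in for `generate_series(1, col)` evaluated over a whole column.
fn series_up_to(col: &[i64]) -> FlatResult {
    let mut values = Vec::new();
    let mut section_sizes = Vec::with_capacity(col.len());
    for &end in col {
        let before = values.len();
        values.extend(1..=end); // empty range when end < 1
        section_sizes.push(values.len() - before);
    }
    FlatResult { values, section_sizes }
}

/// Repeat each input value once per output row it produced (case 3).
fn repeat_by_sections(col: &[i64], section_sizes: &[usize]) -> Vec<i64> {
    col.iter()
        .zip(section_sizes)
        .flat_map(|(&v, &n)| std::iter::repeat(v).take(n))
        .collect()
}

fn main() {
    let col = [3, 2];
    let result = series_up_to(&col);
    let repeated = repeat_by_sections(&col, &result.section_sizes);
    // Prints the two output columns side by side, as in case 3.
    for (c, v) in repeated.iter().zip(&result.values) {
        println!("{c} | {v}");
    }
}
```

The flat form needs two allocations per batch regardless of the number of input rows, whereas a vec of vecs allocates once per input row.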

xudong963 · 2022-04-09T12:00:34Z

datafusion/core/src/datasource/listing/helpers.rs

@@ -99,6 +99,7 @@ impl ExpressionVisitor for ApplicabilityVisitor<'_> {
            Expr::ScalarUDF { fun, .. } => {
                self.visit_volatility(fun.signature.volatility)
            }
+            Expr::TableUDF { fun, .. } => self.visit_volatility(fun.signature.volatility),


I recommend writing it like this:

Expr::ScalarUDF { fun, .. } | Expr::TableUDF { fun, .. } => { self.visit_volatility(fun.signature.volatility) }

Good point; however, it doesn't work in this case, because the bound fun has different types for TableUDF and ScalarUDF.
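The limitation can be reproduced outside DataFusion. In the sketch below, all types (`ScalarUdf`, `TableUdf`, `Signature`, `Volatility`, and this `Expr`) are simplified stand-ins invented for illustration, not the real DataFusion definitions. A binding in an or-pattern must have the same type in every alternative, so the two variants need separate arms even though the arm bodies read identically.

```rust
use std::sync::Arc;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Volatility {
    Immutable,
    Volatile,
}

struct Signature {
    volatility: Volatility,
}

// Two distinct function types that happen to share a `signature` field.
struct ScalarUdf {
    signature: Signature,
}
struct TableUdf {
    signature: Signature,
}

enum Expr {
    ScalarUDF { fun: Arc<ScalarUdf> },
    TableUDF { fun: Arc<TableUdf> },
    Literal(i64),
}

fn volatility_of(expr: &Expr) -> Option<Volatility> {
    match expr {
        // `Expr::ScalarUDF { fun } | Expr::TableUDF { fun } => ...` is rejected
        // by the compiler: `fun` would be `Arc<ScalarUdf>` in one alternative
        // and `Arc<TableUdf>` in the other.
        Expr::ScalarUDF { fun } => Some(fun.signature.volatility),
        Expr::TableUDF { fun } => Some(fun.signature.volatility),
        Expr::Literal(_) => None,
    }
}

fn main() {
    let expr = Expr::TableUDF {
        fun: Arc::new(TableUdf {
            signature: Signature { volatility: Volatility::Volatile },
        }),
    };
    println!("{:?}", volatility_of(&expr));
}
```

Merging the arms would require both variants to hold the same type, or a shared trait object exposing the signature.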

xudong963 · 2022-04-09T12:04:41Z

datafusion/core/src/optimizer/simplify_expressions.rs

@@ -381,6 +381,7 @@ impl<'a> ConstEvaluator<'a> {
            | Expr::QualifiedWildcard { .. } => false,
            Expr::ScalarFunction { fun, .. } => Self::volatility_ok(fun.volatility()),
            Expr::ScalarUDF { fun, .. } => Self::volatility_ok(fun.signature.volatility),
+            Expr::TableUDF { .. } => false,


xudong963 · 2022-04-09T12:07:09Z

BTW, from clippy:

error: unneeded `return` statement
   --> datafusion/core/src/physical_plan/functions.rs:752:9
    |
752 |         return Ok(ColumnarValue::Array(result));
    |         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: remove `return`: `Ok(ColumnarValue::Array(result))`
    |
    = note: `-D clippy::needless-return` implied by `-D warnings`
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#needless_return

error: could not compile `datafusion` due to previous error
warning: build failed, waiting for other jobs to finish...
error: called `.nth(0)` on a `std::iter::Iterator`, when `.next()` is equivalent
    --> datafusion/core/src/execution/context.rs:3527:32
     |
3527 |             let start_number = start_arr.into_iter().nth(0).unwrap().unwrap_or(0);
     |                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try calling `.next()` instead of `.nth(0)`: `start_arr.into_iter().next()`
     |
     = note: `-D clippy::iter-nth-zero` implied by `-D warnings`
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#iter_nth_zero

error: called `.nth(0)` on a `std::iter::Iterator`, when `.next()` is equivalent
    --> datafusion/core/src/execution/context.rs:3533:30
     |
3533 |             let end_number = end_arr.into_iter().nth(0).unwrap().unwrap_or(0) + 1;
     |                              ^^^^^^^^^^^^^^^^^^^^^^^^^^ help: try calling `.next()` instead of `.nth(0)`: `end_arr.into_iter().next()`
     |
     = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#iter_nth_zero

error: build failed
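For reference, both lints have mechanical fixes. The sketch below uses hypothetical helper names and plain `Vec<Option<i64>>` in place of Arrow arrays. Note one deliberate difference from the flagged code: it returns 0 for an empty input instead of panicking on `unwrap()`.

```rust
/// Returns the first value of a nullable column, or 0 when the column is
/// empty or its first value is null.
fn first_or_zero(values: Vec<Option<i64>>) -> i64 {
    // `.next()` replaces `.nth(0)`; `flatten()` merges "no element" and "null".
    values.into_iter().next().flatten().unwrap_or(0)
}

/// Number of values in the inclusive range `start..=end`.
fn inclusive_len(start: i64, end: i64) -> Result<i64, String> {
    if end < start {
        // An early `return` is fine; only a trailing one is flagged.
        return Err(format!("end {end} is before start {start}"));
    }
    // Tail expression instead of `return Ok(...);`.
    Ok(end - start + 1)
}

fn main() {
    println!("{}", first_or_zero(vec![Some(1), Some(5)]));
    println!("{:?}", inclusive_len(1, 5));
}
```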

doki23 · 2022-04-10T07:44:34Z

Hmm, is TableFunction an expression 🤔?
Refer to https://docs.snowflake.com/en/sql-reference/functions-table.html
The SQL usually looks like

select doi.date as "Date", record_temperatures.city, record_temperatures.temperature
    from dates_of_interest as doi,
         table(record_high_temperatures_for_date(doi.date)) as record_temperatures;

It shouldn't be an expression, right?

alamb

Thank you @gandronchik -- sorry for the delay in review. I think this PR is looking quite good 👌

Epic first PR

Would it be possible to add a test for a table function that gets no arguments (as there is code to handle that case, but I don't see coverage)?

I also had one relatively minor question related to zero argument handling; Really nice.

Also it would be nice to add a note about supporting Table Functions in https://github.com/apache/arrow-datafusion/blob/master/docs/source/user-guide/sql/sql_status.md (but we can do so as a follow on PR)

Does anyone else have questions or concerns about merging this PR?

cc @andygrove @liukun4515 @yjshen

alamb · 2022-04-13T18:29:46Z

datafusion/core/src/physical_plan/udtf.rs

+// specific language governing permissions and limitations
+// under the License.
+
+//! UDTF support


Suggested change

//! UDTF support

//! User Defined Table Function (UDTF) support

alamb · 2022-04-13T18:31:16Z

datafusion/expr/src/udtf.rs

+// specific language governing permissions and limitations
+// under the License.
+
+//! Udtf module contains foundational types that are used to represent UDTFs in DataFusion.


Suggested change

//! Udtf module contains foundational types that are used to represent UDTFs in DataFusion.

//! Contains foundational types that are used to represent User Defined Table Functions (UDTFs) in DataFusion.

alamb · 2022-04-13T18:34:50Z

datafusion/physical-expr/src/functions.rs

+    fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue> {
+        // evaluate the arguments, if there are no arguments we'll instead pass in a null array
+        // indicating the batch size (as a convention)
+        let inputs = match (self.args.len(), self.name.parse::<BuiltinScalarFunction>()) {


I don't understand why we are parsing the table function name using BuiltinScalarFunction? Don't we already have self.fun?

doki23 · 2022-04-14T08:46:17Z

Hmmmm... I have some concerns about this PR.
If we treat a UDTF as an expression, does it mean that it can only produce one column?
As I mentioned before (#2177 (comment)), it's more like a table, so that we can select * from it and get any number of columns.
I'm confused; would you please explain it to me? @alamb @gandronchik

thinkharderdev · 2022-04-15T10:08:01Z

Hmmmm...I have some problems about this pr. If we treat UDTF as an expression, does it mean that it can only produce one column? As I mentioned before (#2177 (comment)), it's more like a table so that we can select * from it and get any number of columns. I'm confused, would you please explain it to me? @alamb @gandronchik

I had the same question. I'm not sure I understand how this is different from a scalar function. It seems like a table function should produce RecordBatchs and effectively compile down to an ExecutionPlan.

alamb · 2022-04-15T14:44:05Z

It seems like a table function should produce RecordBatchs and effectively compile down to an ExecutionPlan.

I agree it should definitely produce RecordBatch

gandronchik · 2022-04-15T18:16:03Z

what about Result<Vec<ColumnarValue>>. I already almost implemented it this way:)

It seems like a table function should produce RecordBatchs and effectively compile down to an ExecutionPlan.

I agree it should definitely produce RecordBatch

thinkharderdev · 2022-04-15T20:15:19Z

what about Result<Vec<ColumnarValue>>. I already almost implemented it this way:)

It seems like a table function should produce RecordBatchs and effectively compile down to an ExecutionPlan.

I agree it should definitely produce RecordBatch

That's essentially a RecordBatch :)

You could have

pub type TableFunctionImplementation =
    Arc<dyn Fn(&[ColumnarValue]) -> Result<Vec<ColumnarValue>> + Send + Sync>;

// This is a terrible name but this would be analogous to ReturnTypeFunction/StateTypeFunction
pub type TableSchemaFunction = 
    Arc<dyn Fn(&[DataType]) -> Result<SchemaRef> + Send + Sync>;
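As a rough illustration of how such a pair of aliases could fit together, here is a std-only sketch. `DataType`, `Column`, and the `(String, DataType)` schema are simplified stand-ins for the Arrow/DataFusion types, and `numbered_labels` is a hypothetical example function; none of this is an existing DataFusion API.

```rust
use std::sync::Arc;

// Simplified stand-ins for Arrow's DataType and DataFusion's ColumnarValue.
#[derive(Clone, Debug, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

#[derive(Clone, Debug, PartialEq)]
enum Column {
    Int64(Vec<i64>),
    Utf8(Vec<String>),
}

type Result<T> = std::result::Result<T, String>;

/// Takes the argument columns and returns any number of output columns.
type TableFunctionImplementation =
    Arc<dyn Fn(&[Column]) -> Result<Vec<Column>> + Send + Sync>;

/// Computes the output schema (name and type per column) from argument types.
type TableSchemaFunction =
    Arc<dyn Fn(&[DataType]) -> Result<Vec<(String, DataType)>> + Send + Sync>;

/// Hypothetical two-column table function: echoes `n` and adds a label.
fn numbered_labels() -> (TableSchemaFunction, TableFunctionImplementation) {
    let schema: TableSchemaFunction =
        Arc::new(|_arg_types: &[DataType]| -> Result<Vec<(String, DataType)>> {
            Ok(vec![
                ("n".to_string(), DataType::Int64),
                ("label".to_string(), DataType::Utf8),
            ])
        });
    let fun: TableFunctionImplementation =
        Arc::new(|args: &[Column]| -> Result<Vec<Column>> {
            match args {
                [Column::Int64(ns)] => Ok(vec![
                    Column::Int64(ns.clone()),
                    Column::Utf8(ns.iter().map(|n| format!("row-{n}")).collect()),
                ]),
                _ => Err("expected a single Int64 column".to_string()),
            }
        });
    (schema, fun)
}

fn main() {
    let (schema, fun) = numbered_labels();
    println!("{:?}", schema(&[DataType::Int64]));
    println!("{:?}", fun(&[Column::Int64(vec![1, 2])]));
}
```

Keeping the schema function separate lets a planner learn the output columns without evaluating the function.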

Ted-Jiang · 2022-04-26T13:15:41Z

@alamb @thinkharderdev @doki23 I ran into the same problem in #2343.

If we treat it as an Expr, we need to convert it to a PhysicalExpr, but

/// Evaluate an expression against a RecordBatch
    fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue>;

pub enum ColumnarValue {
    /// Array of values
    Array(ArrayRef),
    /// A single value
    Scalar(ScalarValue),
}

Because it returns a ColumnarValue, we cannot return the result as a table, am I right?

Should I implement a TablePhysicalExpr
using

  fn evaluate(&self, batch: &RecordBatch) -> Result<Vec<ColumnarValue>>;

alamb · 2022-04-26T20:34:28Z

@alamb @thinkharderdev @doki23 i met the same problem in #2343

I left some thoughts in

#2343 (comment)


alamb · 2022-04-27T18:53:42Z

I plan to give this a more careful review tomorrow

Ted-Jiang · 2022-04-28T06:50:30Z

datafusion/expr/src/function.rs

@@ -39,6 +40,10 @@ use std::sync::Arc;
 pub type ScalarFunctionImplementation =
    Arc<dyn Fn(&[ColumnarValue]) -> Result<ColumnarValue> + Send + Sync>;

+/// Table function. Second tuple
+pub type TableFunctionImplementation =
+    Arc<dyn Fn(&[ColumnarValue], usize) -> Result<(ArrayRef, Vec<usize>)> + Send + Sync>;


ArrayRef is one of the ColumnarValue variants:

pub enum ColumnarValue {
    /// Array of values
    Array(ArrayRef),
    /// A single value
    Scalar(ScalarValue),
}

So I think TableFunctionImplementation is effectively the same as ScalarFunctionImplementation, and it can only generate an N*1 table. If we instead use the signature from #2177 (comment):

Arc<dyn Fn(&[ColumnarValue], usize) -> Result<(Vec<ColumnarValue>, Vec<usize>)> + Send + Sync>;

we could generate an N*M table.
Please correct me if I'm wrong.

Or, in this case, can it generate an N*M table?

I am also a little mystified by this signature. It looks like "Second tuple" was the start of a thought that didn't get finished? I also don't understand what the usize in the tuple represents -- perhaps you can add some comments explaining its purpose?

Also, I agree with @Ted-Jiang's analysis -- I would expect this signature to return a "table" (aka a RecordBatch, or a Vec<ColumnarValue> if preferred).

Perhaps something like

Arc<dyn Fn(&[ColumnarValue]) -> Result<RecordBatch> + Send + Sync>;

or

Arc<dyn Fn(&[ColumnarValue]) -> Result<Vec<ColumnarValue>> + Send + Sync>;

I guess that @gandronchik wants to chain each result (ArrayRef) of TableFunctionImplementation into a multi-column result (see the code in TableFunStream::batch), which may mean the table UDF consists of multiple exprs. The reason is probably that trait PhysicalExpr only provides fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue>. But I agree that Arc<dyn Fn(&[ColumnarValue]) -> Result<Vec<ColumnarValue>> + Send + Sync> is more appropriate. So I believe the approach could be either to invoke the table UDF directly in TableFunStream without implementing trait PhysicalExpr for it, or to add fn evaluate(&self, batch: &RecordBatch) -> Result<Vec<ColumnarValue>> to PhysicalExpr.

I updated the header of PR. Hope it is clear enough now:)

alamb

First of all, again thank you @gandronchik for this contribution

If you are implementing a table function I would expect it to be able to return multiple rows and columns. I think this PR only implements a table function that produces multiple rows out.

It may be that I have a different understanding of "table function" than you are trying to implement. A writeup of what you are trying to do (not how you are implementing it) would likely help move this conversation forward.

As I am familiar with Table Functions, they are a little tricky as they can change the cardinality and schema of their input, and thus database systems restrict where in queries they may appear.

I think typical uses are in the FROM clause and in SELECT clause. I wonder if that sounds similar to what you are trying to do?

alamb · 2022-04-28T19:29:26Z

datafusion/core/src/execution/context.rs

+        let result = plan_and_collect(&ctx, "SELECT integer_series(1,5)").await?;
+
+        let expected = vec![
+            "+-----------------------------------+",
+            "| integer_series(Int64(1),Int64(5)) |",
+            "+-----------------------------------+",
+            "| 1                                 |",
+            "| 2                                 |",
+            "| 3                                 |",
+            "| 4                                 |",
+            "| 5                                 |",
+            "+-----------------------------------+",
+        ];


This is a good example of a UDTF producing more rows than went in 👍

Would it be possible to write an example that also produces a different number of columns than went in? I think that is what @Ted-Jiang and I are pointing out in our comments below

I didn't support it. You can use structures for that

alamb · 2022-04-28T19:30:41Z

datafusion/core/src/execution/context.rs

+        assert_batches_eq!(expected, &result);
+
+        let result =
+            plan_and_collect(&ctx, "SELECT * from integer_series(1,5) pos(n)").await?;


Can you explain what this test is supposed to be demonstrating? I am not quite sure what it shows

I have just explained it in the PR description. I hope I made it clear enough :)

alamb · 2022-04-28T19:33:57Z

datafusion/expr/src/function.rs

@@ -39,6 +40,10 @@ use std::sync::Arc;
 pub type ScalarFunctionImplementation =
    Arc<dyn Fn(&[ColumnarValue]) -> Result<ColumnarValue> + Send + Sync>;

+/// Table function. Second tuple
+pub type TableFunctionImplementation =
+    Arc<dyn Fn(&[ColumnarValue], usize) -> Result<(ArrayRef, Vec<usize>)> + Send + Sync>;


I am also a little mystified by this signature. It looks like "Second tuple" was the start of a thought that didn't get finished? I also don't understand what the usize in the tuple represents -- perhaps you can add some comments explaining its purpose?

Also, I agree with @Ted-Jiang's analysis -- I would expect this signature to return a "table" (aka a RecordBatch, or a Vec<ColumnarValue> if preferred).

Perhaps something like

Arc<dyn Fn(&[ColumnarValue]) -> Result<RecordBatch> + Send + Sync>;

or

Arc<dyn Fn(&[ColumnarValue]) -> Result<Vec<ColumnarValue>> + Send + Sync>;

doki23 · 2022-04-29T11:01:21Z

I don't think it should return multiply columns, structures are usually used for this.

I cannot agree. The result of a table function represents a temporary table. Since it's a table, it shouldn't have only one column. Of course, one column of a structure type can solve the problem, but it's different. We cannot directly execute ORDER BY or other queries on it unless we extract the structure.

alamb · 2022-04-29T18:38:15Z

@gandronchik thank you for the explanation in this PR's description. It helps, though I will admit I still don't fully understand what is going on.

I agree with @doki23 -- I expect a table function to logically return a table (that is, something with both rows and columns).

Regarding signature, I decided to use a single vector and vector with sizes of sections instead of vec of vecs to have better performance. If we use Vec, this will require a lot of memory in case of a request for millions of rows.

The way the rest of DataFusion avoids buffering all the intermediate results in memory at once is with Streams, but that requires interacting with Rust's async ecosystem, which is non-trivial.

If you wanted a streaming solution, that would mean the signature might look something like the following (maybe)

Arc<dyn Fn(Box<dyn SendableRecordBatchStream>) -> Result<Box<dyn SendableRecordBatchStream>> + Send + Sync>;
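The shape of a streaming solution can be sketched without async at all. The example below is only a synchronous, std-only analogy: a boxed `Iterator` of batches stands in for a record batch stream, `Batch` is a hypothetical alias for a single Int64 column, and `series_batches` is an invented name. The point is that only one batch is materialized at a time.

```rust
/// Stand-in for a RecordBatch: one Int64 column.
type Batch = Vec<i64>;

/// Lazily yields the values `start..=end` in batches of at most
/// `batch_size` rows. Assumes `end` is well below `i64::MAX`.
fn series_batches(
    start: i64,
    end: i64,
    batch_size: usize,
) -> Box<dyn Iterator<Item = Batch> + Send> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut next = start;
    Box::new(std::iter::from_fn(move || {
        if next > end {
            return None; // series exhausted
        }
        // Only this batch is held in memory.
        let batch: Batch = (next..=end).take(batch_size).collect();
        next += batch.len() as i64;
        Some(batch)
    }))
}

fn main() {
    for batch in series_batches(1, 5, 2) {
        println!("{batch:?}");
    }
}
```

An async version would replace the iterator with a stream and poll it from the execution plan, which is where the extra complexity mentioned above comes from.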

gandronchik · 2022-05-05T12:49:07Z

@gandronchik thank you for the explanation in this PR's description. It helps, though I will admit I still don't fully understand what is going on.

I agree with @doki23 -- I expect a table function to logically return a table (that is, something with both rows and columns).

Regarding signature, I decided to use a single vector and vector with sizes of sections instead of vec of vecs to have better performance. If we use Vec, this will require a lot of memory in case of a request for millions of rows.

The way the rest of DataFusion avoids buffering all the intermediate results in memory at once is with Streams, but that requires interacting with Rust's async ecosystem, which is non-trivial.

If you wanted a streaming solution, that would mean the signature might look something like the following (maybe)
Arc<dyn Fn(Box<dyn SendableRecordBatchStream>) -> Result<Box<dyn SendableRecordBatchStream>> + Send + Sync>;

Looks like I got the title wrong. I have implemented a function that returns many rows; it is probably not a table function. If I rename it, will that be fine?

Regarding the function signature, I think my solution is a compromise between a vec and streaming. Actually, I don't think the function will return that many rows. However, of course, I will rewrite it if you want. So which solution do we choose: the current Result<(ArrayRef, Vec<usize>)>, Result<Vec<ColumnarValue>>, or Result<Box<dyn SendableRecordBatchStream>>?

alamb · 2022-05-24T13:55:18Z

I think adding UDTFs (aka user defined table functions) that produce a 2 dimensional table output (aka Vec<RecordBatch> or a SendableRecordBatchStream) would be a valuable addition to DataFusion.

I think Spark calls these "table-valued functions":

https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-qry-select-tvf.html

Postgres calls them table functions:

https://www.postgresql.org/docs/7.3/xfunc-tablefunctions.html

However, this PR does not implement table functions that I can see. I still don't fully understand the usecase for the code in this PR for a function that returns a single column of values and I don't know of any other system that implements such functions. Thus I feel that this PR adds a feature that is not widely usable to DataFusion users as a whole, and so I don't feel I can approve it.

If others (users or maintainers) have a perspective on this issue, I would love to hear them too. If there is broader support for this feature, I won't oppose merging it.

alamb · 2022-06-07T17:22:14Z

marking as draft until we figure out what to do with this

gandronchik · 2022-06-10T09:51:35Z

@alamb Hello! Sorry for the slow response.

I am sorry for such a big PR with such a poor description.

Let me try to explain what is happening here.
Honestly, I made a mistake with the naming. What I implemented is a set-returning function (https://www.postgresql.org/docs/current/functions-srf.html).

As far as I know, DataFusion follows PostgreSQL behavior, so the functionality I provide here is Postgres functionality.

We already use it in Cube.js. We have implemented several functions:

generate_series (https://www.postgresql.org/docs/current/functions-srf.html)
generate_subscripts (https://www.postgresql.org/docs/current/functions-srf.html)
unnest (https://www.postgresql.org/docs/current/functions-array.html)

Please take a closer look at my PR. I am ready to improve it, rename some structures, etc.

Below is the implementation of the generate_series function (a real Postgres function):

macro_rules! generate_series_udtf {
    ($ARGS:expr, $TYPE: ident, $PRIMITIVE_TYPE: ident) => {{
        let mut section_sizes: Vec<usize> = Vec::new();
        let l_arr = &$ARGS[0].as_any().downcast_ref::<PrimitiveArray<$TYPE>>();
        if l_arr.is_some() {
            let l_arr = l_arr.unwrap();
            let r_arr = downcast_primitive_arg!($ARGS[1], "right", $TYPE);
            let step_arr = PrimitiveArray::<$TYPE>::from_value(1 as $PRIMITIVE_TYPE, 1);
            let step_arr = if $ARGS.len() > 2 {
                downcast_primitive_arg!($ARGS[2], "step", $TYPE)
            } else {
                &step_arr
            };

            let mut builder = PrimitiveBuilder::<$TYPE>::new(1);
            for (i, (start, end)) in l_arr.iter().zip(r_arr.iter()).enumerate() {
                let step = if step_arr.len() > i {
                    step_arr.value(i)
                } else {
                    step_arr.value(0)
                };

                let start = start.unwrap();
                let end = end.unwrap();
                let mut section_size: i64 = 0;
                if start <= end && step > 0 as $PRIMITIVE_TYPE {
                    let mut current = start;
                    loop {
                        if current > end {
                            break;
                        }
                        builder.append_value(current).unwrap();

                        section_size += 1;
                        current += step;
                    }
                }
                section_sizes.push(section_size as usize);
            }

            return Ok((Arc::new(builder.finish()) as ArrayRef, section_sizes));
        }
    }};
}

pub fn create_generate_series_udtf() -> TableUDF {
    let fun = make_table_function(move |args: &[ArrayRef]| {
        assert!(args.len() == 2 || args.len() == 3);

        if args[0].as_any().downcast_ref::<Int64Array>().is_some() {
            generate_series_udtf!(args, Int64Type, i64)
        } else if args[0].as_any().downcast_ref::<Float64Array>().is_some() {
            generate_series_udtf!(args, Float64Type, f64)
        }

        Err(DataFusionError::Execution("Unsupported type".to_string()))
    });

    let return_type: ReturnTypeFunction = Arc::new(move |tp| {
        if !tp.is_empty() {
            Ok(Arc::new(tp[0].clone()))
        } else {
            Ok(Arc::new(DataType::Int64))
        }
    });

    TableUDF::new(
        "generate_series",
        &Signature::one_of(
            vec![
                TypeSignature::Exact(vec![DataType::Int64, DataType::Int64]),
                TypeSignature::Exact(vec![DataType::Int64, DataType::Int64, DataType::Int64]),
                TypeSignature::Exact(vec![DataType::Float64, DataType::Float64]),
                TypeSignature::Exact(vec![
                    DataType::Float64,
                    DataType::Float64,
                    DataType::Float64,
                ]),
            ],
            Volatility::Immutable,
        ),
        &return_type,
        &fun,
    )
}

alamb · 2022-06-12T09:51:03Z

Thanks @gandronchik -- I will try and find time to re-review this PR over the next few days in light of the information above.

gandronchik · 2022-06-26T15:47:04Z

Thanks @gandronchik -- I will try and find time to re-review this PR over the next few days in light of the information above.

@alamb Hello! Have you had time to check the PR yet?

alamb · 2022-06-28T19:06:39Z

@alamb Hello! Have you had time to check the PR yet?

Hi @gandronchik sadly I have not had a chance. I apologize for my lack of bandwidth but it is hard to find sufficient contiguous time to review such large PRs when I don't have the background context.

My core problem is that I don't understand (despite your admirable attempts to clarify) what this PR is trying to implement, so it is very hard to evaluate the code to see if it is implementing what is desired (because I don't understand what is desired).

For example, all the examples of "set returning functions" in the links you shared in postgres appear to use those functions as elements in the FROM clause. For example,

select * from unnest(ARRAY[1,2], ARRAY['foo','bar','baz']) as x(a,b) →

So I am struggling to understand examples you share in the PR's description that show using these functions in combination with a column 🤔

select table_fun(1, col) from (select 2 col union all select 3 col) t;

So what would you think about implementing more general user defined table functions (that can return RecordBatches / streams as we have discussed above)? I think others would also likely use such functionality and it seems like it would satisfy the usecases from cube.js (?)

gandronchik · 2022-06-29T09:02:54Z

@alamb Hello! I think it will be easier to understand what I implemented here if you check how the generate_series function works in Postgres. Try running the following queries:

1. select generate_series(1, 5);

2. select generate_series(1, n) from (select 2 n union all select 3 n) x;

3. select n, generate_series(1, n) from (select 2 n union all select 3 n) x;

4. select col from generate_series(1, 5) fun(col);

Before these changes, DataFusion had only UDFs (which return exactly one row per input row) and UDAFs (which return one row for any number of input rows). My changes allow returning multiple rows per input row.
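The three cardinalities can be contrasted in a few lines of plain Rust. This is only an illustration over `&[i64]` slices with hypothetical function names, not DataFusion's UDF, UDAF, or table UDF interfaces.

```rust
/// UDF-like: exactly one output value per input row.
fn scalar_like(col: &[i64]) -> Vec<i64> {
    col.iter().map(|v| v * 10).collect()
}

/// UDAF-like: one output value for any number of input rows.
fn aggregate_like(col: &[i64]) -> i64 {
    col.iter().sum()
}

/// Set-returning: zero or more output values per input row,
/// here behaving like `generate_series(1, n)` for each `n`.
fn set_returning_like(col: &[i64]) -> Vec<i64> {
    col.iter().flat_map(|&n| 1..=n).collect()
}

fn main() {
    let col = [2, 3];
    println!("{:?}", scalar_like(&col));
    println!("{}", aggregate_like(&col));
    println!("{:?}", set_returning_like(&col));
}
```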

alamb · 2023-01-14T11:25:18Z

This PR is more than 6 months old, so closing it down for now to clean up the PR list. Please reopen if this is a mistake and you plan to work on it more

github-actions bot added the datafusion Changes in the datafusion crate label Apr 8, 2022

xudong963 reviewed Apr 9, 2022


xudong963 added the enhancement New feature or request label Apr 9, 2022

gandronchik requested a review from xudong963 April 12, 2022 14:02

alamb reviewed Apr 13, 2022


gandronchik closed this Apr 27, 2022

gandronchik deleted the support-udtf branch April 27, 2022 10:37

gandronchik restored the support-udtf branch April 27, 2022 11:49

gandronchik reopened this Apr 27, 2022

github-actions bot added the ballista label Apr 27, 2022

gandronchik requested a review from alamb April 27, 2022 14:13

gandronchik and others added 10 commits April 27, 2022 23:25

udtf support

d8d91a2

udtf support - minor refactor

a16cdee

chore: udtf in progress

d84ef1c

chore: udtf all types support

64ee981

chore: udtf logical planning

8d13284

chore: refactor

814aa96

chore: udtf result - Vec<ArrayRef> to ArrayRef + indexes

9107b6b

chore: refactor

58eb2cc

chore: fix udtf column map

d18807f

Table function fixes

75c070b

paveltiunov and others added 6 commits April 28, 2022 00:25

Table function fixes: empty table fun result

0b7a4a6

Table function fixes: expose build_table_udf_schema

86adb81

Table function fixes: change signature to ArrayRef as table function …

c72bef0

…always returns array

Table function fixes: clippy fixes

c548400

chore: fix udtf schema / support select * from udtf

a9c1b9e

chore: refactor

e979728

gandronchik force-pushed the support-udtf branch from a0b7728 to e979728 Compare April 27, 2022 16:55

alamb changed the title ~~udtf support~~ User Defined Table Function (udtf) support Apr 27, 2022

alamb mentioned this pull request Apr 27, 2022

[Question] how to add expr inline #2330

Closed

Ted-Jiang reviewed Apr 28, 2022


alamb reviewed Apr 28, 2022


gandronchik requested a review from alamb May 24, 2022 06:53

ovr mentioned this pull request Jun 2, 2022

Proposal: remove automated ballista CI checks from DataFusion #2679

Closed

andygrove removed the datafusion Changes in the datafusion crate label Jun 3, 2022

alamb marked this pull request as draft June 7, 2022 17:22

scsmithr mentioned this pull request Jan 9, 2023

Add concept of connections GlareDB/glaredb#459

Closed

2 tasks

alamb closed this Jan 14, 2023

alamb mentioned this pull request Oct 23, 2023

add sql STRING_AGG function #7910

Open
