Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
When implementing analytics, users often want to do operations like `array + constant` or `array + array`.
The Rust implementation of Arrow is strongly typed 🙌, including the compute kernels, but data is often passed around as `Arc<dyn Array>` / `ArrayRef`, so the concrete types are only known at runtime. This places the burden on the user of the library to determine the exact types of the inputs in order to call the appropriate kernel.
Take, for example, comparing an array against the number 5 to find all elements equal to it, where the input array may hold `Int64` or `Int32` values.
In the current version of arrow, doing so requires code like:
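```rust
use arrow::array::{Array, ArrayRef, BooleanArray, Int32Array, Int64Array, UInt64Array};
use arrow::compute::eq_scalar;
use arrow::datatypes::DataType;
use arrow::error::{ArrowError, Result};

// The caller must downcast to the concrete array type before calling the typed kernel
fn find_5(array: ArrayRef) -> Result<BooleanArray> {
    match array.data_type() {
        DataType::Int32 => eq_scalar(array.as_any().downcast_ref::<Int32Array>().unwrap(), 5),
        DataType::Int64 => eq_scalar(array.as_any().downcast_ref::<Int64Array>().unwrap(), 5),
        DataType::UInt64 => eq_scalar(array.as_any().downcast_ref::<UInt64Array>().unwrap(), 5),
        // ... and so on for every other supported type
        other => Err(ArrowError::ComputeError(format!("find_5 does not support {:?}", other))),
    }
}
```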
In practice this ends up being macroized, and it is non-ideal: the user has to do dynamic type dispatch anyway, so there is no runtime performance benefit from the strongly typed kernels.
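To make "macroized" concrete, here is a minimal, self-contained sketch. It assumes toy stand-ins (`DataType`, `Array`, `ArrayRef`, `PrimitiveArray`, `eq_scalar`, and the macro `dispatch_eq_scalar`) defined locally so it compiles without the arrow crate; the names echo arrow's, but the definitions are hypothetical, and the toy `eq_scalar` returns `Vec<bool>` instead of `Result<BooleanArray>`. The macro only removes the copy-pasted arms: the `match` on `data_type()` and the downcast still run at runtime on every call.

```rust
use std::any::Any;
use std::sync::Arc;

// Toy stand-ins for arrow types (hypothetical; not the arrow crate's definitions)
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
    UInt64,
    Float64,
}

trait Array {
    fn as_any(&self) -> &dyn Any;
    fn data_type(&self) -> DataType;
}

type ArrayRef = Arc<dyn Array>;

// Plays the role of arrow's PrimitiveArray<T>; the type tag is stored alongside the values
struct PrimitiveArray<T> {
    values: Vec<T>,
    data_type: DataType,
}

impl<T: 'static> Array for PrimitiveArray<T> {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn data_type(&self) -> DataType {
        self.data_type
    }
}

// Strongly typed kernel: usable only once the concrete element type is known
fn eq_scalar<T: PartialEq + Copy>(array: &PrimitiveArray<T>, scalar: T) -> Vec<bool> {
    array.values.iter().map(|v| *v == scalar).collect()
}

// Generates one downcast-and-call arm per listed type; the dispatch itself still happens at runtime
macro_rules! dispatch_eq_scalar {
    ($array:expr, $scalar:expr, $($variant:ident => $native:ty),+ $(,)?) => {
        match $array.data_type() {
            $(DataType::$variant => Ok(eq_scalar(
                $array
                    .as_any()
                    .downcast_ref::<PrimitiveArray<$native>>()
                    .expect("data_type does not match the concrete array type"),
                $scalar as $native,
            )),)+
            other => Err(format!("eq_scalar is not supported for {:?}", other)),
        }
    };
}

fn find_5(array: &ArrayRef) -> Result<Vec<bool>, String> {
    dispatch_eq_scalar!(array, 5, Int32 => i32, Int64 => i64, UInt64 => u64)
}

fn main() {
    let ints: ArrayRef = Arc::new(PrimitiveArray { values: vec![1i64, 5, 7], data_type: DataType::Int64 });
    println!("{:?}", find_5(&ints)); // Ok([false, true, false])

    // Float64 was not listed in the macro call, so it falls through to the error arm
    let floats: ArrayRef = Arc::new(PrimitiveArray { values: vec![5.0f64], data_type: DataType::Float64 });
    println!("{:?}", find_5(&floats)); // Err("eq_scalar is not supported for Float64")
}
```

Any type left out of the macro invocation (here `Float64`) silently becomes a runtime error, so every call site has to keep its own list of types in sync.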
Describe the solution you'd like
It would be nice to call a single function and let the Rust library pick the correct kernel dynamically (at runtime).
As described and suggested by @jorgecarleitao and @nevi-me in #984 (comment), this could go all the way and follow the C++ `Datum` model:
```rust
fn eq(left: Datum, right: Datum) -> BooleanArray {
    ...
}
```
where `Datum` looks something like:
```rust
enum Datum {
    Scalar(ScalarValue),
    Array(ArrayRef),
}
```
and `ScalarValue` would look something like `ScalarValue` from DataFusion (https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/scalar.rs) or `Scalar` from arrow2 (https://github.com/jorgecarleitao/arrow2/tree/main/src/scalar).
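A minimal sketch of how a `Datum`-based `eq` could pick the kernel at runtime. It is a toy under several assumptions, not a proposed implementation: `ArrayRef` is modeled as an `Arc` around a hypothetical `ArrayData` enum instead of `Arc<dyn Array>`, `ScalarValue` has only three variants, the helper kernels `eq_array` and `eq_scalar` are hypothetical, and `eq` returns `Result<Vec<bool>, String>` rather than `BooleanArray` so that type mismatches can be reported instead of panicking.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a type-erased Arrow array (real arrow uses Arc<dyn Array>)
enum ArrayData {
    Int32(Vec<i32>),
    Int64(Vec<i64>),
    Utf8(Vec<String>),
}

impl ArrayData {
    fn len(&self) -> usize {
        match self {
            ArrayData::Int32(v) => v.len(),
            ArrayData::Int64(v) => v.len(),
            ArrayData::Utf8(v) => v.len(),
        }
    }
}

type ArrayRef = Arc<ArrayData>;

// Loosely modeled on DataFusion's ScalarValue, trimmed to three variants
enum ScalarValue {
    Int32(i32),
    Int64(i64),
    Utf8(String),
}

// The Datum model: an argument is either a single value or a whole array
enum Datum {
    Scalar(ScalarValue),
    Array(ArrayRef),
}

// Typed kernels: element-wise array == array, and array == scalar
fn eq_array<T: PartialEq>(left: &[T], right: &[T]) -> Vec<bool> {
    left.iter().zip(right).map(|(a, b)| a == b).collect()
}

fn eq_scalar<T: PartialEq>(left: &[T], right: &T) -> Vec<bool> {
    left.iter().map(|a| a == right).collect()
}

// One entry point; the runtime types of the inputs decide which kernel runs
fn eq(left: &Datum, right: &Datum) -> Result<Vec<bool>, String> {
    match (left, right) {
        (Datum::Array(l), Datum::Array(r)) => {
            if l.len() != r.len() {
                return Err("arrays must have the same length".to_string());
            }
            match (&**l, &**r) {
                (ArrayData::Int32(x), ArrayData::Int32(y)) => Ok(eq_array(x, y)),
                (ArrayData::Int64(x), ArrayData::Int64(y)) => Ok(eq_array(x, y)),
                (ArrayData::Utf8(x), ArrayData::Utf8(y)) => Ok(eq_array(x, y)),
                _ => Err("mismatched array types".to_string()),
            }
        }
        // Equality is symmetric, so array == scalar and scalar == array share one arm
        (Datum::Array(a), Datum::Scalar(s)) | (Datum::Scalar(s), Datum::Array(a)) => {
            match (&**a, s) {
                (ArrayData::Int32(x), ScalarValue::Int32(y)) => Ok(eq_scalar(x, y)),
                (ArrayData::Int64(x), ScalarValue::Int64(y)) => Ok(eq_scalar(x, y)),
                (ArrayData::Utf8(x), ScalarValue::Utf8(y)) => Ok(eq_scalar(x, y)),
                _ => Err("mismatched array and scalar types".to_string()),
            }
        }
        (Datum::Scalar(_), Datum::Scalar(_)) => {
            Err("scalar == scalar is left out of this sketch".to_string())
        }
    }
}

fn main() {
    let arr = Datum::Array(Arc::new(ArrayData::Int64(vec![1, 5, 7])));
    let five = Datum::Scalar(ScalarValue::Int64(5));
    println!("{:?}", eq(&arr, &five)); // Ok([false, true, false])
    println!("{:?}", eq(&five, &arr)); // Ok([false, true, false])

    let names = Datum::Array(Arc::new(ArrayData::Utf8(vec!["a".to_string(), "b".to_string()])));
    let b = Datum::Scalar(ScalarValue::Utf8("b".to_string()));
    println!("{:?}", eq(&names, &b)); // Ok([false, true])

    let five_i32 = Datum::Scalar(ScalarValue::Int32(5));
    println!("{:?}", eq(&arr, &five_i32)); // Err("mismatched array and scalar types")
}
```

Because `Datum` carries both shapes, the caller never downcasts, and what are separate public kernels today (numeric and string equality, array and scalar variants) can sit behind one entry point.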
Describe alternatives you've considered
The alternative is a continued proliferation of individual kernels such as `eq`, `eq_utf8`, etc.
Additional context
There is a lot of additional discussion on #984.