Add Scalar / Datum support to compute kernels #1047

Closed
@alamb

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

When implementing analytics, users often want to perform operations such as array + constant or array + array.

The Rust implementation of Arrow is strongly typed 🙌, including the compute kernels, but data is often passed around as Arc<dyn Array> / ArrayRef, so the concrete types are only known at runtime. This places the burden on the user of the library to determine the exact types of the inputs in order to call the appropriate kernels.

Take, for example, comparing an array to find all elements equal to the number 5, where the input array may hold Int64 or Int32 values.

In the current version of arrow, you would need to write code like:

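use arrow::array::{ArrayRef, BooleanArray, Int64Array, UInt64Array};
use arrow::compute::eq_scalar;
use arrow::datatypes::DataType;
use arrow::error::Result;

// Dispatch on the runtime data type, then downcast to call the typed kernel.
fn find_5(array: ArrayRef) -> Result<BooleanArray> {
  match array.data_type() {
    DataType::Int64 => eq_scalar(array.as_any().downcast_ref::<Int64Array>().unwrap(), 5),
    DataType::UInt64 => eq_scalar(array.as_any().downcast_ref::<UInt64Array>().unwrap(), 5),
    ...
  }
}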

In practice this ends up being wrapped in macros, and it is not ideal: the user has to do dynamic type dispatch anyway, so the strongly typed kernels provide no runtime performance benefit.
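To illustrate the macro pattern mentioned above, here is a minimal, self-contained sketch. It does not use the arrow crate: `DataType`, `Array`, `Int32Array`, `Int64Array`, `ArrayRef`, `eq_scalar` and the `dispatch_eq_scalar!` macro are all toy stand-ins defined here for illustration, and they only mimic the shape of the real types.

```rust
use std::any::Any;
use std::sync::Arc;

// Toy stand-ins for arrow's types (hypothetical, NOT the arrow crate's API).
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Int64,
}

trait Array {
    fn data_type(&self) -> DataType;
    fn as_any(&self) -> &dyn Any;
}

struct Int32Array(Vec<i32>);
struct Int64Array(Vec<i64>);

impl Array for Int32Array {
    fn data_type(&self) -> DataType { DataType::Int32 }
    fn as_any(&self) -> &dyn Any { self }
}

impl Array for Int64Array {
    fn data_type(&self) -> DataType { DataType::Int64 }
    fn as_any(&self) -> &dyn Any { self }
}

type ArrayRef = Arc<dyn Array>;

// Strongly typed "kernel": the compiler generates one copy per element type.
fn eq_scalar<T: PartialEq + Copy>(values: &[T], scalar: T) -> Vec<bool> {
    values.iter().map(|v| *v == scalar).collect()
}

// The macro hides the repetitive downcast-and-call needed for each type.
macro_rules! dispatch_eq_scalar {
    ($array:expr, $ty:ty, $scalar:expr) => {{
        let typed = $array
            .as_any()
            .downcast_ref::<$ty>()
            .expect("concrete type was checked via data_type()");
        eq_scalar(&typed.0, $scalar)
    }};
}

// The caller still dispatches on the runtime type, one arm per data type.
fn find_5(array: &ArrayRef) -> Vec<bool> {
    match array.data_type() {
        DataType::Int32 => dispatch_eq_scalar!(array, Int32Array, 5),
        DataType::Int64 => dispatch_eq_scalar!(array, Int64Array, 5),
    }
}
```

Even with the macro, every caller that holds an `ArrayRef` has to repeat a match like the one in `find_5`, which is the duplication this issue is about.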

Describe the solution you'd like

It would be nice to be able to call a single function and let the Rust library pick the correct kernel dynamically (at runtime).

As described and suggested by @jorgecarleitao and @nevi-me in #984 (comment), this could go all the way and follow the C++ Datum model:

fn eq(left: Datum, right: Datum) -> BooleanArray {
...
}

Where Datum looks something like

enum Datum {
  Scalar(ScalarValue),
  Array(ArrayRef)
}

ScalarValue would look something like ScalarValue from DataFusion (https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/scalar.rs) or Scalar from arrow2 (https://github.com/jorgecarleitao/arrow2/tree/main/src/scalar).
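As a rough sketch of how a Datum-based `eq` could move the dispatch into the library, the following self-contained toy model may help. Everything in it is an assumption made for illustration: `Array` is a small enum rather than arrow's trait object, `ScalarValue` has only two variants, the result is a `Vec<bool>` wrapped in a `Result` instead of a `BooleanArray`, and `eq_values` / `eq_value_scalar` are hypothetical helper names.

```rust
use std::sync::Arc;

// Hypothetical toy model; none of these are the arrow crate's real types.
#[derive(Debug, Clone, PartialEq)]
enum ScalarValue {
    Int32(i32),
    Int64(i64),
}

#[derive(Debug, PartialEq)]
enum Array {
    Int32(Vec<i32>),
    Int64(Vec<i64>),
}

type ArrayRef = Arc<Array>;

#[derive(Debug, Clone)]
enum Datum {
    Scalar(ScalarValue),
    Array(ArrayRef),
}

// Typed helper: element-wise comparison of two equal-length slices.
fn eq_values<T: PartialEq>(l: &[T], r: &[T]) -> Result<Vec<bool>, String> {
    if l.len() != r.len() {
        return Err(format!("length mismatch: {} vs {}", l.len(), r.len()));
    }
    Ok(l.iter().zip(r).map(|(a, b)| a == b).collect())
}

// Typed helper: compare every element against a single value.
fn eq_value_scalar<T: PartialEq>(values: &[T], scalar: &T) -> Vec<bool> {
    values.iter().map(|v| v == scalar).collect()
}

// Single entry point: the library, not the caller, inspects the runtime types.
fn eq(left: &Datum, right: &Datum) -> Result<Vec<bool>, String> {
    use Datum::{Array as A, Scalar as S};
    match (left, right) {
        (A(l), A(r)) => match (l.as_ref(), r.as_ref()) {
            (Array::Int32(l), Array::Int32(r)) => eq_values(l, r),
            (Array::Int64(l), Array::Int64(r)) => eq_values(l, r),
            _ => Err("array types do not match".to_string()),
        },
        // Equality is symmetric, so both argument orders share one arm.
        (A(a), S(s)) | (S(s), A(a)) => match (a.as_ref(), s) {
            (Array::Int32(v), ScalarValue::Int32(x)) => Ok(eq_value_scalar(v, x)),
            (Array::Int64(v), ScalarValue::Int64(x)) => Ok(eq_value_scalar(v, x)),
            _ => Err("array and scalar types do not match".to_string()),
        },
        // Toy choice: two scalars produce a single-element result.
        (S(l), S(r)) => Ok(vec![l == r]),
    }
}
```

With this shape, the earlier `find_5` example would reduce to a single call such as `eq(&array_datum, &Datum::Scalar(ScalarValue::Int64(5)))`, and a type mismatch surfaces as an error rather than a panic from a failed downcast.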

Describe alternatives you've considered
The alternative is a continued proliferation of individual kernels such as eq, eq_utf8, etc.

Additional context
There is a lot of additional discussion on #984

cc @matthewmturner @Dandandan @jimexist @houqp

Labels

enhancement: Any new improvement worthy of an entry in the changelog
