Skip to content

DynScalar abstraction (something that makes it easy to create scalar Datums) #4781

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
alamb opened this issue Sep 5, 2023 · 8 comments · Fixed by #4793
Closed

DynScalar abstraction (something that makes it easy to create scalar Datums) #4781

alamb opened this issue Sep 5, 2023 · 8 comments · Fixed by #4793
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@alamb
Copy link
Contributor

alamb commented Sep 5, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
There is code like this in IOx that compare a value to 0

https://github.com/influxdata/influxdb_iox/blob/2a71fcbc76a695b1129895932746084a0a258bab/iox_query_influxql/src/window/non_negative.rs#L59-L60

This uses the now deprecated ly_dyn_scalar kernel

I would like to update to use the non deprecated kernel lt: https://docs.rs/arrow/latest/arrow/compute/kernels/cmp/fn.lt.html

However, to do so requires a "Datum" and thus I need some way to create a typed Datum for zero

Something like this is also needed to turn a ScalarValue into a Datum I think in DataFusion

Describe the solution you'd like
I don't really have a strong opinion, I just want something that will work reasonably for this usecase.

Describe alternatives you've considered
I found that arrow crate uses a function and Scalar to do this: https://docs.rs/arrow-ord/46.0.0/src/arrow_ord/comparison.rs.html#36-221 however the Scalar appears to be strongly typed which makes it hard to use with a dynamically typed kernel that takes &dyn Datum (aka is "type erased") without changing all callsites.

Here is one possibility to perhaps expose the code that Arrow uses internally:

/// A dynamic Datum that wraps an array and represents a single value
#[derive(Debug, Clone)]
pub struct ScalarDatum
{
    arr: ArrayRef,
}

impl ScalarDatum {
    /// Create a [`ScalarDatum`] of type `d` that represents the primtive scalar value `scalar`
    fn try_new_primitive<T: num::ToPrimitive + std::fmt::Debug>(
        d: &DataType,
        scalar: T,
    ) -> Result<Self, ArrowError> {
        Ok(Self {
            arr: make_primitive_scalar_array(d, scalar)?,
        })
    }
}

impl Datum for ScalarDatum {
    ...
}

Additional context
Came up while updating IOx: https://github.com/influxdata/influxdb_iox/pull/8577

Note I may misunderstand the API so any help here would be apprecaited

@alamb alamb added the enhancement Any new improvement worthy of a entry in the changelog label Sep 5, 2023
@alamb alamb changed the title DynScalar abstraction (or something that makes it easy to create scalar values) DynScalar abstraction (something that makes it easy to create scalar Datums) Sep 5, 2023
@tustvold
Copy link
Contributor

tustvold commented Sep 5, 2023

This appears to be reinstating an API that was explicitly deprecated and removed... #2837

@alamb
Copy link
Contributor Author

alamb commented Sep 5, 2023

This appears to be reinstating an API that was explicitly deprecated and removed... #2837

I would phrase it as "the deprecated API has a real usecase which the new API does not address"

If there is some reason this shouldn't exist in arrow, I think what will happen is downstream crates will implement it (I was actually going to do so in IOx but I got stuck on #4780).

I am not sure what the concern about a dynamic scalar is given the kernels are now dynamically typed

@tustvold
Copy link
Contributor

tustvold commented Sep 5, 2023

dynamic scalar is given the kernels are now dynamically typed

It is the slightly funny type coercion that makes me uncomfortable, it seems strange to me that I'd be able to create a TimestampNanosecondArray from a float, or a u64, etc... A lesser point is that this won't work for Decimal256. I don't have a good solution here, I can appreciate the importance of making it easy to create scalars representing zero, and possibly other values like 1 or -1, I'm just not sure how best to express this

pub struct ScalarDatum

FWIW this could just be Scalar<ArrayRef>

https://github.com/influxdata/influxdb_iox/blob/2a71fcbc76a695b1129895932746084a0a258bab/iox_query_influxql/src/window/non_negative.rs#L59-L60

I appreciate this is not a general solution, but in this particular case we can see two lines above that there are only two possible output types, which could simply be enumerated. This would also likely be significantly faster.

Also not a great solution but you could just use ScalarValue::new_zero 😅

@alamb
Copy link
Contributor Author

alamb commented Sep 5, 2023

FWIW this could just be Scalar

That is good to know -- I did not realize that. Perhaps I will make a PR with some additional documentation about it in arrow-rs

Also not a great solution but you could just use ScalarValue::new_zero 😅

That would be just fine for this usecase -- but how can I use that? ScalarValue doesn't appear to implement Datum 🤔
https://docs.rs/datafusion/latest/datafusion/common/enum.ScalarValue.html

Or perhaps you are saying call https://docs.rs/datafusion/latest/datafusion/common/enum.ScalarValue.html#method.to_array and then make a Scalar<ArrayRef> 🤔 that could work

@tustvold
Copy link
Contributor

tustvold commented Sep 5, 2023

Or perhaps you are saying call

I believe there is a to_scalar method, possibly not released yet

@alamb
Copy link
Contributor Author

alamb commented Sep 7, 2023

Update here is that I could use ScalarValue::to_scalar successfully

So instead of

        let predicate = lt_dyn_scalar(&array, 0)?;

I did:

        let zero = ScalarValue::new_zero(array.data_type())?;
        let predicate = lt(&array, &zero.to_scalar())?;

Which seems to have worked nicely

@alamb
Copy link
Contributor Author

alamb commented Sep 7, 2023

Added some doc comments in DataFusion to make it easier to find: apache/datafusion#7491

@alamb
Copy link
Contributor Author

alamb commented Sep 7, 2023

FWIW this could just be Scalar

Here is a PR that improves the docs to encode this (it was what I was missing when I first filed this, I think): #4793

@alamb alamb added the arrow Changes to the arrow crate label Sep 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants