Skip to content

Consider enabling other indices than usize #166

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
workingjubilee opened this issue Sep 24, 2021 · 13 comments
Open

Consider enabling other indices than usize #166

workingjubilee opened this issue Sep 24, 2021 · 13 comments
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR

Comments

@workingjubilee
Copy link
Member

This would minimize contortions for casting between various index sizes.

See: rust-lang/rust#89193 (comment)

Since there is a bit of chatter here on the use cases for performing a scatter or gather for indexes, I can give a little bit of context, since I do think it is somewhat informative.

In this case I am working on a columnar store. You can kind of think of the use case as a secondary index over columnar data. For just a single primary index, you have a list of row_ids Simd<usize, LANES> and you can perform a simd gather instruction to get all of the values at those indexes. However, if you have a secondary index the simd gather from that secondary index needs to produce a Simd<usize, LANES>, so that can be passed directly into the second of the gather functions over the primary keys. In this way you go from needed 2*LANES loads to perform a secondary index lookup over data to only needing 2 simd operations.

Alternatively it would be nice if the Simd libraries were updated to perform gathers using indexes as Simd<u32, LANES> or Simd<u64, LANES>. That would dodge this issue altogether and it would mean less copies in our columnar store. Many times columnar stores you keep offsets as u32 (or even smaller) to keep things compressed. And it is nice to be able to act directly on those indices rather than performing intermediate conversions to usize.

@workingjubilee workingjubilee added the C-feature-request Category: a feature request, i.e. not implemented / a PR label Sep 24, 2021
@calebzulawski
Copy link
Member

calebzulawski commented Sep 24, 2021

Since in LLVM it's just using pointers, anyway, we can probably allow it to take any unsigned integer and the optimizer will handle it from there. I don't believe this needs any rustc support.

@dyv
Copy link

dyv commented Sep 26, 2021

This would be awesome. I just ran into another case where I was operating on dictionary encoded values (flexibly encoded as u8 or u16), and then needed to convert into usize to perform lookups into the dictionary.

@workingjubilee
Copy link
Member Author

workingjubilee commented Sep 27, 2021

Doing it for u8 and u16 seems trivial since those both have Into<usize>.
I think my only realistic concern is a 32-bit arch that supports 128-bit vectors and people using indices of u64.

@calebzulawski
Copy link
Member

It would probably be reasonable to provide a wrapping int conversion function

@workingjubilee
Copy link
Member Author

In practice, the underlying API uses u32. Hrm...

@calebzulawski
Copy link
Member

I think it's best to hide that it's u32, since that's pretty arbitrary IMO. usize makes the most sense, I think

@workingjubilee
Copy link
Member Author

Wait, I am getting swizzle and gather confused here. llvm's shufflevector uses u32, and llvm.masked.gather uses pointers. Gah!

@pro465
Copy link
Contributor

pro465 commented Oct 25, 2021

Doing it for u8 and u16 seems trivial since those both have Into<usize>. I think my only realistic concern is a 32-bit arch that supports 128-bit vectors and people using indices of u64.

i think maybe we should not allow those and only allow types which do implement Into<usize> 🤔

@bjorn3
Copy link
Member

bjorn3 commented Oct 25, 2021

Only u8 and u16 implement Into<usize>. u32 and u64 don't as they would need to truncate on platforms with a small pointer size. Rust supports 16bit platforms, but not 8bit platforms so the minimal usize size is 16bit.

@pro465
Copy link
Contributor

pro465 commented Oct 25, 2021

@bjorn3 i dont think we should care about u32 and u64. after all, if std goes as far as implementing indexing only for usize for slices, we are already too much "forgiving" in implementing indexing for T: Into<usize>, why should we be more "forgiving"?

@workingjubilee
Copy link
Member Author

The choice of usize is actually arbitrary here. There is no immediately obvious reason to not use u32 as the upper bound and accept Into<u32>. The design is intended to allow us to make maximal usage of the actual machines, so that is the reason why: we get to choose between things like this in our design.

@calebzulawski
Copy link
Member

I disagree that it's arbitrary--usize is the correct type for indexing arrays, which is effectively all gather is. I personally don't think we should overcomplicate the interface, and shouldn't use any tricks with Into. Once we have casts analogous to as (#116) it should be pretty painless to use any integer type.

@scottmcm
Copy link
Member

usize is the type for indexing arrays (unfortunately; I wish we had a smaller index type), but u32 is the type for "indexing" bits. I think I could justifiably argue for either in SIMD -- thinking of lanes in simd ops like bits in bitwise ops also seems reasonable to me.

(Though really we'll probably keep things usize as the primary option because otherwise as_array and friends look like they'd be a huge mess.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: a feature request, i.e. not implemented / a PR
Projects
None yet
Development

No branches or pull requests

6 participants