Migrate containers from Vec<(Tuple)> to columnar containers #54

Draft
wants to merge 1 commit into
base: master

frankmcsherry

@frankmcsherry frankmcsherry commented May 30, 2025

The Relation<Tuple> type wraps a Vec<Tuple> that is sorted and deduplicated. This is great, except for the opinion that it must be a Vec<Tuple>, which is a stronger opinion about the layout of the data than we require. By contrast, the columnar crate is able to lay out sequences of structured types using large allocations of simple types. There are a few advantages here:

  1. Non-Copy types become less painful. Strings, JSON, other non-trivial types avoid allocating all over the place and inducing lots of random accesses.
  2. Containers for (Key, Val) segregate the key information, which makes searching for keys that much easier. Only key data are brought in, and various forms of compression (e.g. run-length, for sorted data) can make the data that much smaller. Merging and matching keys is often a large source of work.
  3. Reshaping containers is easy. The container for a k-tuple (A, B, C, .. K) is a k-tuple of containers, and one can reshape it into a container for ((A, B), (C, .. K)) at essentially no cost. This allows a sorted list of tuples to be treated as a trie, and to serve as an index for any prefix of the tuple's columns. This avoids needing e.g. separate indexes for an (A, B) collection on each of (), (A), and (A, B). You'd still need indexes on (B) and (B, A), but those too would be served by just one additional relation/variable.
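
To make points 2 and 3 concrete, here is a minimal sketch that uses plain `Vec`s as stand-ins for columnar containers. The function names (`to_columns`, `reshape`, `key_range`) are made up for illustration and are not the columnar crate's API; the real containers may be laid out differently.

```rust
// Illustrative sketch only: these names are hypothetical and are not the
// `columnar` crate's API. Plain `Vec`s stand in for columnar containers.

/// Split a row-oriented `Vec<(A, B, C)>` into one `Vec` per column.
fn to_columns<A, B, C>(rows: Vec<(A, B, C)>) -> (Vec<A>, Vec<B>, Vec<C>) {
    let mut cols = (Vec::new(), Vec::new(), Vec::new());
    for (a, b, c) in rows {
        cols.0.push(a);
        cols.1.push(b);
        cols.2.push(c);
    }
    cols
}

/// Reshape a container for `(A, B, C)` into one for `((A, B), C)`.
/// Only the three `Vec` handles move; no element is copied or reallocated.
fn reshape<A, B, C>(cols: (Vec<A>, Vec<B>, Vec<C>)) -> ((Vec<A>, Vec<B>), Vec<C>) {
    let (a, b, c) = cols;
    ((a, b), c)
}

/// Find the half-open row range whose first column equals `key`,
/// reading only the (sorted) key column.
fn key_range<A: Ord>(keys: &[A], key: &A) -> std::ops::Range<usize> {
    let lo = keys.partition_point(|k| k < key);
    let hi = keys.partition_point(|k| k <= key);
    lo..hi
}
```

With sorted rows, `key_range` on the first column yields the rows sharing a key, which is the trie-like "prefix as index" behavior described above, without touching the other columns.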

NB: Draft for the time being; no action required. To be honest, I thought I was pointing at my fork and definitely don't mean to cause any work to happen looking at this yet. Oops. :D

@frankmcsherry

The plan is to start by migrating away from the current structural requirements: e.g. that we refer to data with &Val references, that data are in slices &[(K, V)], and that vectors back everything. In columnar terms these are all indexable containers: they tell you their index types, and that's all you need to know.
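
As a rough sketch of what "indexable container that tells you its index type" could look like: the `Indexable` trait below is a hypothetical name invented for this illustration, not a trait taken from the columnar crate. The point is only that the type returned by indexing is chosen by the container, and need not be a `&T`.

```rust
// Hypothetical sketch; `Indexable` is an illustrative name, not a trait
// from the `columnar` crate.
trait Indexable {
    /// What indexing yields. It is chosen by the container and need not be `&T`.
    type Ref<'a>
    where
        Self: 'a;
    fn len(&self) -> usize;
    fn get(&self, index: usize) -> Self::Ref<'_>;
}

/// A `Vec<T>` is indexable, and happens to yield plain references.
impl<T> Indexable for Vec<T> {
    type Ref<'a> = &'a T where Self: 'a;
    fn len(&self) -> usize {
        Vec::len(self)
    }
    fn get(&self, index: usize) -> Self::Ref<'_> {
        &self[index]
    }
}

/// A pair of containers is a container of pairs: indexing yields a tuple
/// of the parts' index types, e.g. `(&A, &B)` rather than `&(A, B)`.
impl<A: Indexable, B: Indexable> Indexable for (A, B) {
    type Ref<'a> = (A::Ref<'a>, B::Ref<'a>) where Self: 'a;
    fn len(&self) -> usize {
        self.0.len()
    }
    fn get(&self, index: usize) -> Self::Ref<'_> {
        (self.0.get(index), self.1.get(index))
    }
}
```

Code written against such a trait never needs to know whether a `Vec<(K, V)>` or a pair of columns sits underneath.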

The first PR removes the requirement of &Val references from the Leaper and Leapers traits, and with it the 'leap lifetime, by retconning Val to be &'leap Val if that's what you want. If you'd like a different Val, perhaps a Copy type, or a GAT, or just a type with a lifetime inside it (e.g. a (&A, &B)), you now have that flexibility.
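
The shape of that change can be sketched roughly as follows. This is a simplified illustration, not the actual trait definitions: only a `propose` method is shown, and `LeaperBefore`, `LeaperAfter`, `ByRef`, and `ByValue` are names invented for the example.

```rust
// Hypothetical, simplified sketch; not the crate's actual trait definitions.

/// Before: the trait carries a 'leap lifetime and hard-codes `&'leap Val`.
trait LeaperBefore<'leap, Tuple, Val> {
    fn propose(&mut self, prefix: &Tuple, values: &mut Vec<&'leap Val>);
}

/// After: no lifetime on the trait; `Val` is whatever the implementor proposes.
trait LeaperAfter<Tuple, Val> {
    fn propose(&mut self, prefix: &Tuple, values: &mut Vec<Val>);
}

/// Toy leaper that proposes references, recovering the old behavior
/// by choosing `Val = &'a V`.
struct ByRef<'a, K, V> {
    pairs: &'a [(K, V)],
}

impl<'a, K: Ord, V> LeaperAfter<K, &'a V> for ByRef<'a, K, V> {
    fn propose(&mut self, prefix: &K, values: &mut Vec<&'a V>) {
        let pairs: &'a [(K, V)] = self.pairs;
        values.extend(pairs.iter().filter(|(k, _)| k == prefix).map(|(_, v)| v));
    }
}

/// Toy leaper that proposes owned `Copy` values instead of references.
struct ByValue<'a, K, V> {
    pairs: &'a [(K, V)],
}

impl<'a, K: Ord, V: Copy> LeaperAfter<K, V> for ByValue<'a, K, V> {
    fn propose(&mut self, prefix: &K, values: &mut Vec<V>) {
        let pairs: &'a [(K, V)] = self.pairs;
        values.extend(pairs.iter().filter(|(k, _)| k == prefix).map(|(_, v)| *v));
    }
}
```

Both toy leapers implement the same trait; the only difference is which `Val` each one picks.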
