Migrate containers from Vec<(Tuple)> to columnar containers
#54
+31
−31
The `Relation<Tuple>` type wraps a `Vec<Tuple>` that is sorted and deduplicated. This is great, except for the opinion that it must be a `Vec<Tuple>`. This is a stronger opinion about the layout of the data than we require. By contrast, the `columnar` crate is able to lay out sequences of structured types using large allocations of simple types. There are a few advantages here:

- Non-`Copy` types become less painful. Strings, JSON, and other non-trivial types avoid allocating all over the place and inducing lots of random accesses.
- `(Key, Val)` containers segregate the key information, which makes searching for keys that much easier. Only key data are brought in, and various forms of compression (e.g. run-length, for sorted data) can make the data that much smaller. Merging and matching keys is often a large source of work.
- A container for `(A, B, C, .. K)` is a k-tuple of containers, and one can reshape it into a container for `((A, B), (C, .. K))` at essentially no cost. This allows a sorted list of tuples to be treated as a trie, and to serve as an index for any prefix of the sorted list. This avoids needing e.g. separate indexes for an `(A, B)` collection on each of `()`, `(A)`, and `(A, B)`. You'd need them on `B` and `(B, A)` additionally, but that would only be one additional relation/variable as well.

NB: Draft for the time being; no action required. To be honest, I thought I was pointing at my fork and definitely don't mean to cause any work to happen looking at this yet. Oops. :D
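For anyone skimming, here is a minimal hand-rolled sketch of the three layout ideas listed above. It deliberately does not use the `columnar` crate's API, and it is not the code in this PR: `PairColumns`, `run_lengths`, and `regroup` are hypothetical names invented for illustration, and the `(u32, String)` tuple type is an assumption chosen to keep the example small.

```rust
/// Columnar layout for a sorted sequence of `(u32, String)` tuples.
/// Hypothetical illustration only; not the `columnar` crate's types.
#[derive(Debug, Default)]
pub struct PairColumns {
    keys: Vec<u32>,     // all keys, contiguous
    bytes: Vec<u8>,     // all string bytes, concatenated into one allocation
    bounds: Vec<usize>, // bounds[i] is the end offset of the i-th string in `bytes`
}

impl PairColumns {
    /// Builds the columns from rows that the caller has already sorted.
    pub fn from_sorted(rows: &[(u32, &str)]) -> Self {
        let mut cols = Self::default();
        for (k, v) in rows {
            cols.keys.push(*k);
            cols.bytes.extend_from_slice(v.as_bytes());
            cols.bounds.push(cols.bytes.len());
        }
        cols
    }

    pub fn len(&self) -> usize {
        self.keys.len()
    }

    /// Reassembles the tuple at `index` as a borrowed view; no per-string allocation.
    pub fn get(&self, index: usize) -> (u32, &str) {
        let start = if index == 0 { 0 } else { self.bounds[index - 1] };
        let end = self.bounds[index];
        let val = std::str::from_utf8(&self.bytes[start..end])
            .expect("slices fall on the boundaries of the original strings");
        (self.keys[index], val)
    }

    /// Finds the rows with the given key by touching only the `keys` column.
    pub fn key_range(&self, key: u32) -> std::ops::Range<usize> {
        let lo = self.keys.partition_point(|k| *k < key);
        let hi = self.keys.partition_point(|k| *k <= key);
        lo..hi
    }
}

/// Run-length encodes a sorted key column as `(key, count)` pairs.
pub fn run_lengths(sorted: &[u32]) -> Vec<(u32, usize)> {
    let mut runs: Vec<(u32, usize)> = Vec::new();
    for &k in sorted {
        if let Some((last, count)) = runs.last_mut() {
            if *last == k {
                *count += 1;
                continue;
            }
        }
        runs.push((k, 1));
    }
    runs
}

/// Reshapes a container for `(A, B, C)` into a container for `((A, B), C)`.
/// Only the `Vec` handles move; no elements are copied.
pub fn regroup<A, B, C>(cols: (Vec<A>, Vec<B>, Vec<C>)) -> ((Vec<A>, Vec<B>), Vec<C>) {
    let (a, b, c) = cols;
    ((a, b), c)
}
```

The point of the sketch is only that a key lookup reads `keys` and nothing else, that sorted keys collapse into runs, and that regrouping columns is a move of a few `Vec` handles rather than a pass over the data.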