
RowSelection::and_then is slow #7458

Open
@alamb


Is your feature request related to a problem or challenge? Please describe what you are trying to do.

This ticket records symptoms reported by @mbutrovich on Discord, where they see inconsistent performance. The root cause appears to be allocations made while computing the RowSelection used to evaluate multiple predicates:

In our case it's currently RowSelection::and_then, so I'm trying to make sense of that function and see if there's a more efficient approach than the iter().cloned() over both inputs, mutating those, and building the output one element at a time.

I was wondering about a better representation of Vec<RowSelector>.

I'm coming at Rust from C and C++, and a struct with a uint64 and a bool stuck on the end is just gonna end up aligned to 64 bits, with a bunch of padding after each one. Is Rust going to do something similar?
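
On the layout question: yes, Rust pads the same way. A type's size is always a multiple of its alignment, so a usize plus a bool takes two words per element in a Vec, even though the default repr(Rust) layout may reorder fields. The sketch below checks this with std::mem::size_of on a local Selector struct that only mirrors the field types assumed for the parquet crate's RowSelector (a usize row count and a bool skip flag). PackedSelector is a hypothetical alternative, not something the crate provides: it keeps the skip flag in the top bit of a u64, halving the footprint at the cost of limiting a run to 2^63 - 1 rows.

```rust
use std::mem::{align_of, size_of};

/// Local struct mirroring the field types assumed for the parquet crate's
/// RowSelector; it is not the crate's own definition.
#[allow(dead_code)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Hypothetical packed alternative: the top bit of a u64 holds the skip
/// flag and the low 63 bits hold the row count.
#[derive(Clone, Copy, Debug, PartialEq)]
struct PackedSelector(u64);

impl PackedSelector {
    const SKIP_BIT: u64 = 1 << 63;

    fn new(row_count: u64, skip: bool) -> Self {
        assert!(row_count < Self::SKIP_BIT, "row count must fit in 63 bits");
        let flag = if skip { Self::SKIP_BIT } else { 0 };
        PackedSelector(row_count | flag)
    }

    fn row_count(self) -> u64 {
        self.0 & !Self::SKIP_BIT
    }

    fn skip(self) -> bool {
        (self.0 & Self::SKIP_BIT) != 0
    }
}

fn main() {
    // Size is rounded up to a multiple of the alignment (that of usize), so
    // the bool costs a full word of padding: 16 bytes on a 64-bit target.
    println!(
        "Selector: {} bytes, align {}",
        size_of::<Selector>(),
        align_of::<Selector>()
    );
    // The packed form needs only 8 bytes per run.
    println!("PackedSelector: {} bytes", size_of::<PackedSelector>());

    let run = PackedSelector::new(42, true);
    println!("row_count = {}, skip = {}", run.row_count(), run.skip());
}
```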

Background:

RowSelection::and_then is used to combine the results of multiple ArrowPredicates in a RowFilter -- see source:

Here is the code for RowSelection::and_then.
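
To pin down what the combination step computes: as we read the parquet crate's documentation, a.and_then(b) treats b as a selection over only the rows that a selected, and the result keeps a row only if both agree. The sketch below models that on plain boolean masks. It is not the crate's code; and_then_mask and the "NNYY" string helper mask are hypothetical names used only for illustration. It is deliberately slow (one entry per row), which makes it a handy reference when checking a faster run-based version.

```rust
/// Parses a mask string such as "NNYY" ('Y' = row selected).
fn mask(s: &str) -> Vec<bool> {
    s.chars().map(|c| c == 'Y').collect()
}

/// Reference semantics of an `and_then`-style conjunction on boolean masks.
/// `first` has one entry per row; `second` has one entry per row that
/// `first` selected, and decides whether that row stays selected.
fn and_then_mask(first: &[bool], second: &[bool]) -> Vec<bool> {
    let selected = first.iter().filter(|&&b| b).count();
    assert_eq!(selected, second.len(), "second must cover exactly the rows first selects");
    let mut second_iter = second.iter();
    first
        .iter()
        .map(|&keep| {
            if keep {
                // Only rows selected by `first` consume an entry of `second`.
                *second_iter.next().unwrap()
            } else {
                false
            }
        })
        .collect()
}

fn main() {
    let result = and_then_mask(&mask("NNYYYYNY"), &mask("YNNYY"));
    let shown: String = result.iter().map(|&b| if b { 'Y' } else { 'N' }).collect();
    println!("{}", shown); // NNYNNYNY
}
```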

Describe the solution you'd like
I would like combining multiple RowSelections to be faster.

Describe alternatives you've considered
Some suggestions from @Dandandan on Discord:

Creating selectors with Vec::with_capacity(len_left + len_right) can reduce the number of allocations from log(N) to 1.
Alternatively, the self.selectors allocation could probably be reused for the new one.
Is there a better way to represent Vec<RowSelector>?
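
The with_capacity suggestion works because every run boundary in the result comes from a run boundary in one of the two inputs, so the output can never hold more than len_left + len_right selectors. Below is a minimal sketch of a single-pass merge under that bound. It uses a local Selector struct and hypothetical free functions (and_then_runs, push_run, selected, skipped) rather than the parquet crate's types, and it assumes the second selection covers exactly the rows the first one selects (the contract RowSelection::and_then is understood to require), panicking otherwise.

```rust
/// Local stand-in for the parquet crate's RowSelector: a run of `row_count`
/// consecutive rows that are either all skipped or all selected.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

fn selected(row_count: usize) -> Selector {
    Selector { row_count, skip: false }
}

fn skipped(row_count: usize) -> Selector {
    Selector { row_count, skip: true }
}

/// Appends a run, extending the previous run instead when both have the same
/// `skip` value, so the output never has two adjacent runs of the same kind.
fn push_run(out: &mut Vec<Selector>, row_count: usize, skip: bool) {
    if row_count == 0 {
        return;
    }
    match out.last_mut() {
        Some(last) if last.skip == skip => last.row_count += row_count,
        _ => out.push(Selector { row_count, skip }),
    }
}

/// Hypothetical single-pass `and_then`: `second` describes only the rows that
/// `first` selects. Panics if `second` covers a different number of rows.
fn and_then_runs(first: &[Selector], second: &[Selector]) -> Vec<Selector> {
    // Output boundaries are a subset of the input boundaries, so this single
    // allocation is large enough and `out` never reallocates.
    let mut out = Vec::with_capacity(first.len() + second.len());
    let mut second_iter = second.iter().copied();
    // Unconsumed remainder of the current run from `second`, if any.
    let mut pending: Option<Selector> = None;

    for run in first {
        if run.skip {
            // Rows skipped by `first` stay skipped and are invisible to `second`.
            push_run(&mut out, run.row_count, true);
            continue;
        }
        // Split this selected run according to the runs of `second`.
        let mut remaining = run.row_count;
        while remaining > 0 {
            let mut cur = pending
                .take()
                .or_else(|| second_iter.next())
                .expect("second covers fewer rows than first selects");
            let n = remaining.min(cur.row_count);
            push_run(&mut out, n, cur.skip);
            remaining -= n;
            cur.row_count -= n;
            if cur.row_count > 0 {
                pending = Some(cur);
            }
        }
    }
    assert!(
        pending.is_none() && second_iter.all(|s| s.row_count == 0),
        "second covers more rows than first selects"
    );
    out
}

fn main() {
    // first = NNYYYYNY, second = YNNYY (one entry per row that first selects)
    let first = [skipped(2), selected(4), skipped(1), selected(1)];
    let second = [selected(1), skipped(2), selected(2)];
    println!("{:?}", and_then_runs(&first, &second));
}
```

The main gain over building the output one element at a time into an unsized Vec is the single up-front allocation; push_run also merges adjacent runs of the same kind so the result stays normalized. Reusing the self.selectors allocation in place, the other suggestion, is not shown: the output can contain more runs than self (for example, self = [select 10] and other = [select 5, skip 5] produce two runs), so an in-place version would have to handle growth.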

Here is one idea for a better representation of RowSelection than Vec<RowSelector>.

Additional context

Metadata

Assignees: No one assigned

Labels: enhancement, parquet
