-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Factor out Substrait consumers into separate files #15794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Factor out Substrait consumers into separate files #15794
Conversation
4bfa3a8
to
d974ee1
Compare
Thanks! This has indeed been a long time todo :) also cc @vbarua I think personally I'd prefer a bit less files, but that's just a suggestion: I'd probably do something like:
If you prefer to keep things in smaller pieces, I'd suggest separating expressions and relations into their own folders, and considering combining at least things which are either tightly related (struct_type & named_struct -> types) or used by only one other file (like bound.rs -> window_function.rs). Thoughts? |
I have only a very slight preference for smaller pieces, but since I wasn’t on the front lines coding this, I trust your judgment much more. I’ll apply your suggestions for the next round, and we’ll see how things shape up from there. |
@Blizzara applied your suggestions. I think scoping expressions and relations into their own modules made a lot of sense. |
Nice! I didn't dive into the details of each file, but the high-level organisation makes sense to me and looks good 👍 Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor suggestions, but otherwise LGTM
The only real change is exposing the from_substrait_type
, otherwise refactor is not visible externally and consist only of moving code around.
datafusion/substrait/src/logical_plan/consumer/rex/extended_expression.rs
Outdated
Show resolved
Hide resolved
4872b9f
to
b015ed3
Compare
@alamb this PR should be ready for review I've checked that this PR consists purely of code moves with no functional changes, at this point in time, aside from This will likely conflict with #15854, or if that gets merged first we'll need to make sure this PR pulls in those changes. Full disclosure as well, I work with @gabotechs at DataDog |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @gabotechs and @vbarua -- I trust your reviews and I think this is a very nice improvement to the code maintainability.
I merged up and assuming the CI passes I'll merge this PR in |
Thanks again @vbarua and @gabotechs |
Which issue does this PR close?
Rationale for this change
The
consumer.rs
file grew a bit too big (~3400 LOC). Good thing is that it's easily splittable into separate files, each one responsible for converting one Substrait node into one DataFusion Logical plan node. With this change, people can just go to the file that they care about greatly reducing the amount of information that they need to deal with.What changes are included in this PR?
A refactor of Substrait
consumer.rs
file into multiple files following these rules:from_*
prefixed function responsible for converting one Substrait node into one DataFusion Logical plan node is now factored out into its own file named after the original Substrait node (e.g.cast.rs
,literal.rs
,aggregate_rel.rs
)utils.rs
filepub(super)
for functions that now need to be shared across different files.There's one subtle public API change that might be nice to keep:
That one should probably have been public from the beginning, and not exposing it now gets a bit messy as it's used outside of the
consume
module in theproducer.rs
tests, and hiding it to the outside would require introducing some #[cfg(test)] to the module definitions. I can hide though if people one a perfect no-api-change refactor.Are these changes tested?
yes, by current pipelines.
Are there any user-facing changes?
The
from_substrait_type
function is now exposed