Remove Wildcard from Expr #7765
Comments
@waynexia -- I agree with your points about
Concretely, what would be the equivalent to the following?
ctx.table("alltypes_plain")
    .await?
    .select(vec![count(Expr::Wildcard)])
BTW maybe we could improve the situation by adding more documentation to the |
That's the point 👍 For The lucky thing is
Makes sense, I'll add it first. |
BTW it's clear some expressions are mere hacks, not real expressions. It's enough to look at datafusion/datafusion/expr/src/expr_schema.rs Lines 214 to 218 in 68e372f
|
I agree. As long as we also have some way to represent the same thing in the expr_fn API, I don't see any reason that we need to have |
I don't think we use wildcard for count in DataFusion. As long as we have an alternative representation of wildcard (i.e. None in projection expressions), removing it makes sense to me. |
From a user perspective, my opinion is that representing wildcard with
How would we differentiate qualified vs. unqualified wildcards with Just to think about alternatives, what if we kept We could also think about a type safe way to represent this, but I think that would come with significant breaking changes. |
Here is a PR that marks Expr::Wildcard deprecated (thanks @linhr 🙏 ) |
will be undefined in some future datafusion release apache/datafusion#7765
Is your feature request related to a problem or challenge?
Expr::Wildcard and Expr::QualifiedWildcard are expressions that reference all columns. But this seems redundant: we don't need a special expr type to do that. This issue proposes to remove these two expr kinds.
Describe the solution you'd like
Wildcard (*) can be expanded to concrete column lists when it appears. This approach seems viable in the three most common use cases I can come up with:
- (1) From SQL: select * from table will generate a projection with all fields, rather than a wildcard expr.
- (2) From LogicalPlanBuilder and (3) from DataFrame. In some aspects, these two entry points are the same. Both are strongly typed (or strongly schema-ed), which allows us to expand the wildcard immediately using the schema from the current stage.
Besides this, Expr::Wildcard is not properly handled in the codebase, because it's not a "first class" expr. Take some functions as examples:
- expr_to_columns: for correctness, it should also count the columns referenced by Expr::Wildcard and Expr::QualifiedWildcard.
- create_physical_name: this function requires all the wildcards to be expanded before calling it.
I also found another issue that is related to Wildcard: #5473. The solution there is to add an optimizer rule that expands all wildcards. This proposal would do something similar, but more eagerly.
Describe alternatives you've considered
No response
Additional context
Since Expr is widely used, we may need several versions to deprecate it (depending on the compatibility rule). We may ship this change step by step before fully removing these two variants:
- Mark Expr::Wildcard and Expr::QualifiedWildcard as #[deprecated]
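To make the proposal concrete, here is a minimal sketch of the two steps discussed in this issue: marking the wildcard variants as #[deprecated], and eagerly expanding them into concrete columns as soon as a schema is known. The Expr, Field, and expand_wildcards names below are simplified, hypothetical stand-ins written for illustration. They are not DataFusion's actual types or API, and the real change may look quite different.

```rust
// Hypothetical, simplified stand-ins for illustration only.
// These are NOT DataFusion's real `Expr` or schema types.

/// One field of a (toy) schema: the table it belongs to and its name.
#[derive(Debug, Clone, PartialEq)]
pub struct Field {
    pub relation: String,
    pub name: String,
}

impl Field {
    /// Build the concrete column expression that refers to this field.
    pub fn to_column(&self) -> Expr {
        Expr::Column {
            relation: Some(self.relation.clone()),
            name: self.name.clone(),
        }
    }
}

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// A reference to a single column, optionally qualified by a table name.
    Column { relation: Option<String>, name: String },
    /// Unqualified `*`. Deprecated first, removed in a later release.
    #[deprecated(note = "expand `*` into concrete columns when building the plan")]
    Wildcard,
    /// Qualified `table.*`. Deprecated first, removed in a later release.
    #[deprecated(note = "expand `table.*` into concrete columns when building the plan")]
    QualifiedWildcard { qualifier: String },
}

/// Eagerly replace every wildcard in `exprs` with the columns it stands for,
/// using the schema that is available at this stage of plan building.
#[allow(deprecated)] // the migration helper still has to match the old variants
pub fn expand_wildcards(exprs: Vec<Expr>, schema: &[Field]) -> Vec<Expr> {
    let mut out = Vec::with_capacity(exprs.len());
    for expr in exprs {
        match expr {
            // `*` expands to every field in the schema, in schema order.
            Expr::Wildcard => out.extend(schema.iter().map(Field::to_column)),
            // `table.*` expands only to the fields of that table.
            Expr::QualifiedWildcard { qualifier } => out.extend(
                schema
                    .iter()
                    .filter(|f| f.relation == qualifier)
                    .map(Field::to_column),
            ),
            // Anything else is already concrete and passes through unchanged.
            other => out.push(other),
        }
    }
    out
}
```

With this shape, code that still constructs Expr::Wildcard gets a compiler deprecation warning during the transition, while functions that run after expand_wildcards only ever see concrete columns.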