Skip to content

Minor: remove duplicated select #11424

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion datafusion/core/tests/dataframe/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -212,7 +212,6 @@ async fn test_count_wildcard_on_aggregate() -> Result<()> {
let sql_results = ctx
.sql("select count(*) from t1")
.await?
.select(vec![col("count(*)")])?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be a reason for doing it; I once wanted to remove it as well 😂.
See #10459 (comment)

Copy link
Contributor Author

@jayzhan211 jayzhan211 Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, but I still don't get it.

Perhaps it meant to show how to select the count(*) again -- if so maybe we could update the tests to do something like this (rather than removing it entirely):

If it is just to show select twice can get the correct result, could we do count(wildcard()) instead of count(*)?

For count(*), it directly goes to PlanBuilder and miss the chance for ExprPlanner rewrite. IMO we should use count(wildcard()) instead but consider select(vec![col("count(*)")]) invalid 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea was to show that it is possible to select a column named "count(*)" after computing the actual count(*) value in a previous plan.

Projection(expr=[col("count(*)")]
  Aggregate(agg=[count(*))
   Scan(..)

If we can remove the code and the test still passes though clearly something is not working as intended.

Maybe it should be instead col("count(*)") + lit(1) so the expression doesn't get removed by the optimizer 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😕

.explain(false, false)?
.collect()
.await?;
Expand Down