
Remove workaround for COUNT(*) in subquery decorrelation code #10553

Closed
alamb opened this issue May 17, 2024 · 3 comments · Fixed by #15050
Labels
enhancement New feature or request

alamb commented May 17, 2024

Is your feature request related to a problem or challenge?

While working on #10500 I found a reference to the "count bug" in the code, but it wasn't clear that it was tracked by any ticket.

@comphead figured out in #10500 (comment) that if the relevant workaround is disabled, the following query returns incorrect results:

Running "subquery.slt"
External error: query result mismatch:
[SQL] SELECT t1_id, (SELECT count(*) FROM t2 WHERE t2.t2_int = t1.t1_int) from t1
[Diff] (-expected|+actual)
    11 1
-   22 0
+   22 NULL
    33 3
-   44 0
+   44 NULL
at test_files/subquery.slt:763
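To make the mismatch concrete, here is a minimal sketch in plain Rust (not DataFusion's API) that simulates a naive decorrelation of the scalar subquery: aggregate t2 grouped by t2_int, then LEFT JOIN the counts back to t1. The table contents are hypothetical, chosen only so the output mirrors the diff above, and `naive` / `fixed` are illustrative names. A t1 row with no matching t2 rows has no group, so the join fills its count with NULL (`None`), whereas the original correlated subquery would have evaluated COUNT(*) over an empty input and returned 0.

```rust
use std::collections::HashMap;

/// Naive decorrelation (sketch): GROUP BY t2_int, then LEFT JOIN on t1_int = t2_int.
/// t1 rows are hypothetical (t1_id, t1_int) pairs; t2 is a list of t2_int values.
/// A t1 row without a matching group gets NULL (None) for the count.
fn naive(t1: &[(i32, i32)], t2: &[i32]) -> Vec<(i32, Option<i64>)> {
    let mut counts: HashMap<i32, i64> = HashMap::new();
    for &k in t2 {
        *counts.entry(k).or_insert(0) += 1;
    }
    t1.iter()
        .map(|&(id, key)| (id, counts.get(&key).copied()))
        .collect()
}

/// Corrected for a bare COUNT(*): a missing group means the subquery saw zero rows,
/// and COUNT(*) over zero rows is 0, so this behaves like COALESCE(cnt, 0).
fn fixed(t1: &[(i32, i32)], t2: &[i32]) -> Vec<(i32, i64)> {
    naive(t1, t2)
        .into_iter()
        .map(|(id, cnt)| (id, cnt.unwrap_or(0)))
        .collect()
}

fn main() {
    // Hypothetical data that reproduces the shape of the slt diff.
    let t1 = [(11, 1), (22, 2), (33, 3), (44, 4)];
    let t2 = [1, 3, 3, 3];
    println!("{:?}", naive(&t1, &t2)); // NULL counts for t1_id 22 and 44
    println!("{:?}", fixed(&t1, &t2)); // 0 counts for t1_id 22 and 44
}
```

Here `fixed` corresponds to wrapping the joined column in COALESCE(cnt, 0), which only works because COUNT(*) over zero rows is known to be 0.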

Describe the solution you'd like

Remove the workaround and handle the issue correctly.

I am not quite sure what this means (maybe @mingmwang can provide more details if he has time)
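Coalescing the joined count with 0 only covers a bare COUNT(*). If the subquery projects an expression over the count, such as `count(*) + 1`, unmatched rows need the value of that whole expression on an empty input (here 1). Below is a minimal sketch of one general approach, assuming a hypothetical marker column `always_true` that is TRUE on every aggregated row: after the LEFT JOIN, a NULL marker means the row found no group, so the projection substitutes the expression evaluated on empty input. This is plain Rust with nullable columns modeled as `Option`, not DataFusion's actual rewrite; `JoinedRow`, `left_join`, and `project` are illustrative names, and the data is hypothetical.

```rust
use std::collections::HashMap;

/// One row of the LEFT JOIN output; nullable columns are modeled as Option.
struct JoinedRow {
    t1_id: i32,
    value: Option<i64>,        // subquery output `count(*) + 1` for the matching group
    always_true: Option<bool>, // hypothetical marker column, TRUE on every group row
}

/// GROUP BY t2_int computing `count(*) + 1` plus the marker, then LEFT JOIN to t1.
fn left_join(t1: &[(i32, i32)], t2: &[i32]) -> Vec<JoinedRow> {
    let mut counts: HashMap<i32, i64> = HashMap::new();
    for &k in t2 {
        *counts.entry(k).or_insert(0) += 1;
    }
    t1.iter()
        .map(|&(t1_id, key)| match counts.get(&key) {
            Some(&c) => JoinedRow { t1_id, value: Some(c + 1), always_true: Some(true) },
            None => JoinedRow { t1_id, value: None, always_true: None },
        })
        .collect()
}

/// CASE WHEN always_true IS NULL THEN <expr on empty input> ELSE value END
fn project(rows: &[JoinedRow]) -> Vec<(i32, Option<i64>)> {
    // `count(*) + 1` evaluated over zero rows: count(*) is 0, so the result is 1.
    let count_on_empty: i64 = 0;
    let on_empty = Some(count_on_empty + 1);
    rows.iter()
        .map(|r| match r.always_true {
            None => (r.t1_id, on_empty), // no matching group
            Some(_) => (r.t1_id, r.value), // matched group: keep its value, even if NULL
        })
        .collect()
}

fn main() {
    let t1 = [(11, 1), (22, 2), (33, 3), (44, 4)];
    let t2 = [1, 3, 3, 3];
    println!("{:?}", project(&left_join(&t1, &t2)));
}
```

With the same data, COALESCE(value, 0) would return 0 for t1_id 22 and 44 instead of 1, which is the count bug in a different guise. Checking the marker rather than the value also avoids overwriting a matched group whose projected value is legitimately NULL.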

Describe alternatives you've considered

No response

Additional context

No response

alamb added the enhancement (New feature or request) label May 17, 2024
alamb changed the title from "REmove workaround for 1COUNT(*)` in subquery decorrelation code" to "REmove workaround for COUNT(*) in subquery decorrelation code" May 17, 2024
findepi changed the title from "REmove workaround for COUNT(*) in subquery decorrelation code" to "Remove workaround for COUNT(*) in subquery decorrelation code" Feb 4, 2025
suibianwanwank commented

Hi @alamb. I described the "count bug" in #15032. Maybe I can try to improve this comment.

suibianwanwank commented

@alamb I have submitted a PR. When you have time, I would greatly appreciate your review. As I am new to DataFusion, any feedback for improvement would be very helpful. Thank you.

alamb commented Mar 7, 2025

For anyone following along, I wanted to post this link from @suibianwanwank's PR here:

The "count bug" was described in "Optimization of Nested SQL Queries Revisited".
