Skip to content

fix: [native_iceberg_compat / native_datafusion] Fall back to Spark for maps containing structs #1764

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented May 21, 2025

Which issue does this PR close?

Workaround for #1754

Rationale for this change

Fix the following Spark SQL tests in core-2.

- Spark vectorized reader - without partition data column - select a single complex field from a map entry and in clause *** FAILED *** (290 milliseconds)
- Spark vectorized reader - with partition data column - select a single complex field from a map entry and in clause *** FAILED *** (269 milliseconds)
- Non-vectorized reader - without partition data column - select a single complex field from a map entry and in clause *** FAILED *** (263 milliseconds)
- Non-vectorized reader - with partition data column - select a single complex field from a map entry and in clause *** FAILED *** (322 milliseconds)
- Spark vectorized reader - without partition data column - select nested field from a complex map key using map_keys *** FAILED *** (315 milliseconds)
- Spark vectorized reader - with partition data column - select nested field from a complex map key using map_keys *** FAILED *** (278 milliseconds)
- Non-vectorized reader - without partition data column - select nested field from a complex map key using map_keys *** FAILED *** (261 milliseconds)
- Non-vectorized reader - with partition data column - select nested field from a complex map key using map_keys *** FAILED *** (312 milliseconds)
- Spark vectorized reader - without partition data column - select nested field from a complex map value using map_values *** FAILED *** (268 milliseconds)
- Spark vectorized reader - with partition data column - select nested field from a complex map value using map_values *** FAILED *** (262 milliseconds)
- Non-vectorized reader - without partition data column - select nested field from a complex map value using map_values *** FAILED *** (264 milliseconds)
- Non-vectorized reader - with partition data column - select nested field from a complex map value using map_values *** FAILED *** (308 milliseconds)
- Spark vectorized reader - without partition data column - SPARK-40033: Schema pruning support through element_at *** FAILED *** (554 milliseconds)
- Spark vectorized reader - with partition data column - SPARK-40033: Schema pruning support through element_at *** FAILED *** (510 milliseconds)
- Non-vectorized reader - without partition data column - SPARK-40033: Schema pruning support through element_at *** FAILED *** (525 milliseconds)
- Non-vectorized reader - with partition data column - SPARK-40033: Schema pruning support through element_at *** FAILED *** (488 milliseconds)

What changes are included in this PR?

How are these changes tested?

@andygrove andygrove marked this pull request as ready for review May 21, 2025 14:13
@andygrove andygrove changed the title fix: Fall back to Spark for maps containing complex types fix: [native_iceberg_compat] Fall back to Spark for maps containing complex types May 21, 2025
@andygrove andygrove changed the title fix: [native_iceberg_compat] Fall back to Spark for maps containing complex types fix: [native_iceberg_compat / native_datafusion] Fall back to Spark for maps containing complex types May 21, 2025
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @andygrove

@andygrove andygrove changed the title fix: [native_iceberg_compat / native_datafusion] Fall back to Spark for maps containing complex types fix: [native_iceberg_compat / native_datafusion] Fall back to Spark for maps containing structs May 21, 2025
@codecov-commenter
Copy link

codecov-commenter commented May 21, 2025

Codecov Report

Attention: Patch coverage is 87.50000% with 1 line in your changes missing coverage. Please review.

Project coverage is 58.62%. Comparing base (f09f8af) to head (843f9e8).
Report is 205 commits behind head on main.

Files with missing lines Patch % Lines
.../main/scala/org/apache/comet/DataTypeSupport.scala 87.50% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1764      +/-   ##
============================================
+ Coverage     56.12%   58.62%   +2.49%     
- Complexity      976     1138     +162     
============================================
  Files           119      130      +11     
  Lines         11743    12671     +928     
  Branches       2251     2367     +116     
============================================
+ Hits           6591     7428     +837     
- Misses         4012     4058      +46     
- Partials       1140     1185      +45     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines +156 to +157
// https://github.com/apache/datafusion-comet/issues/1754
ignore("native reader - read MAP of value STRUCT fields") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we are doing the data type support check for maps of struct. Dow we still have to ignore these tests?
If so, is there a way to ignore only for native datafusion/iceberg-compat?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test suite is specific to native datafusion/iceberg-compat scans. The fallback is just a temporary workaround until we have the real fix for #1754, then we can re-enable these tests.

@andygrove
Copy link
Member Author

One test keeps failing without any actual test failures. This is happening on other PRs as well.

Copy link
Contributor

@kazuyukitanimura kazuyukitanimura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pending with CI

@andygrove andygrove marked this pull request as draft May 21, 2025 23:01
@andygrove
Copy link
Member Author

Moving this to draft because we may want to merge #1771 instead

@andygrove andygrove closed this May 23, 2025
@andygrove andygrove deleted the complex-map-fallback branch May 23, 2025 11:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants