fix: correct schema type checking in native_iceberg_compat #1755
Conversation
@mbutrovich @andygrove Spark test fixes for native_iceberg_compat
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1755      +/-   ##
============================================
+ Coverage     56.12%   58.49%   +2.37%
- Complexity      976     1131     +155
============================================
  Files           119      130      +11
  Lines         11743    12684     +941
  Branches       2251     2363     +112
============================================
+ Hits           6591     7420     +829
- Misses         4012     4078      +66
- Partials       1140     1186      +46

View full report in Codecov by Sentry.
conf.set("spark.sql.parquet.binaryAsString", "false");
conf.set("spark.sql.parquet.int96AsTimestamp", "false");
conf.set("spark.sql.caseSensitive", "false");
conf.set("spark.sql.parquet.inferTimestampNTZ.enabled", "true");
conf.set("spark.sql.legacy.parquet.nanosAsLong", "false");
If we are mutating these configs, do we need to restore them to the original value at some point?
No, I don't think so. native_comet appears to do the same.
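For context, if a test ever did need to restore mutated config values, a save-and-restore helper could look like the sketch below. This is a hypothetical illustration, not code from the PR: `withConfs` is an invented name, and a plain mutable `Map` stands in for the real Hadoop/SQL configuration object.

```scala
import scala.collection.mutable

// Sketch: run a block with temporarily overridden conf values,
// then restore the originals (including removing keys that were unset).
def withConfs(conf: mutable.Map[String, String], overrides: (String, String)*)(
    body: => Unit): Unit = {
  // Remember the previous value for each key (None if the key was unset).
  val saved = overrides.map { case (k, _) => k -> conf.get(k) }
  overrides.foreach { case (k, v) => conf(k) = v }
  try body
  finally saved.foreach {
    case (k, Some(v)) => conf(k) = v    // restore the prior value
    case (k, None)    => conf.remove(k) // key was originally unset
  }
}

val conf = mutable.Map("spark.sql.caseSensitive" -> "true")
withConfs(conf, "spark.sql.caseSensitive" -> "false") {
  assert(conf("spark.sql.caseSensitive") == "false") // override visible inside
}
assert(conf("spark.sql.caseSensitive") == "true") // original restored
```

The `try`/`finally` ensures the originals come back even if the body throws, which is the main reason tests that do restore configs use this shape.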
if (!isEqual(field, optFileField.get())) {
  throw new UnsupportedOperationException("Schema evolution is not supported");
}
// This makes the same check as Spark's VectorzedParquetReader
Suggested change:
- // This makes the same check as Spark's VectorzedParquetReader
+ // This makes the same check as Spark's VectorizedParquetReader
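The check being discussed mirrors Spark's VectorizedParquetReader: if a requested field does not exactly match the corresponding field in the file, the read is rejected rather than silently coerced. A minimal sketch of that shape, using a hypothetical simplified `Field` type and `checkField` helper (illustrative names only, not the PR's actual types):

```scala
// Hypothetical simplified field descriptor; the real code compares
// Parquet file types against the requested Spark schema.
case class Field(name: String, dataType: String, nullable: Boolean)

// Reject any mismatch between the requested field and the file field,
// echoing the "Schema evolution is not supported" check in the diff.
def checkField(requested: Field, fromFile: Option[Field]): Unit =
  fromFile match {
    case Some(fileField) if fileField == requested => () // exact match: ok
    case Some(_) =>
      throw new UnsupportedOperationException("Schema evolution is not supported")
    case None => () // column absent from the file: filled with nulls, not an error
  }

checkField(Field("id", "int", nullable = true),
           Some(Field("id", "int", nullable = true))) // passes
```

Calling it with, say, a requested `long` against a file `int` would throw, which is the kind of type-widening this scan implementation declines to handle.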
I'm not very familiar with some of this Parquet logic, but LGTM
Thanks for the initial review, Andy. I've changed this to draft while I investigate the CI failures.
Open for review again.
Resolved review threads (outdated):
common/src/main/scala/org/apache/spark/sql/comet/parquet/CometParquetReadSupport.scala
spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala
if (enableSchemaEvolution || CometConf.COMET_NATIVE_SCAN_IMPL
    .get(conf)
    .equals(CometConf.SCAN_NATIVE_DATAFUSION)) {
nit: can just use `==` rather than `.equals`
Done
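The nit reflects a Scala idiom: `==` on references delegates to `equals` but is also null-safe, so it reads better and avoids a potential NPE when the receiver could be null. A quick illustration with plain strings (unrelated to Comet's actual config values):

```scala
val a: String = "native_datafusion"
val b: String = new String("native_datafusion") // distinct reference, same content

// == delegates to equals for non-null receivers...
assert(a == b)
assert(a.equals(b))

// ...but unlike .equals, it does not throw when the receiver is null.
val c: String = null
assert(c != "native_datafusion") // safe: simply false, no NullPointerException
```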
@parthchandra which Spark SQL tests does this PR help with?
Force-pushed from 88979e9 to de8c336.
pending with CI
Merged. Thanks @kazuyukitanimura @andygrove
Which issue does this PR close?
Part of #1542
Closes #.
Rationale for this change
This addresses test failures caused by incompatibilities in schema conversion and validation between Spark and the native_iceberg_compat scan.
The implementation now performs nearly identical conversion and checking to Spark's. In keeping with the original native_comet implementation, the change duplicates some code from Spark.