fix: correct schema type checking in native_iceberg_compat #1755
Conversation
@mbutrovich @andygrove Spark test fixes for native_iceberg_compat
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1755      +/-   ##
============================================
+ Coverage     56.12%   58.49%   +2.37%
- Complexity      976     1131     +155
============================================
  Files           119      130      +11
  Lines         11743    12684     +941
  Branches       2251     2363     +112
============================================
+ Hits           6591     7420     +829
- Misses         4012     4078      +66
- Partials       1140     1186      +46

View full report in Codecov by Sentry.
conf.set("spark.sql.parquet.binaryAsString", "false");
conf.set("spark.sql.parquet.int96AsTimestamp", "false");
conf.set("spark.sql.caseSensitive", "false");
conf.set("spark.sql.parquet.inferTimestampNTZ.enabled", "true");
conf.set("spark.sql.legacy.parquet.nanosAsLong", "false");
If we are mutating these configs, do we need to restore them to the original value at some point?
No, I don't think so. native_comet appears to do the same.
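For context, if a test ever did need to restore mutated config values, a save-and-restore helper could look like the sketch below. This is a hypothetical illustration, not code from the PR: `withConfs` is an invented name, and a plain mutable `Map` stands in for the real Hadoop/SQL configuration object.

```scala
import scala.collection.mutable

// Sketch: run a block with temporarily overridden conf values,
// then restore the originals (including removing keys that were unset).
def withConfs(conf: mutable.Map[String, String], overrides: (String, String)*)(
    body: => Unit): Unit = {
  // Remember the previous value for each key (None if the key was unset).
  val saved = overrides.map { case (k, _) => k -> conf.get(k) }
  overrides.foreach { case (k, v) => conf(k) = v }
  try body
  finally saved.foreach {
    case (k, Some(v)) => conf(k) = v    // restore the prior value
    case (k, None)    => conf.remove(k) // key was originally unset
  }
}

val conf = mutable.Map("spark.sql.caseSensitive" -> "true")
withConfs(conf, "spark.sql.caseSensitive" -> "false") {
  assert(conf("spark.sql.caseSensitive") == "false") // override visible inside
}
assert(conf("spark.sql.caseSensitive") == "true") // original restored
```

The `try`/`finally` ensures the originals come back even if the body throws, which is the main reason tests that do restore configs use this shape.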
if (!isEqual(field, optFileField.get())) {
  throw new UnsupportedOperationException("Schema evolution is not supported");
}
// This makes the same check as Spark's VectorzedParquetReader
Suggested change:
- // This makes the same check as Spark's VectorzedParquetReader
+ // This makes the same check as Spark's VectorizedParquetReader
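The check being discussed mirrors Spark's VectorizedParquetReader: if a requested field does not exactly match the corresponding field in the file, the read is rejected rather than silently coerced. A minimal sketch of that shape, using a hypothetical simplified `Field` type and `checkField` helper (illustrative names only, not the PR's actual types):

```scala
// Hypothetical simplified field descriptor; the real code compares
// Parquet file types against the requested Spark schema.
case class Field(name: String, dataType: String, nullable: Boolean)

// Reject any mismatch between the requested field and the file field,
// echoing the "Schema evolution is not supported" check in the diff.
def checkField(requested: Field, fromFile: Option[Field]): Unit =
  fromFile match {
    case Some(fileField) if fileField == requested => () // exact match: ok
    case Some(_) =>
      throw new UnsupportedOperationException("Schema evolution is not supported")
    case None => () // column absent from the file: filled with nulls, not an error
  }

checkField(Field("id", "int", nullable = true),
           Some(Field("id", "int", nullable = true))) // passes
```

Calling it with, say, a requested `long` against a file `int` would throw, which is the kind of type-widening this scan implementation declines to handle.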
I'm not very familiar with some of this Parquet logic, but LGTM
Thanks for the initial review, Andy. I've changed this to draft while I investigate the CI failures.
Open for review again.
Resolved review threads (outdated):
common/src/main/scala/org/apache/spark/sql/comet/parquet/CometParquetReadSupport.scala
spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala
if (enableSchemaEvolution || CometConf.COMET_NATIVE_SCAN_IMPL
    .get(conf)
    .equals(CometConf.SCAN_NATIVE_DATAFUSION)) {
nit: can just use `==` rather than `.equals`
Done
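The nit reflects a Scala idiom: `==` on references delegates to `equals` but is also null-safe, so it reads better and avoids a potential NPE when the receiver could be null. A quick illustration with plain strings (unrelated to Comet's actual config values):

```scala
val a: String = "native_datafusion"
val b: String = new String("native_datafusion") // distinct reference, same content

// == delegates to equals for non-null receivers...
assert(a == b)
assert(a.equals(b))

// ...but unlike .equals, it does not throw when the receiver is null.
val c: String = null
assert(c != "native_datafusion") // safe: simply false, no NullPointerException
```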
@parthchandra which Spark SQL tests does this PR help with?
Force-pushed from 88979e9 to de8c336.
pending with CI
Merged. Thanks @kazuyukitanimura @andygrove
Which issue does this PR close?
Part of #1542
Closes #.
Rationale for this change
This addresses test failures caused by incompatibilities in schema conversion and validation between Spark and the native_iceberg_compat scan.
The implementation now performs nearly identical conversion and checking to Spark's. In keeping with the original native_comet implementation, the change duplicates some code from Spark.