
Handle timestamp_ntz in delta and iceberg #647


Merged: 1 commit merged into main from timestamp-ntz on Apr 1, 2025

Conversation

@vinishjail97 (Contributor) commented Feb 12, 2025


What is the purpose of the pull request

  1. Handle timestamp_ntz in the Delta target; backwards compatibility is ensured by using the min and max writer versions from deltaLog.snapshot() (see the sketch after this list).
  2. Handle timestamp_ntz in the Iceberg target as well.
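A minimal sketch of the idea, assuming Spark 3.4+ (where DataTypes.TimestampNTZType is exposed); the enum and helper names here are hypothetical stand-ins rather than xtable's actual code:

import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;

// Hedged sketch; InternalKind and requiredProtocol are illustrative, not
// the real xtable model. Delta's timestampNtz table feature needs
// reader/writer protocol (3,7).
final class DeltaTimestampNtzSketch {
  enum InternalKind { TIMESTAMP, TIMESTAMP_NTZ }

  // timestamp_ntz maps to Spark's TimestampNTZType; the UTC-adjusted
  // timestamp keeps the regular TimestampType.
  static DataType toSparkType(InternalKind kind) {
    return kind == InternalKind.TIMESTAMP_NTZ
        ? DataTypes.TimestampNTZType // requires Delta protocol (3,7)
        : DataTypes.TimestampType;
  }

  // Only bump the protocol when an NTZ column is actually present;
  // otherwise keep the versions read from deltaLog.snapshot() so existing
  // readers of older tables are unaffected.
  static int[] requiredProtocol(boolean hasNtzColumn, int minReader, int minWriter) {
    return hasNtzColumn
        ? new int[] {Math.max(minReader, 3), Math.max(minWriter, 7)}
        : new int[] {minReader, minWriter};
  }
}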

Brief change log

  • Handle timestamp_ntz in the Delta target
  • Handle timestamp_ntz in the Iceberg target as well

Verify this pull request

This change added tests and can be verified as follows:

  • testTimestampNtz

@emilie-wang

Hi, I just came across this PR. If you plan to continue this fix, the conversion targets also need fixes so they know how to handle the timestamp_ntz type correctly, for example Iceberg: https://github.com/apache/incubator-xtable/blob/main/xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergSchemaExtractor.java#L176
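For reference, Iceberg models the two flavors on the same type via an adjust-to-UTC flag; a small hedged sketch of that mapping (simplified relative to the extractor linked above):

import org.apache.iceberg.types.Type;
import org.apache.iceberg.types.Types;

// Hedged sketch: Iceberg distinguishes timestamptz from timestamp_ntz
// purely by the withZone()/withoutZone() factory on Types.TimestampType.
final class IcebergTimestampSketch {
  static Type forTimestamp(boolean adjustedToUtc) {
    return adjustedToUtc
        ? Types.TimestampType.withZone()     // timestamp, UTC-adjusted
        : Types.TimestampType.withoutZone(); // timestamp_ntz
  }
}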

@vinishjail97 marked this pull request as ready for review on April 1, 2025, 08:21
@vinishjail97 (Contributor, Author)

> Hi, I just came across this PR. If you plan to continue this fix, the conversion targets also need fixes so they know how to handle the timestamp_ntz type correctly, for example Iceberg: https://github.com/apache/incubator-xtable/blob/main/xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergSchemaExtractor.java#L176

I will pick this up in a follow-up PR.


TableFormatSync.getInstance()
.syncSnapshot(Collections.singletonList(conversionTarget), snapshot1);
// Delta standalone library can't read versions (3,7) and needs delta kernel dependency.
@vinishjail97 (Contributor, Author)

This needs a delta-kernel upgrade, as delta-standalone doesn't support the (3,7) protocol versions in Delta.
#671

@vinishjail97 (Contributor, Author)

io.delta.standalone.internal.exception.DeltaErrors$InvalidProtocolVersionException: 
Delta protocol version (3,7) is too new for this version of Delta
Standalone Reader/Writer (1,2). Please upgrade to a newer release.


	at io.delta.standalone.internal.DeltaLogImpl.assertProtocolRead(DeltaLogImpl.scala:214)
	at io.delta.standalone.internal.SnapshotImpl.<init>(SnapshotImpl.scala:244)
	at io.delta.standalone.internal.SnapshotManagement.createSnapshot(SnapshotManagement.scala:257)
	at io.delta.standalone.internal.SnapshotManagement.getSnapshotAtInit(SnapshotManagement.scala:224)

This is the exception we hit using the latest version of the delta-standalone library, 3.3.0.

@vinishjail97 (Contributor, Author)

Handled the validation using Spark for now; upgrading to delta-kernel will be handled separately.
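A rough sketch of what that Spark-based validation could look like; the session setup and basePath argument are hypothetical, and delta-spark (unlike delta-standalone) understands protocol (3,7):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hedged sketch: read the synced table through delta-spark, which supports
// the (3,7) protocol that makes delta-standalone throw
// InvalidProtocolVersionException.
final class SparkNtzValidationSketch {
  static void validate(String basePath) { // hypothetical table path
    SparkSession spark = SparkSession.builder()
        .appName("ntz-validation")
        .master("local[*]")
        .getOrCreate();
    Dataset<Row> rows = spark.read().format("delta").load(basePath);
    rows.printSchema(); // NTZ columns should surface as TimestampNTZType
  }
}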

@vinishjail97 (Contributor, Author)

The integration tests with HUDI as the source format and DELTA as the target format are failing because of the following exception. Hudi writes parquet files with physicalType INT64 and logicalType TimestampType, but reading the Delta table fails because it expects a TimestampNTZType column.

Caused by: org.apache.spark.sql.execution.datasources.SchemaColumnConvertNotSupportedException: column: [timestamp_local_millis_nullable_field], physicalType: INT64, logicalType: timestamp_ntz
	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.constructConvertNotSupportedException(ParquetVectorUpdaterFactory.java:1129)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetVectorUpdaterFactory.getUpdater(ParquetVectorUpdaterFactory.java:191)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedColumnReader.readBatch(VectorizedColumnReader.java:175)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:328)
	at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:219)
	at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:297)
	... 19 more

The test fails with Hudi 0.x and will pass once Hudi is upgraded to 1.x.
https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark3.3.x/src/main/scala/org/apache/spark/sql/avro/AvroDeserializer.scala#L152
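A rough standalone repro of that mismatch, under stated assumptions (the local path and session config are hypothetical; Spark 3.4+ DDL accepts TIMESTAMP_NTZ): write a parquet column as a regular UTC-adjusted timestamp in INT64 micros, then read it back with a schema that declares timestamp_ntz.

import static org.apache.spark.sql.functions.current_timestamp;

import org.apache.spark.sql.SparkSession;

// Hedged repro sketch, not code from this PR: mimics Hudi writing INT64
// timestamp-micros while the Delta schema expects timestamp_ntz.
final class NtzMismatchRepro {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("ntz-mismatch")
        .master("local[*]")
        // write INT64 micros instead of Spark's default INT96
        .config("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MICROS")
        .getOrCreate();

    String path = "/tmp/ntz_mismatch"; // hypothetical scratch path
    spark.range(1).withColumn("ts", current_timestamp())
        .write().mode("overwrite").parquet(path);

    // Declaring the column as TIMESTAMP_NTZ on read trips the vectorized
    // reader with SchemaColumnConvertNotSupportedException, as in the logs.
    spark.read().schema("id BIGINT, ts TIMESTAMP_NTZ").parquet(path).show();
  }
}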

@vinishjail97 (Contributor, Author)

> Hi, I just came across this PR. If you plan to continue this fix, the conversion targets also need fixes so they know how to handle the timestamp_ntz type correctly, for example Iceberg: https://github.com/apache/incubator-xtable/blob/main/xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergSchemaExtractor.java#L176
>
> I will pick this up in a follow-up PR.

@emilie-wang I have fixed it for Iceberg as well in this PR.

public void testTimestampNtz() {
InternalSchema schema1 = getInternalSchemaWithTimestampNtz();
List<InternalField> fields2 = new ArrayList<>(schema1.getFields());
fields2.add(
Contributor

Have you tested whether the write works if the initial schema contains timestamp_ntz?

@vinishjail97 (Contributor, Author)

Yes, in this test schema1 contains an ntz column and the second commit adds a nullable float column.

@vinishjail97 changed the title from "Handle timestamp_ntz in delta conversion target" to "Handle timestamp_ntz in delta and iceberg" on Apr 1, 2025
@vinishjail97 merged commit 680cf9c into main on Apr 1, 2025
2 checks passed
@vinishjail97 deleted the timestamp-ntz branch on April 1, 2025 22:07