Handle timestamp_ntz in delta and iceberg #647
xtable-core/src/main/java/org/apache/xtable/delta/DeltaConversionTarget.java
xtable-core/src/main/java/org/apache/xtable/delta/DeltaSchemaExtractor.java
Hi, I just came across this PR. If you plan to continue this fix, the target sources also need to be fixed so that they handle the timestamp_ntz type correctly. For example, Iceberg: https://github.com/apache/incubator-xtable/blob/main/xtable-core/src/main/java/org/apache/xtable/iceberg/IcebergSchemaExtractor.java#L176
214f1d6 to eafaf5e
I will pick this up in a follow-up PR.
TableFormatSync.getInstance()
    .syncSnapshot(Collections.singletonList(conversionTarget), snapshot1);
// Delta standalone library can't read versions (3,7) and needs delta kernel dependency.
This needs an upgrade to delta-kernel, because Delta Standalone doesn't support version (3,7) of the Delta protocol.
#671
io.delta.standalone.internal.exception.DeltaErrors$InvalidProtocolVersionException:
Delta protocol version (3,7) is too new for this version of Delta
Standalone Reader/Writer (1,2). Please upgrade to a newer release.
at io.delta.standalone.internal.DeltaLogImpl.assertProtocolRead(DeltaLogImpl.scala:214)
at io.delta.standalone.internal.SnapshotImpl.<init>(SnapshotImpl.scala:244)
at io.delta.standalone.internal.SnapshotManagement.createSnapshot(SnapshotManagement.scala:257)
at io.delta.standalone.internal.SnapshotManagement.getSnapshotAtInit(SnapshotManagement.scala:224)
This is the exception we hit using the latest version of the delta-standalone library, 3.3.0.
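For readers unfamiliar with the failure above, here is a minimal sketch of the kind of check that rejects the table. It is not the Delta Standalone implementation: the class name `ProtocolCheck`, its constants, and the exception type are hypothetical, and the supported versions (1,2) are taken only from the error message in this thread. The assumption is that a reader refuses any table whose minimum reader version exceeds what the reader supports.

```java
// Hypothetical sketch of a protocol compatibility check; not Delta Standalone code.
final class ProtocolCheck {
    // Versions this hypothetical client understands, as reported in the error message above.
    static final int SUPPORTED_READER_VERSION = 1;
    static final int SUPPORTED_WRITER_VERSION = 2;

    /** Throws if the table requires a newer reader protocol than this client supports. */
    static void assertProtocolRead(int tableMinReaderVersion, int tableMinWriterVersion) {
        if (tableMinReaderVersion > SUPPORTED_READER_VERSION) {
            throw new IllegalStateException(
                String.format(
                    "Delta protocol version (%d,%d) is too new for this reader/writer (%d,%d).",
                    tableMinReaderVersion,
                    tableMinWriterVersion,
                    SUPPORTED_READER_VERSION,
                    SUPPORTED_WRITER_VERSION));
        }
    }

    public static void main(String[] args) {
        assertProtocolRead(1, 2); // a (1,2) table is accepted
        try {
            assertProtocolRead(3, 7); // a table using timestamp_ntz is rejected
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this assumption, any table written with timestamp_ntz (protocol (3,7)) cannot be validated with a (1,2) reader, which is why the test validation needs a different reader.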
Handled the validation using Spark for now; upgrading to delta-kernel will be handled separately.
The integration tests with HUDI as the source format and DELTA as the target format are failing because of the following exception.
The tests will fail for Hudi 0.x and will pass once Hudi is upgraded to 1.x.
@emilie-wang I have fixed it for Iceberg as well in this PR.
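To make the type distinction in this thread concrete, here is a small sketch of how the two timestamp kinds line up across the formats. It is an illustration only, not the code in this PR: `TimestampMapping` and its `InternalTimestamp` enum are hypothetical names. The type names themselves follow the format specifications: Delta uses `timestamp` (UTC-adjusted) and `timestamp_ntz`, while Iceberg uses `timestamptz` (with zone) and `timestamp` (without zone).

```java
// Hypothetical sketch of the timestamp type mapping; not the XTable schema extractors.
final class TimestampMapping {
    /** Illustrative stand-in for an internal type: with or without time zone semantics. */
    enum InternalTimestamp { TIMESTAMP, TIMESTAMP_NTZ }

    /** Delta type name: "timestamp" is UTC-adjusted, "timestamp_ntz" has no time zone. */
    static String toDeltaTypeName(InternalTimestamp type) {
        return type == InternalTimestamp.TIMESTAMP_NTZ ? "timestamp_ntz" : "timestamp";
    }

    /** Iceberg type name: "timestamptz" carries a zone, "timestamp" does not. */
    static String toIcebergTypeName(InternalTimestamp type) {
        return type == InternalTimestamp.TIMESTAMP_NTZ ? "timestamp" : "timestamptz";
    }

    public static void main(String[] args) {
        for (InternalTimestamp type : InternalTimestamp.values()) {
            System.out.println(
                type + " -> delta: " + toDeltaTypeName(type)
                    + ", iceberg: " + toIcebergTypeName(type));
        }
    }
}
```

Note the naming trap: the plain name `timestamp` means "with zone" in Delta but "without zone" in Iceberg, so a target that copies the name through unchanged would silently change semantics.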
xtable-core/src/main/java/org/apache/xtable/avro/AvroSchemaConverter.java
xtable-core/src/main/java/org/apache/xtable/delta/DeltaSchemaExtractor.java
public void testTimestampNtz() {
  InternalSchema schema1 = getInternalSchemaWithTimestampNtz();
  List<InternalField> fields2 = new ArrayList<>(schema1.getFields());
  fields2.add(
Have you tested whether the write works if the initial schema contains the timestamp_ntz?
Yes, in this test schema1 contains an ntz column, and the second commit adds a nullable float column.
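A rough sketch of the backward-compatibility idea discussed in this PR: keep the table's existing protocol versions unless the schema actually contains timestamp_ntz, and only then raise them. This is one reading of the approach, not the PR's code. `ProtocolVersions` and `resolve` are hypothetical names, and the inputs stand in for whatever the existing snapshot reports. The values 3 and 7 are the reader and writer versions the Delta protocol requires for table features such as timestampNtz.

```java
// Hypothetical sketch of protocol version selection; not the DeltaConversionTarget code.
final class ProtocolVersions {
    // Minimum protocol versions needed once a timestamp_ntz column is present.
    static final int NTZ_MIN_READER_VERSION = 3;
    static final int NTZ_MIN_WRITER_VERSION = 7;

    /**
     * Returns {minReaderVersion, minWriterVersion} for the next commit.
     * Existing versions are preserved and are never lowered.
     */
    static int[] resolve(int currentMinReader, int currentMinWriter, boolean schemaHasTimestampNtz) {
        if (!schemaHasTimestampNtz) {
            // No ntz column: leave the table readable by older clients.
            return new int[] {currentMinReader, currentMinWriter};
        }
        return new int[] {
            Math.max(currentMinReader, NTZ_MIN_READER_VERSION),
            Math.max(currentMinWriter, NTZ_MIN_WRITER_VERSION)
        };
    }

    public static void main(String[] args) {
        int[] unchanged = resolve(1, 2, false);
        int[] upgraded = resolve(1, 2, true);
        System.out.println(unchanged[0] + "," + unchanged[1]);
        System.out.println(upgraded[0] + "," + upgraded[1]);
    }
}
```

Using `Math.max` rather than assigning the constants directly means a table that is already on a higher protocol version is not downgraded.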
b90b38e to ff91b1d
What is the purpose of the pull request
Handle timestamp_ntz in the Delta target, ensuring backwards compatibility by using the min and max writer version from deltaLog.snapshot().
Handle timestamp_ntz for the Iceberg target as well.

Brief change log
Handle timestamp_ntz for the Iceberg target as well.

Verify this pull request
This change added tests and can be verified as follows: