GH-3070: Add Variant logical type annotation to parquet-java #3072
Do we need to support writing and reading variant data?
Variant reading and writing are being implemented in Iceberg and/or the engines themselves. I think we can consider pulling the implementation into Parquet later if needed.
I think this is problematic if the spec lives in Parquet and doesn't have a complete implementation, per the previously agreed-upon guidelines for new Parquet features. This probably warrants a discussion on the mailing list. CC @julienledem @rdblue @RussellSpitzer
@aihuaxu I agree with @emkornfield. It would also be great to drop some example Parquet files in https://github.com/apache/parquet-testing; this would also help adoption by other implementations, see apache/parquet-format#456 (comment)
Usually we need two reference implementations for spec changes like this. I'm not sure if there is any chance to have another implementation ready in a timely manner. IMO, at least parquet-java should support basic roundtrip read and write.
I see. Per the guidelines, we need to have the implementation in parquet-java and then another one. Do we usually include the implementation with this annotation change, or should it be separate?
I think it should be in one change. The parquet-format cannot be released without a concrete PoC implementation in parquet-java. Without that release, separate changes may break CI and thus cannot be merged.
@wgtmac With https://github.com/apache/parquet-java/pull/3117/files implementing encoding/decoding, should we consider merging this separately?
I think at least it needs the conversion from/to the Thrift definition of the variant type. So we need to wait for the release of parquet-format 2.11.0.
Thanks, @aihuaxu! And thanks to @emkornfield for taking a look as well.
Rationale for this change
This adds the Variant logical type to parquet-java so that dependent projects can use it.
What changes are included in this PR?
The Variant logical type has been added to LogicalTypeAnnotation. For variant columns, the corresponding Parquet group is annotated as VARIANT(), indicating that the variant data may be encoded according to the specified or lower version. Readers can use this version information to validate compatibility and fail early if the version is not supported.
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes. The Variant logical type is now available.
Closes #3070
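The fail-early version check described under "What changes are included in this PR?" might look roughly like the sketch below. It is not code from this PR or from parquet-java: the class VariantVersionCheck, its methods, and the constant MAX_SUPPORTED_VERSION are hypothetical names, and the sketch assumes spec versions are positive integers starting at 1 and that a reader can handle any annotated version up to the highest one it implements.

```java
// Hypothetical sketch; none of these names come from parquet-java.
final class VariantVersionCheck {
    // Assumption: the highest Variant spec version this reader implements.
    static final int MAX_SUPPORTED_VERSION = 1;

    private VariantVersionCheck() {}

    // The annotation says data is encoded at the annotated version or lower,
    // so a reader is compatible when it implements at least that version.
    static boolean isSupported(int annotatedVersion) {
        return annotatedVersion >= 1 && annotatedVersion <= MAX_SUPPORTED_VERSION;
    }

    // Fail early, before any variant values are decoded.
    static void requireSupported(int annotatedVersion) {
        if (!isSupported(annotatedVersion)) {
            throw new UnsupportedOperationException(
                "Unsupported Variant spec version " + annotatedVersion
                    + "; this reader supports versions up to " + MAX_SUPPORTED_VERSION);
        }
    }
}
```

A reader would call requireSupported with the version taken from the group's annotation while resolving the schema, so an incompatible file is rejected before any column data is read.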