-
Notifications
You must be signed in to change notification settings - Fork 1.5k
fix: describe Parquet schema with coerce_int96 #15750
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@mbutrovich cc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generally LGTM, but there's going to be a conflict with #15723. Depending on which one goes in first, the other will need to be updated, which isn't the end of the world.
Thank you @chenkovsky!
@@ -297,3 +297,21 @@ CREATE EXTERNAL TABLE staging.foo STORED AS parquet LOCATION '../../parquet-test | |||
# Create external table with qualified name, but no schema should error | |||
statement error DataFusion error: Error during planning: failed to resolve schema: release | |||
CREATE EXTERNAL TABLE release.bar STORED AS parquet LOCATION '../../parquet-testing/data/alltypes_plain.parquet'; | |||
|
|||
|
|||
statement ok |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we put this test case in parquet.slt
like #15723 does?
Maybe a title change too: |
) -> Result<Schema> { | ||
let metadata = fetch_parquet_metadata(store, file, metadata_size_hint).await?; | ||
let file_metadata = metadata.file_metadata(); | ||
let schema = parquet_to_arrow_schema( | ||
file_metadata.schema_descr(), | ||
file_metadata.key_value_metadata(), | ||
)?; | ||
let schema = match coerce_int96 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a nit: it might be rewritten more concise ?
let schema = coerce_int96
.and_then(|time_unit| coerce_int96_to_resolution(file_metadata.schema_descr(), &schema, &time_unit).ok())
.unwrap_or(schema);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm thanks @chenkovsky and @mbutrovich for review
Nice |
* fix: parquet coerce_int96 schema * move test to parquet.slt * update based on comphead's suggestion
Which issue does this PR close?
datafusion.execution.parquet.coerce_int96
is set, timestamp type is still reported as Timestamp(nanoseconds) #15721.Rationale for this change
coerce_int96 is ignored when infer schema.
What changes are included in this PR?
call coerce_int96_to_resolution when fetching schema
Are these changes tested?
UT
Are there any user-facing changes?
No