Skip to content

fix: describe Parquet schema with coerce_int96 #15750

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 19, 2025

Conversation

chenkovsky
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

coerce_int96 is ignored when infer schema.

What changes are included in this PR?

call coerce_int96_to_resolution when fetching schema

Are these changes tested?

UT

Are there any user-facing changes?

No

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) datasource Changes to the datasource crate labels Apr 17, 2025
@comphead
Copy link
Contributor

@mbutrovich cc

Copy link
Contributor

@mbutrovich mbutrovich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This generally LGTM, but there's going to be a conflict with #15723. Depending on which one goes in first, the other will need to be updated, which isn't the end of the world.

Thank you @chenkovsky!

@@ -297,3 +297,21 @@ CREATE EXTERNAL TABLE staging.foo STORED AS parquet LOCATION '../../parquet-test
# Create external table with qualified name, but no schema should error
statement error DataFusion error: Error during planning: failed to resolve schema: release
CREATE EXTERNAL TABLE release.bar STORED AS parquet LOCATION '../../parquet-testing/data/alltypes_plain.parquet';


statement ok
Copy link
Contributor

@mbutrovich mbutrovich Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put this test case in parquet.slt like #15723 does?

@mbutrovich
Copy link
Contributor

Maybe a title change too: fix: describe Parquet schema with coerce_int96.

@chenkovsky chenkovsky changed the title fix: parquet coerce_int96 schema fix: describe Parquet schema with coerce_int96 Apr 17, 2025
) -> Result<Schema> {
let metadata = fetch_parquet_metadata(store, file, metadata_size_hint).await?;
let file_metadata = metadata.file_metadata();
let schema = parquet_to_arrow_schema(
file_metadata.schema_descr(),
file_metadata.key_value_metadata(),
)?;
let schema = match coerce_int96 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a nit: it might be rewritten more concise ?

let schema = coerce_int96
    .and_then(|time_unit| coerce_int96_to_resolution(file_metadata.schema_descr(), &schema, &time_unit).ok())
    .unwrap_or(schema);

@chenkovsky chenkovsky closed this Apr 19, 2025
@chenkovsky chenkovsky reopened this Apr 19, 2025
Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thanks @chenkovsky and @mbutrovich for review

@comphead comphead merged commit 8a193c2 into apache:main Apr 19, 2025
46 of 53 checks passed
@alamb
Copy link
Contributor

alamb commented Apr 28, 2025

Nice

nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025
* fix: parquet coerce_int96 schema

* move test to parquet.slt

* update based on comphead's suggestion
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasource Changes to the datasource crate sqllogictest SQL Logic Tests (.slt)
Projects
None yet
4 participants