Skip to content

feat: add SchemaProvider::table_type(table_name: &str) #16401

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

epgif
Copy link

@epgif epgif commented Jun 13, 2025

InformationSchemaConfig::make_tables only needs the TableType not the whole TableProvider, and the former may require an expensive catalog operation to construct and the latter may not.

This allows avoiding SELECT * FROM information_schema.tables having to make 1 of those potentially expensive operations per table.

@github-actions github-actions bot added the catalog Related to the catalog crate label Jun 13, 2025
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature makes sense to me -- thank you @epgif

I wonder if there is some way we can write a test for it (mostly to prevent it from being accidentally broken/changed in the future)

Maybe an example or something 🤔

@epgif
Copy link
Author

epgif commented Jun 13, 2025

@alamb

I wonder if there is some way we can write a test for it (mostly to prevent it from being accidentally broken/changed in the future)

I looked around for some tests implementing this interface that I could extend, but didn't find any. I meant to ask earlier but forgot: is there anything like that? It would be a big help if I had some previous tests to take inspiration from.

Failing that, I'll take a stab at it anyway :)

Thanks!

/// returns `None`. Implementations for which this operation is cheap but [Self::table] is
/// expensive can override this to improve operations that only need the type, e.g.
/// `SELECT * FROM information_schema.tables`.
async fn table_type(&self, name: &str) -> Result<Option<TableType>, DataFusionError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async fn table_type(&self, name: &str) -> Result<Option<TableType>, DataFusionError> {
async fn table_type(&self, name: &str) -> DataFusionResult<Option<TableType>> {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any DataFusionResult in the project, but I do see that E = DataFusionError is the default so I dropped that redundancy. Is that what you meant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, messed with another repo. in this file it should be like datafusion_common::Result

in DF we have a type alias in common crate

pub type Result<T, E = DataFusionError> = result::Result<T, E>;

please reuse it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All references to Result in this file are already the one from datafusion_common, including what I've added.

Copy link
Contributor

@comphead comphead Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change also one in the test

    ) -> Result<Option<Arc<dyn TableProvider>>, DataFusionError> {

/// expensive can override this to improve operations that only need the type, e.g.
/// `SELECT * FROM information_schema.tables`.
async fn table_type(&self, name: &str) -> Result<Option<TableType>, DataFusionError> {
self.table(name).await.map(|o| o.map(|t| t.table_type()))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any way to avoid nested map? perhaps flat_map or and_then ? would me more readable IMHO

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to and_then.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, clippy suggested what I had to begin with:

warning: using `Result.and_then(|x| Ok(y))`, which is more succinctly expressed as `map(|x| y)`
  --> datafusion/catalog/src/schema.rs:63:9
   |
63 | /         self.table(name)
64 | |             .await
65 | |             .and_then(|o| Ok(o.map(|t| t.table_type())))
   | |________________________________________________________^
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#bind_instead_of_map
   = note: `#[warn(clippy::bind_instead_of_map)]` on by default
help: try
   |
63 ~         self.table(name)
64 +             .await.map(|o| o.map(|t| t.table_type()))

So I changed it back.

@epgif epgif force-pushed the schema-provider-table-type branch from 50e9552 to 6961bcf Compare June 16, 2025 22:59
@epgif
Copy link
Author

epgif commented Jun 16, 2025

Please take another look @comphead -- thanks!

epgif added 2 commits June 17, 2025 18:46
InformationSchemaConfig::make_tables only needs the TableType not the
whole TableProvider, and the former may require an expensive catalog
operation to construct and the latter may not.

This allows avoiding `SELECT * FROM information_schema.tables` having to
make 1 of those potentially expensive operations per table.
@epgif epgif force-pushed the schema-provider-table-type branch from 6961bcf to 9366395 Compare June 17, 2025 23:48
@epgif
Copy link
Author

epgif commented Jun 17, 2025

@alamb

I wonder if there is some way we can write a test for it (mostly to prevent it from being accidentally broken/changed in the future)

I looked around for some tests implementing this interface that I could extend, but didn't find any.

I tried panic-driven development and identified various unrelated test, none providing a clue.

The trick here is a test needs to check that one function is called rather than another. This seems like a classic low-level unit test.

So, that's what I've done.

It looks noisy if you're not used to this kind of test, but it does work, and does provide the expected failure when the information_schema.rs part is reverted (i.e. wrong function called).

Let me know what you think.

Thanks, @alamb!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
catalog Related to the catalog crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants