-
Notifications
You must be signed in to change notification settings - Fork 1.5k
feat: add SchemaProvider::table_type(table_name: &str) #16401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature makes sense to me -- thank you @epgif
I wonder if there is some way we can write a test for it (mostly to prevent it from being accidentally broken/changed in the future)
Maybe an example or something 🤔
I looked around for some tests implementing this interface that I could extend, but didn't find any. I meant to ask earlier but forgot: is there anything like that? It would be a big help if I had some previous tests to take inspiration from. Failing that, I'll take a stab at it anyway :) Thanks! |
datafusion/catalog/src/schema.rs
Outdated
/// returns `None`. Implementations for which this operation is cheap but [Self::table] is | ||
/// expensive can override this to improve operations that only need the type, e.g. | ||
/// `SELECT * FROM information_schema.tables`. | ||
async fn table_type(&self, name: &str) -> Result<Option<TableType>, DataFusionError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
async fn table_type(&self, name: &str) -> Result<Option<TableType>, DataFusionError> { | |
async fn table_type(&self, name: &str) -> DataFusionResult<Option<TableType>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any DataFusionResult
in the project, but I do see that E = DataFusionError
is the default so I dropped that redundancy. Is that what you meant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, messed with another repo. in this file it should be like datafusion_common::Result
in DF we have a type alias in common
crate
pub type Result<T, E = DataFusionError> = result::Result<T, E>;
please reuse it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All references to Result
in this file are already the one from datafusion_common
, including what I've added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change also one in the test
) -> Result<Option<Arc<dyn TableProvider>>, DataFusionError> {
/// expensive can override this to improve operations that only need the type, e.g. | ||
/// `SELECT * FROM information_schema.tables`. | ||
async fn table_type(&self, name: &str) -> Result<Option<TableType>, DataFusionError> { | ||
self.table(name).await.map(|o| o.map(|t| t.table_type())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any way to avoid nested map
? perhaps flat_map
or and_then
? would me more readable IMHO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to and_then
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, clippy suggested what I had to begin with:
warning: using `Result.and_then(|x| Ok(y))`, which is more succinctly expressed as `map(|x| y)`
--> datafusion/catalog/src/schema.rs:63:9
|
63 | / self.table(name)
64 | | .await
65 | | .and_then(|o| Ok(o.map(|t| t.table_type())))
| |________________________________________________________^
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#bind_instead_of_map
= note: `#[warn(clippy::bind_instead_of_map)]` on by default
help: try
|
63 ~ self.table(name)
64 + .await.map(|o| o.map(|t| t.table_type()))
So I changed it back.
50e9552
to
6961bcf
Compare
Please take another look @comphead -- thanks! |
InformationSchemaConfig::make_tables only needs the TableType not the whole TableProvider, and the former may require an expensive catalog operation to construct and the latter may not. This allows avoiding `SELECT * FROM information_schema.tables` having to make 1 of those potentially expensive operations per table.
6961bcf
to
9366395
Compare
I tried panic-driven development and identified various unrelated test, none providing a clue. The trick here is a test needs to check that one function is called rather than another. This seems like a classic low-level unit test. So, that's what I've done. It looks noisy if you're not used to this kind of test, but it does work, and does provide the expected failure when the Let me know what you think. Thanks, @alamb! |
InformationSchemaConfig::make_tables only needs the TableType not the whole TableProvider, and the former may require an expensive catalog operation to construct and the latter may not.
This allows avoiding
SELECT * FROM information_schema.tables
having to make 1 of those potentially expensive operations per table.