-
Notifications
You must be signed in to change notification settings - Fork 1.5k
[substrait] Build basic test suite to validate produced Substrait plans #15069
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Related ticket perhaps: |
I started looking into running the sqllogictests through Substrait, converting the original SQL to Substrait through Isthmus. |
Quick update: 105 failures was actually the number of test files that fail so... all of them. I've managed to get roughly 1/3 of the select queries to produce valid Substrait plans, but then most of them fail to execute because of type issues or missing functions. And that's after transforming all SQL strings to double quote identifiers to avoid casing and reserved names errors. So I don't believe automatically running the full sqllogictests through Substrait is feasible today. A more realistic goal would be to have a few hand-crafted queries (TPCH queries as a start for instance), and assert that their SQL and Substrait results match. |
Thanks @geoffreyclaude -- I think we have the basic TPCH plans covered in https://github.com/apache/datafusion/blob/main/datafusion/substrait/tests/substrait_integration.rs https://github.com/apache/datafusion/tree/main/datafusion/substrait/tests/testdata/tpch_substrait_plans specifically that @vbarua worked on |
One idea that comes to mind is to be able to run the sqllogictests in "substrait roundrip" mode. Upon building a logical DataFusion plan, it would be converted to Substrait and then back to DataFusion logical for each sqllogictest statement. I played around with this idea and it seems to be catching existing issues like #15855. Some advantage of this is that it's a high-coverage/low-effort solution, and that it tests both consuming and producing Substrait plans, but on the other hand the tests would be biased towards how |
It is a good idea -- another potential issue is that it would effectively "tax" other features in the sense that writing tests for unrelated features might trigger a substrait bug that was unrelated to the test being written If we want to add this round trip method I think we should also add a per-test way to disable the round trip testing so we could merge the unrelated PR (and file a ticket to fix the substrait issue as a follow on issue) Basically I think we should avoid blocking other development as much as possible |
Fortunately sqllogictest-rs has support for this via engine names. Having a separate implementation of sqllogictest::AsyncDB with a "DataFusionSubstraitRoundrip" engine name should allow us to flag individual sqllogictests with Although I imagine that if we go down this path, some time will pass until this new testing mode can be enabled in the CI. |
Is your feature request related to a problem or challenge?
In #12244 we found that DataFusion was producing invalid Substrait plans. The
datafusion-substrait
crate has quite a bit of tests but none that actually verify plans are valid according to the https://github.com/substrait-io/substrait-validator.Describe the solution you'd like
Add a test suite to the
datafusion-substrait
crate that uses https://github.com/substrait-io/substrait-validator to validate plans. We can start small and just add a test that would've failed prior to #15011.Describe alternatives you've considered
No response
Additional context
No response
The text was updated successfully, but these errors were encountered: