Skip to content

Add recursion limit configuration to DFParser #14095

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 7 additions & 2 deletions datafusion/sql/src/parser.rs
Original file line number Diff line number Diff line change
Expand Up @@ -257,6 +257,9 @@ fn ensure_not_set<T>(field: &Option<T>, name: &str) -> Result<(), ParserError> {
Ok(())
}

// By default, allow expressions up to this deep before error
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can say this is 2x the default limit of sqlparser so it is clear this is meant to increase the limit

Suggested change
// By default, allow expressions up to this deep before error
// By default, allow expressions up to this deep before error
// (2x the default in sqlparser)

const DEFAULT_REMAINING_DEPTH: usize = 100;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I double checked that the default in sqlparser is 50 so this is 2x the size https://docs.rs/sqlparser/latest/src/sqlparser/parser/mod.rs.html#187

I am a little worried about this as it is a hard coded limit but I think we can always make it a config flag later


/// DataFusion SQL Parser based on [`sqlparser`]
///
/// Parses DataFusion's SQL dialect, often delegating to [`sqlparser`]'s [`Parser`].
Expand Down Expand Up @@ -287,7 +290,9 @@ impl<'a> DFParser<'a> {
let tokens = tokenizer.tokenize()?;

Ok(DFParser {
parser: Parser::new(dialect).with_tokens(tokens),
parser: Parser::new(dialect)
.with_recursion_limit(DEFAULT_REMAINING_DEPTH)
.with_tokens(tokens),
})
}

Expand All @@ -299,7 +304,7 @@ impl<'a> DFParser<'a> {
}

/// Parse a SQL string and produce one or more [`Statement`]s with
/// with the specified dialect.
/// the specified dialect.
pub fn parse_sql_with_dialect(
sql: &str,
dialect: &dyn Dialect,
Expand Down
1 change: 1 addition & 0 deletions datafusion/sqllogictest/bin/sqllogictests.rs
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,7 @@ const SQLITE_PREFIX: &str = "sqlite";
pub fn main() -> Result<()> {
tokio::runtime::Builder::new_multi_thread()
.enable_all()
.thread_stack_size(4 * 1024 * 1024)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to adjust this config to avoid error thread 'tokio-runtime-worker' has overflowed its stack of sqllogictest process since we change the with_recursion_limit config.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried about this. We have recently added the the recursive protection feature precisely to avoid stack overflows (ever) but now after this change the stacks can overflow.

Here is the stack where it overflows (precisely what the Recursion limit is designed to prevent:

<sqlparser::tokenizer::Token as core::clone::Clone>::clone tokenizer.rs:52
<sqlparser::tokenizer::TokenWithSpan as core::clone::Clone>::clone tokenizer.rs:633
core::option::Option<&T>::cloned option.rs:1917
sqlparser::parser::Parser::next_token mod.rs:3443
sqlparser::parser::Parser::parse_data_type_helper mod.rs:8097
sqlparser::parser::Parser::parse_data_type mod.rs:8083
sqlparser::parser::Parser::parse_prefix::{{closure}} mod.rs:1243
sqlparser::parser::Parser::try_parse mod.rs:3808
sqlparser::parser::Parser::maybe_parse mod.rs:3795
sqlparser::parser::Parser::parse_prefix mod.rs:1242
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
sqlparser::parser::Parser::parse_prefix mod.rs:1381
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_infix mod.rs:2928
sqlparser::parser::Parser::parse_subexpr mod.rs:974
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
sqlparser::parser::Parser::parse_prefix mod.rs:1381
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_infix mod.rs:2928
sqlparser::parser::Parser::parse_subexpr mod.rs:974
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
sqlparser::parser::Parser::parse_prefix mod.rs:1381
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_infix mod.rs:2928
sqlparser::parser::Parser::parse_subexpr mod.rs:974
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
sqlparser::parser::Parser::parse_prefix mod.rs:1381
sqlparser::parser::Parser::parse_subexpr mod.rs:964
sqlparser::parser::Parser::parse_infix mod.rs:2928
sqlparser::parser::Parser::parse_subexpr mod.rs:974
sqlparser::parser::Parser::parse_expr mod.rs:957
core::ops::function::FnMut::call_mut function.rs:166
sqlparser::parser::Parser::parse_comma_separated_with_trailing_commas mod.rs:3730
sqlparser::parser::Parser::parse_comma_separated mod.rs:3715
...
std::panicking::try::do_call panicking.rs:557
__rust_try 0x000000010761f11c
[Inlined] std::panicking::try panicking.rs:520
[Inlined] std::panic::catch_unwind panic.rs:358
std::thread::Builder::spawn_unchecked_::{{closure}} mod.rs:559
core::ops::function::FnOnce::call_once{{vtable.shim}} function.rs:250
[Inlined] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once boxed.rs:1972
[Inlined] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once boxed.rs:1972
std::sys::pal::unix::thread::Thread::new::thread_start thread.rs:105
_pthread_start 0x00000001940832e4

Note @blaginin recently added the recursive feature to sql parser (but it isn't yet released) which I think will allow this query to run without having to update the stack size

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, very clear. So, I will close this PR and wait for the new release of sqlparser-rs. Tthanks for the review @alamb @2010YOUY01

.build()?
.block_on(run_tests())
}
Expand Down
14 changes: 14 additions & 0 deletions datafusion/sqllogictest/test_files/errors.slt
Original file line number Diff line number Diff line change
Expand Up @@ -161,3 +161,17 @@ create table records (timestamp timestamp, value float) as values (
'2021-01-01 00:00:00', 1.0,
'2021-01-01 00:00:00', 2.0
);

# Error number of nested expressions exceeds limit
statement error DataFusion error: SQL error: RecursionLimitExceeded
SELECT
c1
FROM
aggregate_test_100
WHERE
c1=0 OR (c2=0 OR (c3=0 OR (c4=0 OR (c5=0 OR (c6=0 OR (c7=0 OR (c8=0 OR (c9=0 OR (c10=0 OR
(c1=1 OR (c2=1 OR (c3=1 OR (c4=1 OR (c5=1 OR (c6=1 OR (c7=1 OR (c8=1 OR (c9=1 OR (c10=1 OR
(c1=2 OR (c2=2 OR (c3=2 OR (c4=2 OR (c5=2 OR (c6=2 OR (c7=2 OR (c8=2 OR (c9=2 OR (c10=2 OR
(c1=3 OR (c2=3 OR (c3=3 OR (c4=3 OR (c5=3 OR (c6=3 OR (c7=3 OR (c8=3 OR (c9=3 OR (c10=3 OR
(c1=4 OR (c2=4 OR (c3=4 OR (c4=4 OR (c5=4 OR (c6=4 OR (c7=4 OR (c8=4 OR (c9=4 OR (c10=4
)))))))))))))))))))))))))))))))))))))))))))))))));
Loading