Is your feature request related to a problem or challenge?
Anyone building a system that works with graph-like data on DataFusion will eventually need to join the intermediate results of graph patterns. However, null handling in these systems differs from SQL.
Usually, two intermediate results are combined based on a notion of compatibility rather than strict equality. Under these semantics, NULL is compatible with everything. The following table demonstrates this behavior for a single value:
| Lhs  | Rhs  | Matches? |
|------|------|----------|
| NULL | NULL | Yes      |
| "A"  | NULL | Yes      |
| NULL | "A"  | Yes      |
| "A"  | "A"  | Yes      |
| "A"  | "B"  | No       |
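To make the table concrete, here is a minimal Rust sketch of the compatibility check on a single value. The function name `compatible` and the use of `Option` to model NULL are assumptions for illustration only; this is not DataFusion code.

```rust
/// Compatibility check for a single value, with NULL modeled as `None`.
/// An unbound side matches anything; two bound values must be equal.
/// (`compatible` is a hypothetical name used only for this sketch.)
fn compatible<T: PartialEq>(lhs: &Option<T>, rhs: &Option<T>) -> bool {
    match (lhs, rhs) {
        // Both sides are bound: fall back to ordinary equality.
        (Some(l), Some(r)) => l == r,
        // At least one side is NULL: always compatible.
        _ => true,
    }
}
```

Each row of the table corresponds to one call, for example `compatible(&Some("A"), &None)` for the second row.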
Currently, we check for compatibility with UDFs, which means the join can only be executed by a NestedLoopJoinExec, as there is no "native" equi-join condition. Having access to DataFusion's HashJoin and other join implementations would be great, as we would not have to reinvent the join infrastructure.
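As a rough illustration of why an opaque compatibility predicate ends up in a nested-loop join, the following sketch compares every left key with every right key. It is plain Rust over slices, not the NestedLoopJoinExec implementation; `keys_compatible` and `nested_loop_join` are hypothetical names.

```rust
/// Compatibility predicate standing in for the UDF: NULL (`None`) matches anything.
/// (Hypothetical helper for this sketch.)
fn keys_compatible(lhs: &Option<&str>, rhs: &Option<&str>) -> bool {
    match (lhs, rhs) {
        (Some(l), Some(r)) => l == r,
        _ => true,
    }
}

/// Naive nested-loop join over single-column join keys.
/// Returns the (left index, right index) pairs whose keys are compatible.
/// Every pair is evaluated, so the cost is O(|left| * |right|).
fn nested_loop_join(left: &[Option<&str>], right: &[Option<&str>]) -> Vec<(usize, usize)> {
    let mut matches = Vec::new();
    for (i, l) in left.iter().enumerate() {
        for (j, r) in right.iter().enumerate() {
            if keys_compatible(l, r) {
                matches.push((i, j));
            }
        }
    }
    matches
}
```

Because the predicate is not an equality on the key, the engine cannot hash or sort on it directly, which is why every pair has to be evaluated here.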
Is this something that you'd be interested in having in DF?
Describe the solution you'd like
I propose addressing this problem in three steps:
1. Replace Join::null_equals_null with an enum JoinNullBehavior (or similar).
2. Add an additional variant JoinNullBehavior::NullMatchesEverything.
3. Extend the join implementations one by one, with the planner checking whether a join implementation is available for the given JoinNullBehavior.
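The three steps could look roughly like the sketch below. Only `JoinNullBehavior` and `NullMatchesEverything` come from the proposal; the other variant names, the `JoinImpl` enum, `supports`, and `choose_join` are assumptions made for illustration and do not mirror DataFusion's actual planner API. The support matrix shows a possible starting point in which only the nested-loop join handles the new variant.

```rust
/// Hypothetical replacement for the boolean `null_equals_null` flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinNullBehavior {
    /// NULL never matches anything (assumed name; `null_equals_null = false`).
    NullEqualsNothing,
    /// NULL matches only NULL (assumed name; `null_equals_null = true`).
    NullEqualsNull,
    /// NULL is compatible with every value, bound or not.
    NullMatchesEverything,
}

impl JoinNullBehavior {
    /// Decides whether two join keys match under this behavior.
    fn keys_match<T: PartialEq>(self, lhs: &Option<T>, rhs: &Option<T>) -> bool {
        match (lhs, rhs) {
            (Some(l), Some(r)) => l == r,
            (None, None) => self != JoinNullBehavior::NullEqualsNothing,
            // Exactly one side is NULL.
            _ => self == JoinNullBehavior::NullMatchesEverything,
        }
    }
}

/// Stand-ins for the physical join implementations (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinImpl {
    Hash,
    SortMerge,
    NestedLoop,
}

/// Assumed initial support matrix: only the nested-loop join handles
/// `NullMatchesEverything` until other implementations are extended.
fn supports(join: JoinImpl, behavior: JoinNullBehavior) -> bool {
    match (join, behavior) {
        (JoinImpl::NestedLoop, _) => true,
        (_, JoinNullBehavior::NullMatchesEverything) => false,
        _ => true,
    }
}

/// Planner-side check: pick the first preferred implementation that
/// supports the requested behavior, falling back to the nested-loop join.
fn choose_join(behavior: JoinNullBehavior) -> JoinImpl {
    [JoinImpl::Hash, JoinImpl::SortMerge, JoinImpl::NestedLoop]
        .into_iter()
        .find(|join| supports(*join, behavior))
        .unwrap_or(JoinImpl::NestedLoop)
}
```

Under this sketch, extending an implementation in step 3 amounts to relaxing one arm of `supports`, so existing plans would keep their current join implementation until that happens.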
Describe alternatives you've considered
Sticking with NestedLoopJoins
Leaving this behavior downstream
Additional context
Definition of Solution Compatibility in SPARQL 1.1:
(NULL above represents unbound.)
This could also be helpful for SQL/PGQ or GQL implementations based on DF.
Related Issues: