Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive query takes long time, none-recursive query returns immediately (same output row count and query structure) #5152

Open
sapalli2989 opened this issue Mar 28, 2025 · 1 comment

Comments

@sapalli2989
Copy link
Contributor

sapalli2989 commented Mar 28, 2025

Description

I am currently using a query looking similar to this one:

MATCH (b:A)-[]->(:Attribute)-[]->(:C)<-[* SHORTEST 1..6]-(:C {name: 'Kùzu'})
      RETURN count(b);

Above query returns ~ 1k rows immediately.

In order to make statement a bit more concise, I changed it to:

MATCH (b:A)-[* 2..2]->(:C)<-[* SHORTEST 1..6]-(:C {name: 'Kùzu'})
      RETURN count(b);

This query works hard - I skipped at "Current Pipeline Progress: 10%" after a minute or so.

Conceptually type A or C - there are more, but let's limit to these - is always followed by Attribute, which then is followed by A or C again, and cycle may repeat multiple times. There is a single relation type for (A|C) --> (Attribute) and another for (Attribute) --> (A|C). Attribute is a reified node, hence this pairwise hopping sequence.

(A|C) --> (Attribute) --> (A|C) --> (Attribute) -> ...

In summary, given same output rows and otherwise equal query structure, there is a big performance difference between recursive and non-recursive searching. Here is the row count from splitting query parts:

MATCH (c:C)<-[* SHORTEST 1..6]-(:C {name: 'Kùzu'}) RETURN count(c);
// 33
MATCH (b:A)-[]->(:Attribute)-[]->(:C) RETURN count(b);
// 41206
MATCH (b:A)-[* 2..2]->(:C) RETURN count(b);
// 41206

Is this expected behavior?
For now I marked issue as performance optimization, not sure about a bug. Hopefully you got a hint from description, otherwise let me try to re-create a graph with dummy data.

@sapalli2989
Copy link
Contributor Author

sapalli2989 commented Apr 5, 2025

I abstracted my example to make discussion a bit simpler.

Dataset size: ~200k nodes

First case

Description: Reach :B from :A via 2 hops.

// Non-recursive version
// Execution time: 26.15ms
// returns 41483 rows
MATCH (:A)-[]->()-[]->(:B) RETURN count(*);

// Recursive version
// Execution time: 28484.25ms
// returns 41483 rows
MATCH (:A)-[* 2]->(:B) RETURN count(*);

Comment:

  • Recursive query traverses same amount of hops and returns same rows as non-recursive query.
  • But there is a performance decrease of factor ~ 1000 with recursive variant.
  • It would be great, if this performance decrease could be optimized, to align both variants and make recursive queries more appealing.
  • This also is in light of the relatively small dataset of 200k nodes.

Second case

Description: Reach :B from :A via 2 hops - and join it with another node :C.

// Execution time: 97.12ms
// returns 1113 rows
MATCH (:A)-[]->()-[]->(:B)<-(:C) RETURN count(*);

// Execution time: 220731.62ms
// returns 1113 rows
MATCH (:A)-[* 2]->(:B)<-(:C) RETURN count(*);

, where

// Execution time: 28.30ms
// returns 33 rows
MATCH (:B)<-(:C) RETURN count(*);

(Original example was a more complex query pattern, with 33 output rows as well; let's keep thing simple.)

Comment:

  • Performance decrease of factor ~ 2300 with recursive variant
  • This example might show, that query processor can make better use of join optimization. Given (:B)<-(:C) has just 33 rows and immediately returns, it would be more performant to perform recursive query resolution with left side (:A)-[* 2]->(:B) for these joined 33 nodes only instead of full graph.

Kuzu version 0.9.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant