Recursive query takes long time, none-recursive query returns immediately (same output row count and query structure) #5152

sapalli2989 · 2025-03-28T10:26:51Z

Description

I am currently using a query looking similar to this one:

MATCH (b:A)-[]->(:Attribute)-[]->(:C)<-[* SHORTEST 1..6]-(:C {name: 'Kùzu'})
      RETURN count(b);

Above query returns ~ 1k rows immediately.

In order to make statement a bit more concise, I changed it to:

MATCH (b:A)-[* 2..2]->(:C)<-[* SHORTEST 1..6]-(:C {name: 'Kùzu'})
      RETURN count(b);

This query works hard - I skipped at "Current Pipeline Progress: 10%" after a minute or so.

Conceptually type A or C - there are more, but let's limit to these - is always followed by Attribute, which then is followed by A or C again, and cycle may repeat multiple times. There is a single relation type for (A|C) --> (Attribute) and another for (Attribute) --> (A|C). Attribute is a reified node, hence this pairwise hopping sequence.

(A|C) --> (Attribute) --> (A|C) --> (Attribute) -> ...

In summary, given same output rows and otherwise equal query structure, there is a big performance difference between recursive and non-recursive searching. Here is the row count from splitting query parts:

MATCH (c:C)<-[* SHORTEST 1..6]-(:C {name: 'Kùzu'}) RETURN count(c);
// 33
MATCH (b:A)-[]->(:Attribute)-[]->(:C) RETURN count(b);
// 41206
MATCH (b:A)-[* 2..2]->(:C) RETURN count(b);
// 41206

Is this expected behavior?
For now I marked issue as performance optimization, not sure about a bug. Hopefully you got a hint from description, otherwise let me try to re-create a graph with dummy data.

The text was updated successfully, but these errors were encountered:

sapalli2989 · 2025-04-05T18:41:56Z

I abstracted my example to make discussion a bit simpler.

Dataset size: ~200k nodes

First case

Description: Reach :B from :A via 2 hops.

// Non-recursive version
// Execution time: 26.15ms
// returns 41483 rows
MATCH (:A)-[]->()-[]->(:B) RETURN count(*);

// Recursive version
// Execution time: 28484.25ms
// returns 41483 rows
MATCH (:A)-[* 2]->(:B) RETURN count(*);

Comment:

Recursive query traverses same amount of hops and returns same rows as non-recursive query.
But there is a performance decrease of factor ~ 1000 with recursive variant.
It would be great, if this performance decrease could be optimized, to align both variants and make recursive queries more appealing.
This also is in light of the relatively small dataset of 200k nodes.

Second case

Description: Reach :B from :A via 2 hops - and join it with another node :C.

// Execution time: 97.12ms
// returns 1113 rows
MATCH (:A)-[]->()-[]->(:B)<-(:C) RETURN count(*);

// Execution time: 220731.62ms
// returns 1113 rows
MATCH (:A)-[* 2]->(:B)<-(:C) RETURN count(*);

, where

// Execution time: 28.30ms
// returns 33 rows
MATCH (:B)<-(:C) RETURN count(*);

(Original example was a more complex query pattern, with 33 output rows as well; let's keep thing simple.)

Comment:

Performance decrease of factor ~ 2300 with recursive variant
This example might show, that query processor can make better use of join optimization. Given (:B)<-(:C) has just 33 rows and immediately returns, it would be more performant to perform recursive query resolution with left side (:A)-[* 2]->(:B) for these joined 33 nodes only instead of full graph.

Kuzu version 0.9.0

sapalli2989 added the performance optimization label Mar 28, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Recursive query takes long time, none-recursive query returns immediately (same output row count and query structure) #5152

Recursive query takes long time, none-recursive query returns immediately (same output row count and query structure) #5152

sapalli2989 commented Mar 28, 2025 •

edited

Loading

sapalli2989 commented Apr 5, 2025 •

edited

Loading

Recursive query takes long time, none-recursive query returns immediately (same output row count and query structure) #5152

Recursive query takes long time, none-recursive query returns immediately (same output row count and query structure) #5152

Comments

sapalli2989 commented Mar 28, 2025 • edited Loading

Description

sapalli2989 commented Apr 5, 2025 • edited Loading

First case

Second case

sapalli2989 commented Mar 28, 2025 •

edited

Loading

sapalli2989 commented Apr 5, 2025 •

edited

Loading