Feature: Convenience function to return a table of database statistics #5025

Open
prrao87 opened this issue Mar 10, 2025 · 2 comments
Labels
feature New features or missing components of existing features

prrao87 (Member) commented Mar 10, 2025

Feature

Return a single table containing statistics of the number of nodes and relationships (grouped by table name).

Description

As of v0.8.2, it's quite tedious to display a summary of the number of nodes and relationships, grouped by table name. It takes two queries, run sequentially:

Query 1

MATCH (n) 
RETURN label(n), COUNT(label(n));
┌──────────────────────────┬─────────────────────────────────┐
│ LABEL(n._ID,[User,City]) │ COUNT(LABEL(n._ID,[User,City])) │
│ STRING                   │ INT64                           │
├──────────────────────────┼─────────────────────────────────┤
│ User                     │ 4                               │
│ City                     │ 3                               │
└──────────────────────────┴─────────────────────────────────┘

Query 2

MATCH ()-[r]->() 
RETURN label(r), COUNT(label(r));
┌──────────────────────────────────┬─────────────────────────────────────────┐
│ LABEL(r._ID,[,,Follows,LivesIn]) │ COUNT(LABEL(r._ID,[,,Follows,LivesIn])) │
│ STRING                           │ INT64                                   │
├──────────────────────────────────┼─────────────────────────────────────────┤
│ LivesIn                          │ 4                                       │
│ Follows                          │ 4                                       │
└──────────────────────────────────┴─────────────────────────────────────────┘

This takes a lot of manual typing, plus remembering the Cypher syntax for label(x), and the results still come back from two separate statements.

Suggested syntax

It would be great to expose a function that does this in one single command.

CALL get_db_stats() RETURN *;

Or something along those lines, to return a combined table that shows the number of nodes and relationships, grouped by the table name. Neo4j's APOC does this via apoc.meta.stats.
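To make the desired output concrete, here is a minimal Python sketch of what a combined table could look like. It only merges the rows returned by the two queries above; the helper name `combine_db_stats` and the column names `table_type`, `table_name` and `count` are assumptions for illustration, not part of Kuzu's API, and `get_db_stats()` itself is still just a proposal.

```python
# Hypothetical helper: merges the rows of Query 1 and Query 2 into one table.
# The column names are illustrative assumptions, not actual Kuzu output.
def combine_db_stats(node_rows, rel_rows):
    combined = []
    for name, count in node_rows:
        combined.append({"table_type": "NODE", "table_name": name, "count": count})
    for name, count in rel_rows:
        combined.append({"table_type": "REL", "table_name": name, "count": count})
    return combined


# Rows copied from the two query results shown earlier in this issue.
node_rows = [("User", 4), ("City", 3)]
rel_rows = [("LivesIn", 4), ("Follows", 4)]

stats = combine_db_stats(node_rows, rel_rows)

# Print one line per table: type, name, count.
for row in stats:
    print(f"{row['table_type']:<5} {row['table_name']:<8} {row['count']}")
```

In practice the two row lists would come from running the MATCH queries through a client, but the shape of the merged result is the point here.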

Why is this useful?

After a large data ingestion job, it can be very useful to run a single command to check that the right number of nodes and relationships exist in the database. Many visualization tools do this by default. Examples from G.V() and the Neo4j browser are shown below.

[Image: G.V()]

[Image: Neo4j browser]

Downstream benefit

If this is implemented, we can pass the results to the Kuzu Explorer UI so that we can display the statistics when the user opens the UI, similar to how other visualization tools do it.

prrao87 added the feature label Mar 10, 2025

ray6080 (Contributor) commented Mar 10, 2025

Why would CALL get_db_stats() RETURN *; be much better than MATCH statements? I think the MATCH statements are simple enough, and more expressive when it comes to catering to custom user needs. If this is for UI tools, e.g., our Explorer, then as I understand it, it should be quite easy to embed the MATCH statements inside these tools to grab the needed stats.

prrao87 (Member, Author) commented Mar 10, 2025

> Why would CALL get_db_stats() RETURN *; be much better than MATCH statements? I think the MATCH statements are simple enough, and more expressive when it comes to catering to custom user needs. If this is for UI tools, e.g., our Explorer, then as I understand it, it should be quite easy to embed the MATCH statements inside these tools to grab the needed stats.

It's fewer lines to write manually, all the results are collected into a single table, and it takes one line of Cypher (less to type and less to remember for the user). Even if all the function does under the hood is call those two underlying MATCH statements, it's still worth it imo, because it reduces the effort users spend getting the info they need (this is valuable information that tells them whether their data ingestion went as expected).
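As a rough illustration of that ingestion check, the sketch below compares per-table counts against the counts a user expects. The function name `find_count_mismatches` and the expected values are hypothetical; the actual counts are the ones from the two query results in the issue description.

```python
# Hypothetical post-ingestion check: compare actual per-table counts
# against the counts the user expected to load.
def find_count_mismatches(actual, expected):
    """Return {table_name: (expected, actual)} for every table whose count differs."""
    mismatches = {}
    for name, want in expected.items():
        got = actual.get(name, 0)  # a table missing from the stats counts as 0
        if got != want:
            mismatches[name] = (want, got)
    return mismatches


# Actual counts taken from the query results in the issue description.
actual_counts = {"User": 4, "City": 3, "LivesIn": 4, "Follows": 4}

# Expected counts are made up for this example: one Follows edge is "missing".
expected_counts = {"User": 4, "City": 3, "LivesIn": 4, "Follows": 5}

mismatches = find_count_mismatches(actual_counts, expected_counts)
print(mismatches)
```

With a single stats table, a check like this needs only one query result as input instead of two.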

andyfengHKU mentioned this issue Mar 17, 2025 (45 tasks)