Improve tutorial documentation #353

prrao87 · 2025-02-03T16:59:23Z

Users have been asking for more tutorials and examples in languages other than Python.

I propose that we update the Tutorials section in our docs to demonstrate that Kùzu can be used from a variety of client languages. We need to showcase the same workflow on the same dataset in each language, highlighting that Kùzu caters to users coming from almost any language.

Subtask 1

First, we need to create an artificial dataset that clearly demonstrates the benefits of using a graph to answer the following kinds of queries.

2-hop queries in graphs
Aggregation (Cypher doesn't have a GROUP BY clause, so we need to show how you can aggregate on one property while grouping on another)
Shortest paths using our convenient SHORTEST keyword in Cypher
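
To make the aggregation point concrete: in Cypher, any non-aggregated expression in the RETURN clause acts as the grouping key, so something like `RETURN u.username, avg(p.like_count)` groups by username without a GROUP BY. The sketch below mirrors that computation in plain Rust on in-memory rows, which could be handy for sanity-checking tutorial output on a small dataset. The function name and the (username, like_count) row shape are assumptions made for illustration, not part of any Kùzu API.

```rust
use std::collections::BTreeMap;

/// Average `like_count` per username, mirroring what
/// `RETURN u.username, avg(p.like_count)` computes with implicit grouping.
/// Input rows are hypothetical (username, like_count) pairs.
fn avg_likes_per_user(rows: &[(&str, i64)]) -> BTreeMap<String, f64> {
    // Accumulate (sum, count) per grouping key.
    let mut acc: BTreeMap<String, (i64, u32)> = BTreeMap::new();
    for &(user, likes) in rows {
        let entry = acc.entry(user.to_string()).or_insert((0, 0));
        entry.0 += likes;
        entry.1 += 1;
    }
    // Turn each (sum, count) pair into an average.
    acc.into_iter()
        .map(|(user, (sum, n))| (user, sum as f64 / f64::from(n)))
        .collect()
}
```

The grouping key is the map key and the aggregate is the map value, which is the same shape as the rows such a query would return.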

Subtask 2

Write a tutorial for each client language that we officially support, showcasing the same end-to-end workflow. Each tutorial would read in data from CSV/Parquet files. We would create individual sub-issues linked to this issue that various team members can take on.

cc @aracardan @WWW0030


WWW0030 · 2025-02-03T20:53:39Z

I think that we can use an artificial dataset that represents a community of Twitter users to demonstrate these benefits.

Dataset: Twitter community

Nodes:

User

userId (INT64 PRIMARY KEY)
username (STRING)
account_creation_date (DATE)

Posts

postId (INT64 PRIMARY KEY)
post_date (DATE)
like_amount (INT64)
retweet_amount (INT64)

Relations:

Follows (FROM User TO User)
Posts (FROM User TO Posts)
Likes (FROM User TO Posts)

Queries

2-hop queries can recommend whom a user should follow, based on the accounts followed by the accounts they already follow
Aggregation can return statistics about the users in the group, for example, the average number of followers/followees each user has
Shortest path can return the shortest path from user A to user B
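
As a reference for what the 2-hop recommendation should return, here is a minimal plain-Rust sketch over an in-memory edge list. It assumes follow edges are given as (follower, followee) user-ID pairs, and it also excludes the user themself from the results, which is my assumption about the intended behavior rather than something stated above. The function name is hypothetical.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Recommend accounts for `user`: everyone followed by someone `user` follows,
/// excluding `user` and accounts `user` already follows.
/// `follows` holds hypothetical (follower, followee) user-ID pairs.
fn recommend_follows(follows: &[(i64, i64)], user: i64) -> BTreeSet<i64> {
    // Adjacency list: follower -> set of followees.
    let mut out: HashMap<i64, HashSet<i64>> = HashMap::new();
    for &(from, to) in follows {
        out.entry(from).or_default().insert(to);
    }
    let empty = HashSet::new();
    let direct = out.get(&user).unwrap_or(&empty);
    let mut recs = BTreeSet::new();
    // First hop: accounts the user follows. Second hop: accounts those follow.
    for mid in direct {
        for &candidate in out.get(mid).unwrap_or(&empty) {
            if candidate != user && !direct.contains(&candidate) {
                recs.insert(candidate);
            }
        }
    }
    recs
}
```

The two nested loops correspond to the two hops of the graph pattern, and the `if` plays the role of the filter that removes accounts the user already follows.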

Other suggested queries:

We should start the tutorial off with a couple of queries that can be expressed in both SQL and Cypher. I think this lets some users start in more familiar territory and also introduces basic Cypher syntax to someone with SQL knowledge.
We can also add some more complicated queries. One query that I think shows off the power of graph queries is generating a personalized recommendation page for a specific User (by matching the posts made and liked by the accounts they follow, and ordering them by like count).

I also attached the CSVs to be used in the tutorial. cc @prrao87, please take a look 👍
tutorial_likes.csv.csv
tutorial_posts.csv.csv
tutorial_tweets.csv.csv
tutorial_users.csv.csv
tutorial_follows.csv.csv

WWW0030 · 2025-02-04T23:54:19Z

Here is a rough draft of what the Rust tutorial queries will look like. I think we should:

Break down each query step by step, explaining what each step does
Link the syntax used in each query to the respective documentation pages so that the user can easily refer to them for more context

Any other suggestions can also be helpful!

use kuzu::{Connection, Database, Error, SystemConfig};

fn main() -> Result<(), Error> {
    // Create an empty on-disk database and connect to it
    let db = Database::new("./demo_db", SystemConfig::default())?;
    let conn = Connection::new(&db)?;

    // Create the node and relationship tables
    conn.query("CREATE NODE TABLE User(userId INT64 PRIMARY KEY, username STRING, account_creation_date DATE)")?;
    conn.query("CREATE NODE TABLE User_Post(postId INT64 PRIMARY KEY, post_date DATE, like_count INT64, retweet_count INT64)")?;
    conn.query("CREATE REL TABLE FOLLOWS(FROM User TO User)")?;
    conn.query("CREATE REL TABLE POSTS(FROM User TO User_Post)")?;
    conn.query("CREATE REL TABLE LIKES(FROM User TO User_Post)")?;

    // Load the data from the CSV files
    conn.query("COPY User FROM './data/tutorial_user.csv'")?;
    conn.query("COPY User_Post FROM './data/tutorial_user_post.csv'")?;
    conn.query("COPY FOLLOWS FROM './data/TUTORIAL_FOLLOWS.csv'")?;
    conn.query("COPY POSTS FROM './data/TUTORIAL_POSTS.csv'")?;
    conn.query("COPY LIKES FROM './data/TUTORIAL_LIKES.csv'")?;

    // Two-hop query: recommend users to follow.
    // First, we want the users followed by the users we follow. We start off with a query that looks like this:
    conn.query("
        MATCH (u1:User)-[f1:FOLLOWS]->(u2:User)-[f2:FOLLOWS]->(u3:User)
        RETURN u3
    ")?;

    // Next, we restrict u1 to the user we want recommendations for, using a WHERE clause:
    conn.query("
        MATCH (u1:User)-[f1:FOLLOWS]->(u2:User)-[f2:FOLLOWS]->(u3:User)
        WHERE u1.username = 'epicking81'
        RETURN u3
    ")?;

    // This is still not entirely correct, since u3 can include users that u1 already follows.
    // As a last step, we extend the WHERE clause (written here as an EXISTS subquery):
    conn.query("
        MATCH (u1:User)-[f1:FOLLOWS]->(u2:User)-[f2:FOLLOWS]->(u3:User)
        WHERE u1.username = 'epicking81'
        AND NOT EXISTS { MATCH (u1)-[:FOLLOWS]->(u3) }
        RETURN u3
    ")?;

    // Aggregation: count the number of people a user follows.
    // As above, we first specify the relationship, here for one specific user:
    conn.query("
        MATCH (u1:User)-[f:FOLLOWS]->(u2:User)
        WHERE u1.username = 'epicking81'
        RETURN u2
    ")?;

    // The previous query returns the list of users our user follows.
    // We can use an aggregate function to return the count instead:
    conn.query("
        MATCH (u1:User)-[f:FOLLOWS]->(u2:User)
        WHERE u1.username = 'epicking81'
        RETURN count(u2)
    ")?;

    // Aggregation is useful in many scenarios. Here are some more examples:

    // 1. Average like count across a user's posts:
    conn.query("
        MATCH (u1:User)-[p:POSTS]->(p2:User_Post)
        WHERE u1.username = 'epicking81'
        RETURN avg(p2.like_count)
    ")?;

    // 2. Maximum like count across a user's posts:
    conn.query("
        MATCH (u1:User)-[p:POSTS]->(p2:User_Post)
        WHERE u1.username = 'epicking81'
        RETURN max(p2.like_count)
    ")?;

    // Shortest path:
    // We can use recursive matching to find paths between nodes.
    // This example returns the length of the shortest path (up to 4 hops) between two users:
    conn.query("
        MATCH (u1:User)-[f:FOLLOWS* SHORTEST 1..4]->(u2:User)
        WHERE u1.username = 'silentguy245' AND u2.username = 'epicwolf202'
        RETURN length(f) AS length;
    ")?;

    // Recommendation page for a user: the 10 most-liked recent posts
    // that were posted or liked by the users they follow.
    conn.query("
        MATCH (u1:User)-[f:FOLLOWS]->(u2:User)-[]->(p:User_Post)
        WHERE p.post_date > date('2022-01-01') AND u1.username = 'fastgirl798'
        RETURN p.*
        ORDER BY p.like_count DESC LIMIT 10;
    ")?;

    Ok(())
}
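
To cross-check the shortest-path result on a small dataset, a bounded breadth-first search gives the same kind of answer as a shortest-path query with a hop limit. This is a plain-Rust sketch under my own assumptions: edges are (follower, followee) ID pairs, paths have between 1 and `max_hops` hops, and the function name is hypothetical. It is an independent reference, not how Kùzu evaluates the query.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Length in hops of the shortest directed path from `src` to `dst`,
/// or `None` if no path of 1..=max_hops hops exists.
fn shortest_hops(follows: &[(i64, i64)], src: i64, dst: i64, max_hops: usize) -> Option<usize> {
    // Adjacency list: follower -> followees.
    let mut out: HashMap<i64, Vec<i64>> = HashMap::new();
    for &(from, to) in follows {
        out.entry(from).or_default().push(to);
    }
    let mut visited = HashSet::from([src]);
    let mut queue = VecDeque::from([(src, 0usize)]);
    while let Some((node, dist)) = queue.pop_front() {
        if dist == max_hops {
            continue; // do not expand beyond the hop bound
        }
        for &next in out.get(&node).into_iter().flatten() {
            if next == dst {
                // BFS visits nodes in order of distance, so the first hit is shortest.
                return Some(dist + 1);
            }
            if visited.insert(next) {
                queue.push_back((next, dist + 1));
            }
        }
    }
    None
}
```

Calling it with `max_hops` set to 4 corresponds to the `1..4` bound used in the draft query.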

prrao87 · 2025-02-05T04:15:38Z

This is a good starting point! Some thoughts:

Rename User_Post to Post. It's okay to call this node Post and have a POSTS relationship because we follow a naming convention for nodes/rels and it's easy to distinguish between them as we read the queries
The queries could be reformulated based on questions we're trying to ask about the data, for example:
- Q1: Which user has the most followers, and how many followers do they have?
- Q2: Which user follows the most people, and how many users do they follow?
- Q3: What is the shortest path between user A and user B?
- Q4: How many 3-hop paths exist between user C and user D that pass through user A?

Along those lines. I'm not fully sure I follow the recommendation logic, but maybe flesh out those queries more.
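
Q1 and Q2 both reduce to counting edges per user and keeping the maximum; in Cypher that would typically be a count aggregation followed by ORDER BY ... DESC LIMIT 1. A plain-Rust sketch of the expected answer, assuming (follower, followee) ID pairs as input. The `Side` enum, the function name, and the tie-breaking rule are all choices made for this illustration.

```rust
use std::collections::BTreeMap;

/// Which end of a FOLLOWS edge to count.
#[derive(Clone, Copy)]
enum Side {
    Follower, // count outgoing edges: how many users someone follows (Q2)
    Followee, // count incoming edges: how many followers someone has (Q1)
}

/// Returns (user_id, count) with the highest count, or None if there are no edges.
/// Ties are broken in favor of the smallest user ID, an arbitrary choice.
fn top_user(follows: &[(i64, i64)], side: Side) -> Option<(i64, usize)> {
    let mut counts: BTreeMap<i64, usize> = BTreeMap::new();
    for &(from, to) in follows {
        let key = match side {
            Side::Follower => from,
            Side::Followee => to,
        };
        *counts.entry(key).or_insert(0) += 1;
    }
    // max_by_key keeps the last maximum, so walk IDs in descending order
    // to end up with the smallest ID among tied users.
    counts.into_iter().rev().max_by_key(|&(_, n)| n)
}
```

The tutorial text should probably state how ties are handled, since a LIMIT 1 query would otherwise return whichever tied row happens to sort first.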

Also, we need to think about how the output results are formatted and displayed so that we can explain them. Maybe having the COPY logic in one file and the queries in another file makes sense?

sdht0 · 2025-02-05T05:29:08Z

> we need to think about how the output results are formatted and displayed so that we can explain them.

Related, maybe we can show how to use the output in further processing, such as using them in other queries or exporting them in other formats?

Also, we can show how to perform parameterized queries.
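
Parameterized queries keep the query text fixed and pass values separately through `$name` placeholders. One small thing a tutorial could include is a check that every placeholder has a value before the statement is executed. The sketch below is plain Rust with hypothetical helper names; it deliberately leaves out the actual prepare/execute calls of the Kùzu client, and its scanner is simplistic (it does not skip string literals or comments).

```rust
/// Collects the distinct `$name` placeholders in a Cypher query, in order of
/// first appearance. A deliberately simple scanner: it does not skip string
/// literals or comments, which is enough for tutorial-style queries.
fn placeholders(query: &str) -> Vec<String> {
    let mut names: Vec<String> = Vec::new();
    let mut chars = query.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            continue;
        }
        // Read the identifier that follows the `$`.
        let mut name = String::new();
        while let Some(&n) = chars.peek() {
            if n.is_alphanumeric() || n == '_' {
                name.push(n);
                chars.next();
            } else {
                break;
            }
        }
        if !name.is_empty() && !names.contains(&name) {
            names.push(name);
        }
    }
    names
}

/// Returns the placeholders that have no entry in `supplied`.
fn missing_params(query: &str, supplied: &[&str]) -> Vec<String> {
    placeholders(query)
        .into_iter()
        .filter(|name| !supplied.contains(&name.as_str()))
        .collect()
}
```

With a query such as `WHERE u1.username = $username`, the username value is then supplied as a parameter instead of being spliced into the query string.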

prrao87 · 2025-02-05T13:11:27Z

OMG, yes, @sdht0 thanks for that callout - we totally should show parameterized queries ("prepared statements"). Please find a way to work that in, @WWW0030.

Place a new markdown section under the "Tutorials" section in the docs. When you make the PR, make it to the dev branch so that I can work on the organization of the page better after the 0.8.0 release.

WWW0030 self-assigned this Feb 3, 2025

prrao87 self-assigned this Feb 3, 2025

prrao87 pinned this issue Feb 5, 2025

prrao87 linked a pull request Feb 5, 2025 that will close this issue

🚧 [WIP]: Add new structure and docs for tutorials #359

Closed
