Feature: Graceful stopping and reloading if connection to database is stopped midquery #5016

Closed
DDoyle1066 opened this issue Mar 8, 2025 · 6 comments · Fixed by #5111
Labels
bug Something isn't working

Comments

@DDoyle1066

API

Python

Description

I am not sure whether this counts as a feature or a bug. One thing I have noticed, especially when running large queries, is that if I execute a query that takes a long time and then terminate the process, I am sometimes unable to reconnect to the database: the connection attempt just hangs. I don't know what causes this, because at other times I can terminate the process and reconnect just fine. It is not clear how to get the database back to a version I can query, so I have to reload everything from scratch, which is often time-consuming.

@DDoyle1066 DDoyle1066 added the feature New features or missing components of existing features label Mar 8, 2025
@ray6080
Contributor

ray6080 commented Mar 8, 2025

if I have written and tried to execute a query that is taking a long time and then I terminate the process, I am unable to connect back to the database and the database just hangs.

Hi @DDoyle1066, are you experiencing this in the Python API? By "terminate the process, I am unable to connect back to the database and the database just hangs", do you mean that creating the Kuzu Database object hangs when you restart your process and connect to the same db directory?

@DDoyle1066
Author

I have experienced it both in the Python API and when running a query in Kuzu Explorer. In the most recent case, detaching and deleting nodes was taking a lot longer than I anticipated, so I terminated the process to figure out how to do it faster. When I tried reconnecting from a fresh process running kuzu.Database("path/to/db"), it just hung for a while.
If terminating the process can occasionally leave the database in a state where you cannot reconnect (for example, if it was in the middle of modifying data), then it would be nice to be able to connect from a different process and run a query that halts all queries running against the database in an orderly fashion.

@ray6080
Contributor

ray6080 commented Mar 8, 2025

I have experienced it both in the Python API and when running a query in Kuzu Explorer. In the most recent case, detaching and deleting nodes was taking a lot longer than I anticipated, so I terminated the process to figure out how to do it faster. When I tried reconnecting from a fresh process running kuzu.Database("path/to/db"), it just hung for a while. If terminating the process can occasionally leave the database in a state where you cannot reconnect (for example, if it was in the middle of modifying data), then it would be nice to be able to connect from a different process and run a query that halts all queries running against the database in an orderly fashion.

Yes, I agree. We should be able to stop gracefully and restart. This looks like a bug to me rather than a feature. We should look into this and get it fixed. Are you able to ship a reproducible example to us for debugging? That would help a lot!

@ray6080 ray6080 pinned this issue Mar 8, 2025
@DDoyle1066
Author

DDoyle1066 commented Mar 8, 2025

The same dataset from this issue will reproduce this. I loaded the dataset up to 2022 (this can be done by changing line 413 to for year in range(2009, 2023):) and then ran:

partition_name = '2022q4_notes'
conn.execute(f"MATCH (n:Num) where n.partition_name= '{partition_name}' DETACH DELETE n")

Terminating the process midway gave me the error. Running it on a small set of data that does not take as long to load gives me this error in case that is helpful:

RuntimeError: Error during recovery: Runtime exception: Failed to replay wal record from WAL file. Error: Runtime exception: Reading past the end of the file /kuzu_db/.wal with size 143360 at offset 143360

@ray6080 ray6080 added bug Something isn't working and removed feature New features or missing components of existing features labels Mar 10, 2025
@ray6080 ray6080 self-assigned this Mar 10, 2025
@andyfengHKU andyfengHKU mentioned this issue Mar 17, 2025
45 tasks
@royi-luo
Contributor

royi-luo commented Mar 25, 2025

Hi @DDoyle1066, just a heads up: we made some changes that should help with this issue. They will be available in the next nightly build (or the coming release):

  1. The Python Connection class now has an interrupt() API. This can be called from a separate thread to interrupt an executing query. While it's a bit cumbersome, you could make queries interruptible by KeyboardInterrupts with something like:
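import signal
import threading
import time

import kuzu

# set by the worker thread once all queries have finished
done = threading.Event()

def run_queries(conn):
  try:
    # run all your queries here
    # this needs to run in a separate thread because only the main Python thread can detect signals
    pass
  finally:
    # when done (or on error), let the main thread stop waiting
    done.set()

with kuzu.Database(...) as db, kuzu.Connection(db) as conn:
  # signal handlers are called with (signum, frame)
  signal.signal(signal.SIGINT, lambda signum, frame: conn.interrupt())
  t = threading.Thread(target=run_queries, args=(conn,))
  t.start()
  while not done.is_set():
    time.sleep(1)
  t.join()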

Edit: you need to busy loop in the main thread to give the interpreter opportunities to process signals

Alternatively, if you use the new AsyncConnection class, you can also cancel the task executing the query (not sure if it's documented anywhere yet; you can refer to this test as an example).

  2. Reloading the DB after terminating it will no longer give the error Failed to replay wal record from WAL file.... Unfortunately, reloading will still take a long time, as our WAL replaying is currently quite slow. We are still working on optimizing this (going to track this with WAL replaying optimization for incomplete transactions #5120), but for now I'd recommend explicitly interrupting queries via the API, as that will avoid this issue altogether.
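
For queries that might run longer than expected, one way to follow the "interrupt explicitly" advice without wiring up signal handlers is a timeout guard. The sketch below is only an illustration, not part of Kuzu: interrupt_after is a hypothetical helper name, and the only thing it assumes about the connection is the interrupt() method described above, called from a separate (timer) thread.

```python
import threading
from contextlib import contextmanager


@contextmanager
def interrupt_after(conn, timeout_s):
    """Call conn.interrupt() if the with-body is still running after timeout_s seconds.

    `conn` is assumed to be any object exposing interrupt(), such as the
    Connection class mentioned above. This helper is hypothetical.
    """
    # threading.Timer runs conn.interrupt in its own thread once the delay elapses
    timer = threading.Timer(timeout_s, conn.interrupt)
    timer.daemon = True  # do not keep the interpreter alive just for the timer
    timer.start()
    try:
        yield
    finally:
        # no-op if the timer already fired; otherwise prevents a late interrupt
        timer.cancel()


# Usage sketch (query text taken from the reproduction earlier in this thread):
#
#   with interrupt_after(conn, 30):
#       conn.execute("MATCH (n:Num) WHERE n.partition_name = '2022q4_notes' DETACH DELETE n")
```

How the interrupted execute() call reports the interruption to the caller is not covered in this thread, so the calling code should be prepared to handle an error there. There is also a small window in which the timer can fire just after the query finishes; whether a stray interrupt() affects later queries on the same connection is an open assumption worth checking.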

@DDoyle1066
Author

@royi-luo Thanks! I will check it out and let you know if I have any issues.

@ray6080 ray6080 unpinned this issue Mar 28, 2025