Allow WAL replaying on some corrupted WAL files (e.g. after DB was killed) #5111

Merged: royi-luo merged 8 commits into master from royi/wal-replay-after-kill on Mar 25, 2025

Conversation

@royi-luo royi-luo commented Mar 24, 2025

Description

After a Kuzu process is killed, the WAL may be left in an incomplete state. This PR stops replay from reporting an error when the WAL is incomplete. Instead, the replay logic becomes:

for each WAL record:
  record = WAL::deserializeRecord() // if this throws, stop replaying
  WAL::replayRecord(record) // if this throws, continue replaying, since the next record should be a rollback
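
The loop above can be sketched as a small self-contained C++ program. This is only an illustration of the two failure-handling rules, not Kuzu's implementation: the two-byte record format, the `Record` and `State` types, the `replay` function, and the "value 0xFF fails to apply" rule are all hypothetical stand-ins chosen to make the behaviour observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical record types for this sketch; not Kuzu's real WAL records.
enum class RecordType : std::uint8_t { Insert = 1, Commit = 2, Rollback = 3 };

struct Record {
    RecordType type;
    int value;
};

// Toy state: values become visible only once a Commit record is replayed.
struct State {
    std::vector<int> committed;
    std::vector<int> pending;
};

// Toy log format (an assumption): each record is two bytes, [type, value].
// Throws if the log ends mid-record or the type byte is unknown, which
// models a WAL that was cut short when the process was killed.
Record deserializeRecord(const std::vector<std::uint8_t>& log, std::size_t& pos) {
    assert(pos <= log.size());
    if (pos + 2 > log.size()) {
        throw std::runtime_error("truncated record");
    }
    const std::uint8_t type = log[pos];
    if (type < 1 || type > 3) {
        throw std::runtime_error("unknown record type");
    }
    Record record{static_cast<RecordType>(type), static_cast<int>(log[pos + 1])};
    pos += 2; // advance only after a full record was read
    return record;
}

// Applies one record. An Insert of 0xFF throws to model a replay failure.
void replayRecord(State& state, const Record& record) {
    switch (record.type) {
    case RecordType::Insert:
        if (record.value == 0xFF) {
            throw std::runtime_error("cannot apply insert");
        }
        state.pending.push_back(record.value);
        break;
    case RecordType::Commit:
        state.committed.insert(state.committed.end(),
                               state.pending.begin(), state.pending.end());
        state.pending.clear();
        break;
    case RecordType::Rollback:
        state.pending.clear();
        break;
    }
}

State replay(const std::vector<std::uint8_t>& log) {
    State state;
    std::size_t pos = 0;
    while (pos < log.size()) {
        Record record{};
        try {
            record = deserializeRecord(log, pos);
        } catch (const std::exception&) {
            break; // incomplete or corrupted tail: stop replaying, no error
        }
        try {
            replayRecord(state, record);
        } catch (const std::exception&) {
            // Keep going: the next record is expected to be a rollback.
        }
    }
    return state;
}
```

In this toy model, replaying the bytes `{1, 10, 2, 0, 1, 20, 1}` commits the value 10 and then stops quietly at the dangling final byte, while `{1, 0xFF, 3, 0, 1, 7, 2, 0}` survives the failed insert, applies the rollback, and commits 7.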

Fixes #5016

Contributor agreement

@royi-luo royi-luo self-assigned this Mar 24, 2025

Benchmark Result

Master commit hash: 210199d806759092b43103925312df9fd20fdfc6
Branch commit hash: d70da8b1e7e7a5451ff807997676642aaf905f81

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 726.20 | 723.78 | 2.42 (0.33%) |
| aggregation | q28 | 6540.81 | 6568.87 | -28.05 (-0.43%) |
| filter | q14 | 131.00 | 124.49 | 6.51 (5.23%) |
| filter | q15 | 127.06 | 127.32 | -0.26 (-0.21%) |
| filter | q16 | 342.18 | 338.03 | 4.15 (1.23%) |
| filter | q17 | 445.78 | 445.33 | 0.45 (0.10%) |
| filter | q18 | 1945.56 | 1925.69 | 19.87 (1.03%) |
| filter | zonemap-node | 89.92 | 89.25 | 0.67 (0.75%) |
| filter | zonemap-node-lhs-cast | 89.36 | 89.29 | 0.07 (0.08%) |
| filter | zonemap-node-null | 89.76 | 88.59 | 1.17 (1.32%) |
| filter | zonemap-rel | 5761.30 | 5766.97 | -5.68 (-0.10%) |
| fixed_size_expr_evaluator | q07 | 686.43 | 681.89 | 4.55 (0.67%) |
| fixed_size_expr_evaluator | q08 | 969.13 | 964.09 | 5.04 (0.52%) |
| fixed_size_expr_evaluator | q09 | 966.49 | 957.04 | 9.44 (0.99%) |
| fixed_size_expr_evaluator | q10 | 262.32 | 257.36 | 4.97 (1.93%) |
| fixed_size_expr_evaluator | q11 | 263.99 | 255.30 | 8.69 (3.40%) |
| fixed_size_expr_evaluator | q12 | 240.59 | 234.04 | 6.56 (2.80%) |
| fixed_size_expr_evaluator | q13 | 1571.71 | 1563.05 | 8.66 (0.55%) |
| fixed_size_seq_scan | q23 | 121.42 | 111.30 | 10.12 (9.09%) |
| join | q29 | 757.67 | 763.08 | -5.41 (-0.71%) |
| join | q30 | 1681.31 | 1739.36 | -58.05 (-3.34%) |
| join | q31 | 5.95 | 5.58 | 0.37 (6.61%) |
| join | SelectiveTwoHopJoin | 48.97 | 56.99 | -8.02 (-14.08%) |
| ldbc_snb_ic | q35 | 10.45 | 9.64 | 0.81 (8.37%) |
| ldbc_snb_ic | q36 | 99.50 | 86.13 | 13.37 (15.52%) |
| ldbc_snb_is | q32 | 4.12 | 4.50 | -0.39 (-8.63%) |
| ldbc_snb_is | q33 | 10.90 | 12.59 | -1.69 (-13.43%) |
| ldbc_snb_is | q34 | 1.21 | 1.20 | 0.02 (1.28%) |
| multi-rel | multi-rel-large-scan | 1682.90 | 1681.41 | 1.49 (0.09%) |
| multi-rel | multi-rel-lookup | 9.63 | 10.22 | -0.59 (-5.77%) |
| multi-rel | multi-rel-small-scan | 202.97 | 202.27 | 0.70 (0.35%) |
| order_by | q25 | 127.67 | 130.80 | -3.13 (-2.39%) |
| order_by | q26 | 443.88 | 465.04 | -21.16 (-4.55%) |
| order_by | q27 | 1377.72 | 1397.49 | -19.78 (-1.42%) |
| recursive_join | recursive-join-bidirection | 303.27 | 259.92 | 43.35 (16.68%) |
| recursive_join | recursive-join-dense | 7176.93 | 7163.20 | 13.73 (0.19%) |
| recursive_join | recursive-join-path | 23171.52 | 23129.25 | 42.28 (0.18%) |
| recursive_join | recursive-join-sparse | 626.06 | 628.62 | -2.56 (-0.41%) |
| recursive_join | recursive-join-trail | 7088.53 | 7054.63 | 33.90 (0.48%) |
| scan_after_filter | q01 | 173.98 | 167.55 | 6.43 (3.84%) |
| scan_after_filter | q02 | 160.52 | 154.12 | 6.40 (4.15%) |
| shortest_path_ldbc100 | q37 | 91.86 | 79.26 | 12.60 (15.90%) |
| shortest_path_ldbc100 | q38 | 309.73 | 323.32 | -13.59 (-4.20%) |
| shortest_path_ldbc100 | q39 | 63.29 | 64.44 | -1.16 (-1.79%) |
| shortest_path_ldbc100 | q40 | 394.83 | 406.29 | -11.45 (-2.82%) |
| var_size_expr_evaluator | q03 | 2094.08 | 2124.17 | -30.09 (-1.42%) |
| var_size_expr_evaluator | q04 | 2256.68 | 2192.49 | 64.19 (2.93%) |
| var_size_expr_evaluator | q05 | 2705.33 | 2681.42 | 23.91 (0.89%) |
| var_size_expr_evaluator | q06 | 1347.93 | 1334.77 | 13.16 (0.99%) |
| var_size_seq_scan | q19 | 1423.70 | 1434.33 | -10.62 (-0.74%) |
| var_size_seq_scan | q20 | 2723.94 | 2753.16 | -29.22 (-1.06%) |
| var_size_seq_scan | q21 | 2276.77 | 2255.38 | 21.39 (0.95%) |
| var_size_seq_scan | q22 | 125.27 | 126.87 | -1.60 (-1.26%) |

@royi-luo royi-luo marked this pull request as ready for review March 24, 2025 20:26
@royi-luo royi-luo requested a review from ray6080 March 24, 2025 20:27

codecov bot commented Mar 24, 2025

Codecov Report

Attention: Patch coverage is 60.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 87.61%. Comparing base (06d703b) to head (0d74cb8).
Report is 1 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/storage/wal_replayer.cpp | 50.00% | 2 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (60.00%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #5111   +/-   ##
=======================================
  Coverage   87.61%   87.61%           
=======================================
  Files        1403     1403           
  Lines       63594    63596    +2     
  Branches     7522     7522           
=======================================
+ Hits        55715    55720    +5     
+ Misses       7850     7847    -3     
  Partials       29       29           

@royi-luo royi-luo force-pushed the royi/wal-replay-after-kill branch from 7cc3f12 to c336b2e Compare March 25, 2025 19:07

Benchmark Result

Master commit hash: 7edebcadb3337483422e3f1ce1404b117c8fd77b
Branch commit hash: a672cb2723d072613f5285a7108105fbe4a4ff60

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 784.07 | 731.79 | 52.28 (7.14%) |
| aggregation | q28 | 6560.66 | 6574.00 | -13.34 (-0.20%) |
| filter | q14 | 134.36 | 134.67 | -0.31 (-0.23%) |
| filter | q15 | 135.49 | 134.10 | 1.40 (1.04%) |
| filter | q16 | 351.01 | 350.67 | 0.34 (0.10%) |
| filter | q17 | 454.62 | 453.56 | 1.06 (0.23%) |
| filter | q18 | 1951.93 | 1906.92 | 45.01 (2.36%) |
| filter | zonemap-node | 97.57 | 97.87 | -0.30 (-0.30%) |
| filter | zonemap-node-lhs-cast | 98.67 | 96.60 | 2.07 (2.15%) |
| filter | zonemap-node-null | 97.79 | 96.62 | 1.18 (1.22%) |
| filter | zonemap-rel | 5865.79 | 5753.47 | 112.32 (1.95%) |
| fixed_size_expr_evaluator | q07 | 695.46 | 697.09 | -1.64 (-0.23%) |
| fixed_size_expr_evaluator | q08 | 975.11 | 983.58 | -8.47 (-0.86%) |
| fixed_size_expr_evaluator | q09 | 974.06 | 984.57 | -10.52 (-1.07%) |
| fixed_size_expr_evaluator | q10 | 269.15 | 271.69 | -2.54 (-0.93%) |
| fixed_size_expr_evaluator | q11 | 271.28 | 272.21 | -0.93 (-0.34%) |
| fixed_size_expr_evaluator | q12 | 249.06 | 248.58 | 0.49 (0.20%) |
| fixed_size_expr_evaluator | q13 | 1578.40 | 1578.07 | 0.33 (0.02%) |
| fixed_size_seq_scan | q23 | 123.08 | 125.40 | -2.32 (-1.85%) |
| join | q29 | 739.56 | 746.41 | -6.85 (-0.92%) |
| join | q30 | 1715.10 | 1558.35 | 156.75 (10.06%) |
| join | q31 | 6.69 | 6.50 | 0.18 (2.83%) |
| join | SelectiveTwoHopJoin | 47.27 | 54.34 | -7.07 (-13.02%) |
| ldbc_snb_ic | q35 | 10.45 | 9.34 | 1.11 (11.86%) |
| ldbc_snb_ic | q36 | 95.63 | 99.66 | -4.03 (-4.04%) |
| ldbc_snb_is | q32 | 4.35 | 3.99 | 0.35 (8.87%) |
| ldbc_snb_is | q33 | 12.97 | 14.53 | -1.56 (-10.73%) |
| ldbc_snb_is | q34 | 1.46 | 1.29 | 0.17 (13.09%) |
| multi-rel | multi-rel-large-scan | 1766.21 | 1792.90 | -26.69 (-1.49%) |
| multi-rel | multi-rel-lookup | 11.00 | 11.73 | -0.72 (-6.15%) |
| multi-rel | multi-rel-small-scan | 186.78 | 208.53 | -21.74 (-10.43%) |
| order_by | q25 | 138.91 | 137.66 | 1.24 (0.90%) |
| order_by | q26 | 453.93 | 454.93 | -1.01 (-0.22%) |
| order_by | q27 | 1400.61 | 1403.32 | -2.72 (-0.19%) |
| recursive_join | recursive-join-bidirection | 290.96 | 293.48 | -2.51 (-0.86%) |
| recursive_join | recursive-join-dense | 7189.88 | 7165.76 | 24.12 (0.34%) |
| recursive_join | recursive-join-path | 23032.37 | 23311.32 | -278.95 (-1.20%) |
| recursive_join | recursive-join-sparse | 633.28 | 631.40 | 1.88 (0.30%) |
| recursive_join | recursive-join-trail | 7074.92 | 7076.69 | -1.77 (-0.03%) |
| scan_after_filter | q01 | 179.26 | 177.42 | 1.84 (1.04%) |
| scan_after_filter | q02 | 162.85 | 162.41 | 0.44 (0.27%) |
| shortest_path_ldbc100 | q37 | 85.89 | 87.54 | -1.65 (-1.88%) |
| shortest_path_ldbc100 | q38 | 307.39 | 413.79 | -106.40 (-25.71%) |
| shortest_path_ldbc100 | q39 | 52.90 | 59.12 | -6.22 (-10.52%) |
| shortest_path_ldbc100 | q40 | 387.45 | 516.08 | -128.62 (-24.92%) |
| var_size_expr_evaluator | q03 | 2113.82 | 2106.07 | 7.75 (0.37%) |
| var_size_expr_evaluator | q04 | 2240.66 | 2241.25 | -0.59 (-0.03%) |
| var_size_expr_evaluator | q05 | 2612.98 | 2612.22 | 0.76 (0.03%) |
| var_size_expr_evaluator | q06 | 1371.22 | 1377.74 | -6.52 (-0.47%) |
| var_size_seq_scan | q19 | 1442.98 | 1441.49 | 1.49 (0.10%) |
| var_size_seq_scan | q20 | 2700.69 | 2708.59 | -7.90 (-0.29%) |
| var_size_seq_scan | q21 | 2282.40 | 2282.96 | -0.56 (-0.02%) |
| var_size_seq_scan | q22 | 128.00 | 128.98 | -0.97 (-0.75%) |

@ray6080 ray6080 left a comment

Thanks!

@royi-luo royi-luo force-pushed the royi/wal-replay-after-kill branch from 8a4daa5 to f5e5ddd Compare March 25, 2025 20:29

Benchmark Result

Master commit hash: 7edebcadb3337483422e3f1ce1404b117c8fd77b
Branch commit hash: ab87c40101557929517788f23f19c89ed3a71dd9

| Query Group | Query Name | Mean Time - Commit (ms) | Mean Time - Master (ms) | Diff |
|---|---|---|---|---|
| aggregation | q24 | 734.54 | 731.79 | 2.75 (0.38%) |
| aggregation | q28 | 6554.12 | 6574.00 | -19.88 (-0.30%) |
| filter | q14 | 132.63 | 134.67 | -2.04 (-1.51%) |
| filter | q15 | 136.08 | 134.10 | 1.98 (1.48%) |
| filter | q16 | 355.49 | 350.67 | 4.82 (1.37%) |
| filter | q17 | 461.25 | 453.56 | 7.68 (1.69%) |
| filter | q18 | 1915.07 | 1906.92 | 8.15 (0.43%) |
| filter | zonemap-node | 96.47 | 97.87 | -1.39 (-1.42%) |
| filter | zonemap-node-lhs-cast | 96.91 | 96.60 | 0.31 (0.32%) |
| filter | zonemap-node-null | 97.53 | 96.62 | 0.91 (0.94%) |
| filter | zonemap-rel | 5766.87 | 5753.47 | 13.40 (0.23%) |
| fixed_size_expr_evaluator | q07 | 692.43 | 697.09 | -4.66 (-0.67%) |
| fixed_size_expr_evaluator | q08 | 982.12 | 983.58 | -1.46 (-0.15%) |
| fixed_size_expr_evaluator | q09 | 978.41 | 984.57 | -6.17 (-0.63%) |
| fixed_size_expr_evaluator | q10 | 269.37 | 271.69 | -2.32 (-0.85%) |
| fixed_size_expr_evaluator | q11 | 269.65 | 272.21 | -2.56 (-0.94%) |
| fixed_size_expr_evaluator | q12 | 249.20 | 248.58 | 0.62 (0.25%) |
| fixed_size_expr_evaluator | q13 | 1577.94 | 1578.07 | -0.13 (-0.01%) |
| fixed_size_seq_scan | q23 | 127.45 | 125.40 | 2.05 (1.63%) |
| join | q29 | 701.00 | 746.41 | -45.41 (-6.08%) |
| join | q30 | 1646.16 | 1558.35 | 87.81 (5.63%) |
| join | q31 | 7.38 | 6.50 | 0.87 (13.39%) |
| join | SelectiveTwoHopJoin | 56.23 | 54.34 | 1.88 (3.46%) |
| ldbc_snb_ic | q35 | 6.90 | 9.34 | -2.44 (-26.16%) |
| ldbc_snb_ic | q36 | 71.61 | 99.66 | -28.05 (-28.14%) |
| ldbc_snb_is | q32 | 5.80 | 3.99 | 1.80 (45.18%) |
| ldbc_snb_is | q33 | 11.21 | 14.53 | -3.31 (-22.82%) |
| ldbc_snb_is | q34 | 1.24 | 1.29 | -0.05 (-3.64%) |
| multi-rel | multi-rel-large-scan | 1695.58 | 1792.90 | -97.32 (-5.43%) |
| multi-rel | multi-rel-lookup | 11.16 | 11.73 | -0.56 (-4.79%) |
| multi-rel | multi-rel-small-scan | 215.58 | 208.53 | 7.06 (3.39%) |
| order_by | q25 | 138.92 | 137.66 | 1.26 (0.91%) |
| order_by | q26 | 458.07 | 454.93 | 3.14 (0.69%) |
| order_by | q27 | 1399.86 | 1403.32 | -3.47 (-0.25%) |
| recursive_join | recursive-join-bidirection | 313.30 | 293.48 | 19.82 (6.75%) |
| recursive_join | recursive-join-dense | 7152.65 | 7165.76 | -13.11 (-0.18%) |
| recursive_join | recursive-join-path | 23275.45 | 23311.32 | -35.87 (-0.15%) |
| recursive_join | recursive-join-sparse | 629.56 | 631.40 | -1.84 (-0.29%) |
| recursive_join | recursive-join-trail | 7068.69 | 7076.69 | -8.01 (-0.11%) |
| scan_after_filter | q01 | 178.66 | 177.42 | 1.24 (0.70%) |
| scan_after_filter | q02 | 164.76 | 162.41 | 2.35 (1.45%) |
| shortest_path_ldbc100 | q37 | 95.34 | 87.54 | 7.80 (8.91%) |
| shortest_path_ldbc100 | q38 | 335.24 | 413.79 | -78.54 (-18.98%) |
| shortest_path_ldbc100 | q39 | 56.77 | 59.12 | -2.35 (-3.97%) |
| shortest_path_ldbc100 | q40 | 404.56 | 516.08 | -111.52 (-21.61%) |
| var_size_expr_evaluator | q03 | 2120.91 | 2106.07 | 14.84 (0.70%) |
| var_size_expr_evaluator | q04 | 2229.95 | 2241.25 | -11.30 (-0.50%) |
| var_size_expr_evaluator | q05 | 2675.01 | 2612.22 | 62.79 (2.40%) |
| var_size_expr_evaluator | q06 | 1385.56 | 1377.74 | 7.82 (0.57%) |
| var_size_seq_scan | q19 | 1451.01 | 1441.49 | 9.52 (0.66%) |
| var_size_seq_scan | q20 | 2739.46 | 2708.59 | 30.87 (1.14%) |
| var_size_seq_scan | q21 | 2276.15 | 2282.96 | -6.81 (-0.30%) |
| var_size_seq_scan | q22 | 131.05 | 128.98 | 2.07 (1.61%) |

@royi-luo royi-luo merged commit a30c532 into master Mar 25, 2025
27 of 28 checks passed
@royi-luo royi-luo deleted the royi/wal-replay-after-kill branch March 25, 2025 22:24
Development

Successfully merging this pull request may close these issues.

Feature: Graceful stopping and reloading if connection to database is stopped midquery
2 participants