Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-51717][SS] [RocksDB] Fix SST mismatch corruption that can happen for second snapshot created for a new query #50512

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

micheal-o
Copy link
Contributor

What changes were proposed in this pull request?

Fix error: Sst file size mismatch ... MANIFEST-000005 may be corrupted.

This is an edge case in SST file reuse that can only happen for the first ever RocksDB checkpoint if the following conditions happen:

  1. The first ever RocksDB checkpoint (e.g. for version 10) was created with x.sst, but not yet upload by maintenance
  2. The next batch using RocksDB at v10 fails and rolls back store to -1 (invalidates RocksDB)
  3. A new request to load RocksDB at v10 comes in, but v10 checkpoint is still not uploaded hence we have to start replaying changelog starting from checkpoint v0.
  4. We create a new v11 and new checkpoint with new x*.sst. v10 is now uploaded by maintenance. Then during upload of x*.sst for v11, we reuse x.sst DFS file, thinking it is the same as x*.sst.

The problem here is from step 3, the way the file manager loads v0 is different from how it loads other versions. During the load of other versions, when we delete an existing local file we also delete it from file mapping. But for v0, file manager just deletes the local dir and we missed clearing the file mapping in this case. Hence the old x.sst was still showing in the file mapping at step 4. We need to fix this and also add additional size check.

Why are the changes needed?

Can cause checkpoint corruption, hence the query will fail.

Does this PR introduce any user-facing change?

No

How was this patch tested?

New test included

Was this patch authored or co-authored using generative AI tooling?

No

Copy link
Contributor

@anishshri-db anishshri-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm thx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants