Fix double spares for failed vdev #17231
At the new place the function is called we already have the nvlist converted into actual vdevs. So while it might be a functional equivalent, I think it would be nice to be consistent with the code around and do the checks on the vdevs.
PS: Comments are good, but only while they fit the screen. ;)
b31d080 to 3ca520e
When I was originally testing this, for whatever reason, Anyway, my latest push does away with the nvlist check and uses |
Thanks.
It's possible for two spares to get attached to a single failed vdev. This happens when you have a failed disk that is spared, and then you replace the failed disk with a new disk, but during the resilver the new disk fails, and ZED kicks in a spare for the failed new disk. This commit checks for that condition and disallows it.

Closes: openzfs#16547
Signed-off-by: Tony Hutter <[email protected]>
/*
 * To determine if this configuration would cause a double spare, we
 * look at the vdev_op_type string of the parent vdev, and of the
nit:
-  * look at the vdev_op_type string of the parent vdev, and of the
+  * look at the vdev_ops struct type of the parent vdev, and of the
 * 4. New blank disk starts resilvering
 * 5. While resilvering, new blank disk has IO errors and faults
 * 6. 2nd spare is kicked in for new blank disk
 * 7. At this point two spares are kicked in for the original disk1.
We should add this as a test case.
Motivation and Context
Closes: #16547
Description
It's possible for two spares to get attached to a single failed vdev. This happens when you have a failed disk that is spared, and then you replace the failed disk with a new disk, but during the resilver the new disk fails, and ZED kicks in a spare for the failed new disk. This commit checks for that condition and disallows it.
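The condition above can be illustrated with a small, self-contained sketch. This is not the code from this PR: the type `sketch_vdev_t`, the `KIND_*` enum, and the function `would_cause_double_spare()` are hypothetical stand-ins for the real vdev structures. It assumes the check looks at the parent and the parent's parent of the device being spared, and it covers only the traditional spare layout, not draid spares.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified vdev kinds (not the OpenZFS vdev ops tables). */
typedef enum {
	KIND_DISK,
	KIND_MIRROR,
	KIND_REPLACING,
	KIND_SPARE
} vdev_kind_t;

/* Hypothetical, simplified vdev node: just enough to walk up the tree. */
typedef struct sketch_vdev {
	vdev_kind_t kind;
	bool is_spare;			/* device comes from the spare list */
	struct sketch_vdev *parent;	/* NULL at the top of the sketch tree */
} sketch_vdev_t;

/*
 * Return true if attaching newvd under pvd would put a second spare on
 * a disk that already has one. In this sketch that means:
 *   - the new device is a spare,
 *   - the parent (pvd) is a replacing vdev, and
 *   - the parent's parent is already a spare vdev.
 */
static bool
would_cause_double_spare(const sketch_vdev_t *newvd, const sketch_vdev_t *pvd)
{
	const sketch_vdev_t *ppvd = (pvd != NULL) ? pvd->parent : NULL;

	if (newvd == NULL || ppvd == NULL)
		return (false);

	return (newvd->is_spare &&
	    pvd->kind == KIND_REPLACING &&
	    ppvd->kind == KIND_SPARE);
}
```

In this model, the failing scenario is a spare vdev whose child is a replacing vdev (the original disk plus the new blank disk); asking for a second spare under that replacing vdev returns true, while attaching a non-spare disk, or attaching a spare under a plain mirror, returns false.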
Here's an example of what the double spares look like:
How Has This Been Tested?
I was able to reproduce the issue in a VM for both traditional spares and draid spares, and confirmed this PR doesn't allow the 2nd spare to get attached.