Fix double spares for failed vdev #17231


Open

tonyhutter wants to merge 1 commit into master from sparefix

Conversation

@tonyhutter (Contributor) commented Apr 9, 2025

Motivation and Context

Closes: #16547

Description

It's possible for two spares to get attached to a single failed vdev. This happens when a failed disk is spared, the failed disk is then replaced with a new disk, and the new disk also fails during the resilver, so ZED kicks in a second spare for it. This commit checks for that condition and disallows it.

Here's an example of what the double-spare state looks like:

  NAME                                            STATE     READ WRITE CKSUM
  tank2                                           DEGRADED     0     0     0
    draid2:6d:10c:2s-0                            DEGRADED     0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d1                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d2                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d3                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d4                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d5                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d6                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d7                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d8                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d9                 ONLINE       0     0     0
      spare-9                                     DEGRADED     0     0     0
        replacing-0                               DEGRADED     0    93     0
          scsi-0QEMU_QEMU_HARDDISK_d10-part1/old  UNAVAIL      0     0     0
          spare-1                                 DEGRADED     0     0     0
            scsi-0QEMU_QEMU_HARDDISK_d10          REMOVED      0     0     0
            draid2-0-0                            ONLINE       0     0     0
        draid2-0-1                                ONLINE       0     0     0
  spares
    draid2-0-0                                    INUSE     currently in use
    draid2-0-1                                    INUSE     currently in use

How Has This Been Tested?

I was able to reproduce the issue in a VM with both traditional spares and dRAID spares, and confirmed this PR doesn't allow the 2nd spare to get attached.
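
For reference, the failure can be reproduced by hand with a small file-backed pool. The following is a hedged sketch: the device paths are placeholders, and the manual zpool replace calls stand in for the spare attach that ZED performs automatically.

  # Hedged reproduction sketch (throwaway VM; file-backed pool).
  truncate -s 1G /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/s1 /var/tmp/s2
  zpool create tank raidz1 /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 \
      spare /var/tmp/s1 /var/tmp/s2

  # A disk fails; ZED would kick in the first spare (done by hand here).
  zpool offline -f tank /var/tmp/d1
  zpool replace tank /var/tmp/d1 /var/tmp/s1

  # Replace the failed disk with a new blank disk.
  truncate -s 1G /var/tmp/d1new
  zpool replace tank /var/tmp/d1 /var/tmp/d1new

  # The new disk fails during its resilver; ZED then tries to attach the
  # second spare to the same slot.  Without this fix the attach succeeds,
  # producing the double-spare layout shown above; with it, the attach
  # is rejected.
  zpool offline -f tank /var/tmp/d1new
  zpool replace tank /var/tmp/d1new /var/tmp/s2
  zpool status tank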

Types of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Performance enhancement (non-breaking change which improves efficiency)
  • [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • [ ] Documentation (a change to man pages or other documentation)


@amotin (Member) commented Apr 9, 2025:

  [ 1990.862841] ZTS run /usr/share/zfs/zfs-tests/tests/functional/cli_root/zpool_resilver/zpool_resilver_restart
  [ 1993.907076] BUG: kernel NULL pointer dereference, address: 0000000000000060
  [ 1993.909224] #PF: supervisor read access in kernel mode
  [ 1993.910950] #PF: error_code(0x0000) - not-present page
  [ 1993.912642] PGD 0 P4D 0 
  [ 1993.913675] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
  [ 1993.915470] CPU: 0 UID: 0 PID: 153664 Comm: zpool Tainted: P           OE      6.13.9-100.fc40.x86_64 #1
  [ 1993.918325] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
  [ 1993.920732] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  [ 1993.923557] RIP: 0010:spa_vdev_new_spare_would_cause_double_spares+0xbe/0x1d0 [zfs]
  [ 1993.929329] Code: 10 48 8d 54 24 08 48 c7 44 24 08 00 00 00 00 48 c7 c6 9f c5 4b c1 48 8b 38 e8 fe 85 e8 ff 48 85 db 74 b8 48 8b 83 80 00 00 00 <48> 8b 40 60 48 85 c0 74 a8 48 8b 53 60 48 85 d2 74 9f 48 8d 9a a8
  [ 1993.935519] RSP: 0018:ffffb5e3c2c0bc78 EFLAGS: 00010286
  [ 1993.937508] RAX: 0000000000000000 RBX: ffff888440df0000 RCX: 00000000096a6f85
  [ 1993.939826] RDX: 0000000000000074 RSI: ffffffffc14bc59f RDI: 0000000000000000
  [ 1993.943320] RBP: ffff888463dfeaa0 R08: ffffb5e3c2c0bc80 R09: 0000000000000010
  [ 1993.945647] R10: ffffb5e3c2c0bb08 R11: 0000000000000008 R12: ffff8884407b0000
  [ 1993.947845] R13: 0000000000000000 R14: ffff888463dfeaa0 R15: ffff888440df0000
  [ 1993.950195] FS:  00007fbd89c26940(0000) GS:ffff888577c00000(0000) knlGS:0000000000000000
  [ 1993.952936] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1993.954827] CR2: 0000000000000060 CR3: 000000017837a003 CR4: 0000000000370ef0
  [ 1993.957111] Call Trace:
  [ 1993.958217]  <TASK>
  [ 1993.959019]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1993.960802]  ? show_trace_log_lvl+0x255/0x2f0
  [ 1993.962598]  ? show_trace_log_lvl+0x255/0x2f0
  [ 1993.964351]  ? spa_vdev_attach+0x258/0xb40 [zfs]
  [ 1993.966511]  ? __die_body.cold+0x8/0x12
  [ 1993.968050]  ? page_fault_oops+0x148/0x160
  [ 1993.969626]  ? exc_page_fault+0x7e/0x180
  [ 1993.971129]  ? asm_exc_page_fault+0x26/0x30
  [ 1993.972754]  ? spa_vdev_new_spare_would_cause_double_spares+0xbe/0x1d0 [zfs]
  [ 1993.978891]  ? spa_vdev_new_spare_would_cause_double_spares+0xb2/0x1d0 [zfs]
  [ 1993.982597]  spa_vdev_attach+0x258/0xb40 [zfs]
  [ 1993.984475]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1993.986105]  zfs_ioc_vdev_attach+0xd3/0xf0 [zfs]
  [ 1993.987943]  zfsdev_ioctl_common+0x5a0/0x6b0 [zfs]
  [ 1993.989785]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1993.991443]  zfsdev_ioctl+0x53/0xe0 [zfs]
  [ 1993.993175]  __x64_sys_ioctl+0x97/0xc0
  [ 1993.994685]  do_syscall_64+0x82/0x160
  [ 1993.996079]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1993.997788]  ? __memcg_slab_free_hook+0x11a/0x170
  [ 1993.999313]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1994.000568]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1994.001777]  ? kmem_cache_free+0x3f0/0x450
  [ 1994.002869]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1994.004079]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1994.005293]  ? syscall_exit_to_user_mode+0x10/0x210
  [ 1994.006542]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1994.007744]  ? do_syscall_64+0x8e/0x160
  [ 1994.008662]  ? exc_page_fault+0x7e/0x180
  [ 1994.012716]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  [ 1994.015138] RIP: 0033:0x7fbd8a4e58ad
  [ 1994.016756] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
  [ 1994.022264] RSP: 002b:00007ffdc42dc3f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [ 1994.024071] RAX: ffffffffffffffda RBX: 00005645dcf733e0 RCX: 00007fbd8a4e58ad
  [ 1994.025765] RDX: 00007ffdc42dc900 RSI: 0000000000005a0e RDI: 0000000000000003
  [ 1994.027649] RBP: 00007ffdc42dc440 R08: 0000000000000001 R09: 00005645dcf73b40
  [ 1994.029920] R10: 0000000000000130 R11: 0000000000000246 R12: 00005645dcf68930
  [ 1994.032262] R13: 0000000000000000 R14: 00007ffdc42dc900 R15: 00007ffdc42dc500
  [ 1994.034627]  </TASK>
  [ 1994.035532] Modules linked in: tls zfs(POE) spl(OE) xfs isofs binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd vfat virtio_net fat kvm i2c_i801 i2c_smbus net_failover failover virtio_balloon pktcdvd joydev nfsd auth_rpcgss nfs_acl lockd grace nfs_localio fuse loop sunrpc nfnetlink zram lz4hc_compress lz4_compress crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 virtio_console sha1_ssse3 virtio_blk serio_raw qemu_fw_cfg [last unloaded: scsi_debug]
  [ 1994.049482] CR2: 0000000000000060
  [ 1994.050731] ---[ end trace 0000000000000000 ]---
  [ 1994.052346] RIP: 0010:spa_vdev_new_spare_would_cause_double_spares+0xbe/0x1d0 [zfs]
  [ 1994.055209] Code: 10 48 8d 54 24 08 48 c7 44 24 08 00 00 00 00 48 c7 c6 9f c5 4b c1 48 8b 38 e8 fe 85 e8 ff 48 85 db 74 b8 48 8b 83 80 00 00 00 <48> 8b 40 60 48 85 c0 74 a8 48 8b 53 60 48 85 d2 74 9f 48 8d 9a a8
  [ 1994.061341] RSP: 0018:ffffb5e3c2c0bc78 EFLAGS: 00010286
  [ 1994.063219] RAX: 0000000000000000 RBX: ffff888440df0000 RCX: 00000000096a6f85
  [ 1994.065677] RDX: 0000000000000074 RSI: ffffffffc14bc59f RDI: 0000000000000000
  [ 1994.068124] RBP: ffff888463dfeaa0 R08: ffffb5e3c2c0bc80 R09: 0000000000000010
  [ 1994.070522] R10: ffffb5e3c2c0bb08 R11: 0000000000000008 R12: ffff8884407b0000
  [ 1994.072918] R13: 0000000000000000 R14: ffff888463dfeaa0 R15: ffff888440df0000
  [ 1994.075295] FS:  00007fbd89c26940(0000) GS:ffff888577c00000(0000) knlGS:0000000000000000
  [ 1994.078484] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1994.081086] CR2: 0000000000000060 CR3: 000000017837a003 CR4: 0000000000370ef0
  [ 1994.084585] note: zpool[153664] exited with irqs disabled

@amotin added the "Status: Code Review Needed" label on Apr 11, 2025.
@amotin (Member) left a comment:

At the new call site we already have the nvlist converted into actual vdevs. So while the nvlist check might be functionally equivalent, I think it would be nice to be consistent with the surrounding code and do the checks on the vdevs.

PS: Comments are good, but only while they fit the screen. ;)

@tonyhutter force-pushed the sparefix branch 2 times, most recently from b31d080 to 3ca520e on April 18, 2025 18:44.
@tonyhutter (Contributor, Author) commented:

> So while the nvlist check might be functionally equivalent, I think it would be nice to be consistent with the surrounding code and do the checks on the vdevs.

When I was originally testing this, for whatever reason, newvd->vdev_isspare was not getting set on the spare. That's why I was looking in the nvlist to see whether the new disk is a spare. Now I see it correctly getting set. So maybe moving the spa_vdev_new_spare_would_cause_double_spares() check farther down in the code fixed something? I dunno...

Anyway, my latest push does away with the nvlist check and uses newvd->vdev_isspare instead.

@amotin (Member) left a comment:

Thanks.

The commit message:

It's possible for two spares to get attached to a single failed vdev.
This happens when you have a failed disk that is spared, and then you
replace the failed disk with a new disk, but during the resilver
the new disk fails, and ZED kicks in a spare for the failed new
disk.  This commit checks for that condition and disallows it.

Closes: openzfs#16547

Signed-off-by: Tony Hutter <[email protected]>
@amotin requested review from ixhamza and akashb-22 on April 25, 2025 15:21.
@behlendorf added the "Status: Accepted" label and removed the "Status: Code Review Needed" label on Apr 25, 2025.

A reviewer (Contributor) left a nit on this comment block:

    /*
     * To determine if this configuration would cause a double spare, we
     * look at the vdev_op_type string of the parent vdev, and of the

Suggested change:

    - * look at the vdev_op_type string of the parent vdev, and of the
    + * look at the vdev_ops struct type of the parent vdev, and of the

On this part of the comment describing the failure sequence:

    * 4. New blank disk starts resilvering
    * 5. While resilvering, new blank disk has IO errors and faults
    * 6. 2nd spare is kicked in for new blank disk
    * 7. At this point two spares are kicked in for the original disk1.

a reviewer (Contributor) noted:
We should add this as a test case.
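
A ZTS-style sketch of such a test might look like the following. This is a hedged outline, not the test that eventually landed: log_must, log_mustnot, log_pass, verify_runnable, and $TESTPOOL come from the existing test library, while the file-backed device setup is an illustrative assumption.

  #!/bin/ksh -p
  . $STF_SUITE/include/libtest.shlib

  # Sketch: after a spared disk's replacement also fails mid-resilver,
  # attaching a second spare to the same slot must be rejected.
  verify_runnable "global"

  log_must truncate -s 1G /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 \
      /var/tmp/s1 /var/tmp/s2 /var/tmp/new
  log_must zpool create $TESTPOOL raidz1 /var/tmp/d1 /var/tmp/d2 \
      /var/tmp/d3 spare /var/tmp/s1 /var/tmp/s2

  log_must zpool offline -f $TESTPOOL /var/tmp/d1            # original disk fails
  log_must zpool replace $TESTPOOL /var/tmp/d1 /var/tmp/s1   # 1st spare kicks in
  log_must zpool replace $TESTPOOL /var/tmp/d1 /var/tmp/new  # user replaces disk
  log_must zpool offline -f $TESTPOOL /var/tmp/new           # replacement fails

  # A second spare for the same failed slot must now be refused.
  log_mustnot zpool replace $TESTPOOL /var/tmp/new /var/tmp/s2

  log_must zpool destroy $TESTPOOL
  log_pass "second spare attach correctly rejected"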

Development

Successfully merging this pull request may close these issues.

Two dRAID spares for one vdev (#16547)