Fix double spares for failed vdev #17231


Open

tonyhutter wants to merge 1 commit into master from sparefix

Conversation

@tonyhutter (Contributor) commented Apr 9, 2025

Motivation and Context

Closes: #16547

Description

It's possible for two spares to get attached to a single failed vdev. This happens when a failed disk is spared, the failed disk is then replaced with a new disk, and the new disk also fails during the resilver, so ZED kicks in a second spare for it. This commit checks for that condition and disallows it.

Here's an example of what the double-spare state looks like:

  NAME                                            STATE     READ WRITE CKSUM
  tank2                                           DEGRADED     0     0     0
    draid2:6d:10c:2s-0                            DEGRADED     0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d1                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d2                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d3                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d4                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d5                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d6                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d7                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d8                 ONLINE       0     0     0
      scsi-0QEMU_QEMU_HARDDISK_d9                 ONLINE       0     0     0
      spare-9                                     DEGRADED     0     0     0
        replacing-0                               DEGRADED     0    93     0
          scsi-0QEMU_QEMU_HARDDISK_d10-part1/old  UNAVAIL      0     0     0
          spare-1                                 DEGRADED     0     0     0
            scsi-0QEMU_QEMU_HARDDISK_d10          REMOVED      0     0     0
            draid2-0-0                            ONLINE       0     0     0
        draid2-0-1                                ONLINE       0     0     0
  spares
    draid2-0-0                                    INUSE     currently in use
    draid2-0-1                                    INUSE     currently in use

How Has This Been Tested?

I was able to reproduce the issue in a VM with both traditional spares and dRAID spares, and confirmed this PR doesn't allow the 2nd spare to get attached.
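
For reference, the failure can be reproduced by hand with a small file-backed pool. The following is a hedged sketch: the device paths are placeholders, and the manual zpool replace calls stand in for the spare attach that ZED performs automatically.

  # Hedged reproduction sketch (throwaway VM; file-backed pool).
  truncate -s 1G /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 /var/tmp/s1 /var/tmp/s2
  zpool create tank raidz1 /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 \
      spare /var/tmp/s1 /var/tmp/s2

  # A disk fails; ZED would kick in the first spare (done by hand here).
  zpool offline -f tank /var/tmp/d1
  zpool replace tank /var/tmp/d1 /var/tmp/s1

  # Replace the failed disk with a new blank disk.
  truncate -s 1G /var/tmp/d1new
  zpool replace tank /var/tmp/d1 /var/tmp/d1new

  # The new disk fails during its resilver; ZED then tries to attach the
  # second spare to the same slot.  Without this fix the attach succeeds,
  # producing the double-spare layout shown above; with it, the attach
  # is rejected.
  zpool offline -f tank /var/tmp/d1new
  zpool replace tank /var/tmp/d1new /var/tmp/s2
  zpool status tank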

Types of changes

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Performance enhancement (non-breaking change which improves efficiency)
  • [ ] Code cleanup (non-breaking change which makes code smaller or more readable)
  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • [ ] Documentation (a change to man pages or other documentation)


@amotin (Member) commented Apr 9, 2025:

  [ 1990.862841] ZTS run /usr/share/zfs/zfs-tests/tests/functional/cli_root/zpool_resilver/zpool_resilver_restart
  [ 1993.907076] BUG: kernel NULL pointer dereference, address: 0000000000000060
  [ 1993.909224] #PF: supervisor read access in kernel mode
  [ 1993.910950] #PF: error_code(0x0000) - not-present page
  [ 1993.912642] PGD 0 P4D 0 
  [ 1993.913675] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
  [ 1993.915470] CPU: 0 UID: 0 PID: 153664 Comm: zpool Tainted: P           OE      6.13.9-100.fc40.x86_64 #1
  [ 1993.918325] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
  [ 1993.920732] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
  [ 1993.923557] RIP: 0010:spa_vdev_new_spare_would_cause_double_spares+0xbe/0x1d0 [zfs]
  [ 1993.929329] Code: 10 48 8d 54 24 08 48 c7 44 24 08 00 00 00 00 48 c7 c6 9f c5 4b c1 48 8b 38 e8 fe 85 e8 ff 48 85 db 74 b8 48 8b 83 80 00 00 00 <48> 8b 40 60 48 85 c0 74 a8 48 8b 53 60 48 85 d2 74 9f 48 8d 9a a8
  [ 1993.935519] RSP: 0018:ffffb5e3c2c0bc78 EFLAGS: 00010286
  [ 1993.937508] RAX: 0000000000000000 RBX: ffff888440df0000 RCX: 00000000096a6f85
  [ 1993.939826] RDX: 0000000000000074 RSI: ffffffffc14bc59f RDI: 0000000000000000
  [ 1993.943320] RBP: ffff888463dfeaa0 R08: ffffb5e3c2c0bc80 R09: 0000000000000010
  [ 1993.945647] R10: ffffb5e3c2c0bb08 R11: 0000000000000008 R12: ffff8884407b0000
  [ 1993.947845] R13: 0000000000000000 R14: ffff888463dfeaa0 R15: ffff888440df0000
  [ 1993.950195] FS:  00007fbd89c26940(0000) GS:ffff888577c00000(0000) knlGS:0000000000000000
  [ 1993.952936] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1993.954827] CR2: 0000000000000060 CR3: 000000017837a003 CR4: 0000000000370ef0
  [ 1993.957111] Call Trace:
  [ 1993.958217]  <TASK>
  [ 1993.959019]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1993.960802]  ? show_trace_log_lvl+0x255/0x2f0
  [ 1993.962598]  ? show_trace_log_lvl+0x255/0x2f0
  [ 1993.964351]  ? spa_vdev_attach+0x258/0xb40 [zfs]
  [ 1993.966511]  ? __die_body.cold+0x8/0x12
  [ 1993.968050]  ? page_fault_oops+0x148/0x160
  [ 1993.969626]  ? exc_page_fault+0x7e/0x180
  [ 1993.971129]  ? asm_exc_page_fault+0x26/0x30
  [ 1993.972754]  ? spa_vdev_new_spare_would_cause_double_spares+0xbe/0x1d0 [zfs]
  [ 1993.978891]  ? spa_vdev_new_spare_would_cause_double_spares+0xb2/0x1d0 [zfs]
  [ 1993.982597]  spa_vdev_attach+0x258/0xb40 [zfs]
  [ 1993.984475]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1993.986105]  zfs_ioc_vdev_attach+0xd3/0xf0 [zfs]
  [ 1993.987943]  zfsdev_ioctl_common+0x5a0/0x6b0 [zfs]
  [ 1993.989785]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1993.991443]  zfsdev_ioctl+0x53/0xe0 [zfs]
  [ 1993.993175]  __x64_sys_ioctl+0x97/0xc0
  [ 1993.994685]  do_syscall_64+0x82/0x160
  [ 1993.996079]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1993.997788]  ? __memcg_slab_free_hook+0x11a/0x170
  [ 1993.999313]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1994.000568]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1994.001777]  ? kmem_cache_free+0x3f0/0x450
  [ 1994.002869]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1994.004079]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1994.005293]  ? syscall_exit_to_user_mode+0x10/0x210
  [ 1994.006542]  ? srso_alias_return_thunk+0x5/0xfbef5
  [ 1994.007744]  ? do_syscall_64+0x8e/0x160
  [ 1994.008662]  ? exc_page_fault+0x7e/0x180
  [ 1994.012716]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
  [ 1994.015138] RIP: 0033:0x7fbd8a4e58ad
  [ 1994.016756] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
  [ 1994.022264] RSP: 002b:00007ffdc42dc3f0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
  [ 1994.024071] RAX: ffffffffffffffda RBX: 00005645dcf733e0 RCX: 00007fbd8a4e58ad
  [ 1994.025765] RDX: 00007ffdc42dc900 RSI: 0000000000005a0e RDI: 0000000000000003
  [ 1994.027649] RBP: 00007ffdc42dc440 R08: 0000000000000001 R09: 00005645dcf73b40
  [ 1994.029920] R10: 0000000000000130 R11: 0000000000000246 R12: 00005645dcf68930
  [ 1994.032262] R13: 0000000000000000 R14: 00007ffdc42dc900 R15: 00007ffdc42dc500
  [ 1994.034627]  </TASK>
  [ 1994.035532] Modules linked in: tls zfs(POE) spl(OE) xfs isofs binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd vfat virtio_net fat kvm i2c_i801 i2c_smbus net_failover failover virtio_balloon pktcdvd joydev nfsd auth_rpcgss nfs_acl lockd grace nfs_localio fuse loop sunrpc nfnetlink zram lz4hc_compress lz4_compress crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 virtio_console sha1_ssse3 virtio_blk serio_raw qemu_fw_cfg [last unloaded: scsi_debug]
  [ 1994.049482] CR2: 0000000000000060
  [ 1994.050731] ---[ end trace 0000000000000000 ]---
  [ 1994.052346] RIP: 0010:spa_vdev_new_spare_would_cause_double_spares+0xbe/0x1d0 [zfs]
  [ 1994.055209] Code: 10 48 8d 54 24 08 48 c7 44 24 08 00 00 00 00 48 c7 c6 9f c5 4b c1 48 8b 38 e8 fe 85 e8 ff 48 85 db 74 b8 48 8b 83 80 00 00 00 <48> 8b 40 60 48 85 c0 74 a8 48 8b 53 60 48 85 d2 74 9f 48 8d 9a a8
  [ 1994.061341] RSP: 0018:ffffb5e3c2c0bc78 EFLAGS: 00010286
  [ 1994.063219] RAX: 0000000000000000 RBX: ffff888440df0000 RCX: 00000000096a6f85
  [ 1994.065677] RDX: 0000000000000074 RSI: ffffffffc14bc59f RDI: 0000000000000000
  [ 1994.068124] RBP: ffff888463dfeaa0 R08: ffffb5e3c2c0bc80 R09: 0000000000000010
  [ 1994.070522] R10: ffffb5e3c2c0bb08 R11: 0000000000000008 R12: ffff8884407b0000
  [ 1994.072918] R13: 0000000000000000 R14: ffff888463dfeaa0 R15: ffff888440df0000
  [ 1994.075295] FS:  00007fbd89c26940(0000) GS:ffff888577c00000(0000) knlGS:0000000000000000
  [ 1994.078484] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [ 1994.081086] CR2: 0000000000000060 CR3: 000000017837a003 CR4: 0000000000370ef0
  [ 1994.084585] note: zpool[153664] exited with irqs disabled

@amotin added the "Status: Code Review Needed" label on Apr 11, 2025.
@amotin (Member) left a comment:

At the new call site we already have the nvlist converted into actual vdevs. So while the nvlist check might be functionally equivalent, I think it would be nice to be consistent with the surrounding code and do the checks on the vdevs.

PS: Comments are good, but only while they fit the screen. ;)

@tonyhutter force-pushed the sparefix branch 2 times, most recently from b31d080 to 3ca520e on April 18, 2025 18:44.
@tonyhutter (Contributor, Author) commented:

> So while the nvlist check might be functionally equivalent, I think it would be nice to be consistent with the surrounding code and do the checks on the vdevs.

When I was originally testing this, for whatever reason, newvd->vdev_isspare was not getting set on the spare. That's why I was looking in the nvlist to see whether the new disk is a spare. Now I see it correctly getting set. So maybe moving the spa_vdev_new_spare_would_cause_double_spares() check farther down in the code fixed something? I dunno...

Anyway, my latest push does away with the nvlist check and uses newvd->vdev_isspare instead.

@amotin (Member) left a comment:

Thanks.

The commit message:

It's possible for two spares to get attached to a single failed vdev.
This happens when you have a failed disk that is spared, and then you
replace the failed disk with a new disk, but during the resilver
the new disk fails, and ZED kicks in a spare for the failed new
disk.  This commit checks for that condition and disallows it.

Closes: openzfs#16547

Signed-off-by: Tony Hutter <[email protected]>
@amotin requested review from ixhamza and akashb-22 on April 25, 2025 15:21.
@behlendorf added the "Status: Accepted" label and removed the "Status: Code Review Needed" label on Apr 25, 2025.

A reviewer (Contributor) left a nit on this comment block:

    /*
     * To determine if this configuration would cause a double spare, we
     * look at the vdev_op_type string of the parent vdev, and of the

Suggested change:

    - * look at the vdev_op_type string of the parent vdev, and of the
    + * look at the vdev_ops struct type of the parent vdev, and of the

On this part of the comment describing the failure sequence:

    * 4. New blank disk starts resilvering
    * 5. While resilvering, new blank disk has IO errors and faults
    * 6. 2nd spare is kicked in for new blank disk
    * 7. At this point two spares are kicked in for the original disk1.

a reviewer (Contributor) noted:
We should add this as a test case.
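
A ZTS-style sketch of such a test might look like the following. This is a hedged outline, not the test that eventually landed: log_must, log_mustnot, log_pass, verify_runnable, and $TESTPOOL come from the existing test library, while the file-backed device setup is an illustrative assumption.

  #!/bin/ksh -p
  . $STF_SUITE/include/libtest.shlib

  # Sketch: after a spared disk's replacement also fails mid-resilver,
  # attaching a second spare to the same slot must be rejected.
  verify_runnable "global"

  log_must truncate -s 1G /var/tmp/d1 /var/tmp/d2 /var/tmp/d3 \
      /var/tmp/s1 /var/tmp/s2 /var/tmp/new
  log_must zpool create $TESTPOOL raidz1 /var/tmp/d1 /var/tmp/d2 \
      /var/tmp/d3 spare /var/tmp/s1 /var/tmp/s2

  log_must zpool offline -f $TESTPOOL /var/tmp/d1            # original disk fails
  log_must zpool replace $TESTPOOL /var/tmp/d1 /var/tmp/s1   # 1st spare kicks in
  log_must zpool replace $TESTPOOL /var/tmp/d1 /var/tmp/new  # user replaces disk
  log_must zpool offline -f $TESTPOOL /var/tmp/new           # replacement fails

  # A second spare for the same failed slot must now be refused.
  log_mustnot zpool replace $TESTPOOL /var/tmp/new /var/tmp/s2

  log_must zpool destroy $TESTPOOL
  log_pass "second spare attach correctly rejected"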

Development

Successfully merging this pull request may close these issues.

Two dRAID spares for one vdev (#16547)