Filesystem resize skipped if original PVC is deleted when FilesystemResizePending but PV is retained #88683

Closed
xing-yang opened this issue Feb 29, 2020 · 13 comments · Fixed by kubernetes-csi/external-resizer#140 or #99326 · May be fixed by sunpa93/kubernetes#1 or sunpa93/external-resizer#1
Labels
kind/bug Categorizes issue or PR as related to a bug. sig/storage Categorizes an issue or PR as relevant to SIG Storage. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@xing-yang
Contributor

xing-yang commented Feb 29, 2020

What happened:

  • Create a PVC using a StorageClass with the Retain policy and allowVolumeExpansion enabled.
  • Create a Pod to use the PVC. Check the filesystem size.
  • Start volume expansion by modifying the PVC size. After the PV and the associated volume are expanded to the new size, the PVC size is unchanged and the PVC is in the FilesystemResizePending condition.
  • Delete the Pod.
  • Delete the PVC. The PV and the associated volume remain due to the Retain policy.
  • Remove the "uid" field from the PV's ClaimRef.
  • Create a new PVC to statically bind to the existing PV.
  • Create a Pod to use the new PVC. The PVC's requested size is set to the new size.
  • However, the filesystem size remains unchanged.

What you expected to happen:
Filesystem should be resized.

How to reproduce it (as minimally and precisely as possible):

  1. Create a StorageClass (SC) with the Retain policy and allowVolumeExpansion set to true.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: example-vanilla-block-sc
  namespace: kube-system
  annotations:
    storageclass.kubernetes.io/is-default-class: "false"
provisioner: csi.vsphere.vmware.com
allowVolumeExpansion: true
reclaimPolicy: Retain
parameters:
  2. Create a PVC using the SC.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-vanilla-block-pvc2
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: example-vanilla-block-sc
# kubectl get pvc
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
example-vanilla-block-pvc2   Bound    pvc-a94bcc93-5a93-11ea-bc63-0050568bf4d2   1Gi        RWO            example-vanilla-block-sc   28s
# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                STORAGECLASS               REASON   AGE
pvc-a94bcc93-5a93-11ea-bc63-0050568bf4d2   1Gi        RWO            Retain           Bound    default/example-vanilla-block-pvc2   example-vanilla-block-sc            24s
  3. Create a Pod to use the PVC.
  4. Log in and check the filesystem size.
kubectl exec -it example-vanilla-block-pod2 /bin/sh
/dev/sdb             ext4          999320      2568    927940   0% /mnt/volume1
  5. Delete the Pod.
  6. Increase the PVC size from 1Gi to 3Gi.
  7. The PV is changed to the new expanded size, but the PVC remains at the old size.
     The PVC has the FileSystemResizePending condition.
root@k8-master-430:/etc/kubernetes# kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                STORAGECLASS               REASON   AGE
pvc-a94bcc93-5a93-11ea-bc63-0050568bf4d2   3Gi        RWO            Retain           Bound    default/example-vanilla-block-pvc2   example-vanilla-block-sc            18m
root@k8-master-430:/etc/kubernetes# kubectl get pvc
NAME                         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
example-vanilla-block-pvc2   Bound    pvc-a94bcc93-5a93-11ea-bc63-0050568bf4d2   1Gi        RWO            example-vanilla-block-sc   18m
# kubectl describe pvc example-vanilla-block-pvc2
Name:          example-vanilla-block-pvc2
Namespace:     default
StorageClass:  example-vanilla-block-sc
Status:        Bound
Volume:        pvc-a94bcc93-5a93-11ea-bc63-0050568bf4d2
Labels:        <none>
Annotations:   kubectl.kubernetes.io/last-applied-configuration:
                 {"apiVersion":"v1","kind":"PersistentVolumeClaim","metadata":{"annotations":{},"name":"example-vanilla-block-pvc2","namespace":"default"},...
               pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: csi.vsphere.vmware.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      1Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Conditions:
  Type                      Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----                      ------  -----------------                 ------------------                ------  -------
  FileSystemResizePending   True    Mon, 01 Jan 0001 00:00:00 +0000   Fri, 28 Feb 2020 17:52:58 -0800           Waiting for user to (re-)start a pod to finish file system resize of volume on node.
  8. Delete the PVC. The PV remains.
  9. Save the PV config to a yaml file.
# kubectl get pv pvc-a94bcc93-5a93-11ea-bc63-0050568bf4d2 -o yaml > static-pv2.yaml
  10. Modify the saved PV yaml to remove the "uid" field of the ClaimRef, then re-apply it.
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: csi.vsphere.vmware.com
  creationTimestamp: "2020-02-29T01:34:49Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-attacher/csi-vsphere-vmware-com
  name: pvc-a94bcc93-5a93-11ea-bc63-0050568bf4d2
  resourceVersion: "46099"
  selfLink: /api/v1/persistentvolumes/pvc-a94bcc93-5a93-11ea-bc63-0050568bf4d2
  uid: acda8aa8-5a93-11ea-bc63-0050568bf4d2
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 3Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: example-vanilla-block-pvc2
    namespace: default
    resourceVersion: "43306"
    uid:
  csi:
    driver: csi.vsphere.vmware.com
    fsType: ext4
    volumeAttributes:
      storage.kubernetes.io/csiProvisionerIdentity: 1582932805902-8081-csi.vsphere.vmware.com
      type: vSphere CNS Block Volume
    volumeHandle: 58e18582-965d-437e-accf-d9a4a084d43a
  persistentVolumeReclaimPolicy: Retain
  storageClassName: example-vanilla-block-sc
  volumeMode: Filesystem
status:
  phase: Released
# kubectl apply -f static-pv2.yaml
  11. Create a PVC to statically bind to the PV.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-vanilla-block-pvc2
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  volumeName: pvc-a94bcc93-5a93-11ea-bc63-0050568bf4d2
kubectl apply -f static-pvc2.yaml
  12. Create a Pod to use the PVC.
  13. Check the filesystem. It is unchanged.
/dev/sdb             ext4          999320      2568    927940   0% /mnt/volume1

Anything else we need to know?:
Note: If the volume does not have a pre-existing filesystem before the resize (i.e., skipping steps 3-5 above), the filesystem will be created with the correct new size at step 13.

Environment:

  • Kubernetes version (use kubectl version): v1.14.2 (Note: this is the setup I currently have, but this would also happen in later versions such as 1.17 or later)
  • Cloud provider or hardware configuration: vSphere CSI Driver
  • OS (e.g: cat /etc/os-release):
NAME="Ubuntu"
VERSION="18.04 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
  • Kernel (e.g. uname -a):
    Linux k8-master-430 4.15.0-20-generic #21-Ubuntu SMP Tue Apr 24 06:16:15 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
  • Install tools: kubeadm
  • Network plugin and version (if this is a network-related bug):
  • Others:
@xing-yang xing-yang added the kind/bug Categorizes issue or PR as related to a bug. label Feb 29, 2020
@k8s-ci-robot k8s-ci-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Feb 29, 2020
@xing-yang
Contributor Author

/sig storage

@k8s-ci-robot k8s-ci-robot added sig/storage Categorizes an issue or PR as relevant to SIG Storage. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 29, 2020
@xing-yang
Contributor Author

/assign @gnufied

@gnufied
Member

gnufied commented Apr 15, 2020

So basically the reason this happened is that kubelet relies on the size skew between pv.spec.capacity and pvc.status.capacity to check whether a PVC requires node expansion. If you delete the PVC before node expansion can happen, that information is lost. I assume this should be somewhat of a corner case.
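
To make the skew check concrete, here is a minimal Go sketch of the comparison described above; needsNodeExpansion is a hypothetical helper, not the actual kubelet source.

// A minimal sketch, assuming kubelet triggers node expansion only while
// pvc.status.capacity lags behind pv.spec.capacity. needsNodeExpansion is
// a hypothetical helper name, not the real kubelet function.
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func needsNodeExpansion(pv *v1.PersistentVolume, pvc *v1.PersistentVolumeClaim) bool {
	pvSize := pv.Spec.Capacity[v1.ResourceStorage]
	pvcStatusSize := pvc.Status.Capacity[v1.ResourceStorage]
	// Node expansion is pending only while the PVC's recorded capacity
	// is still smaller than the PV's capacity.
	return pvcStatusSize.Cmp(pvSize) < 0
}

func main() {
	// After deleting and statically re-binding the PVC, its status capacity
	// is seeded from pv.spec.capacity, so both sides read 3Gi ...
	pv := &v1.PersistentVolume{Spec: v1.PersistentVolumeSpec{
		Capacity: v1.ResourceList{v1.ResourceStorage: resource.MustParse("3Gi")},
	}}
	pvc := &v1.PersistentVolumeClaim{Status: v1.PersistentVolumeClaimStatus{
		Capacity: v1.ResourceList{v1.ResourceStorage: resource.MustParse("3Gi")},
	}}
	// ... and the skew that would trigger NodeExpandVolume is gone.
	fmt.Println(needsNodeExpansion(pv, pvc)) // false: filesystem resize is skipped
}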

@gnufied
Member

gnufied commented May 8, 2020

There are multiple ways this could be fixed. One option is to add a finalizer to the PVC and have it removed when expansion is complete.

Another option is to add an annotation to the PV with the old size after ControllerExpandVolume completes. The resize controller will remove the annotation once expansion is complete on the node, when it receives the PVC update. If the user deletes the PVC while expansion is pending on the node, then the pv-controller, when binding this PV to a new PVC, will set pvc.status.capacity to the value inside the annotation rather than pv.spec.capacity. This allows NodeExpandVolume to be called on the node, and kubelet will then update pvc.status.capacity with the correct value. A sketch of this idea follows.
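
A hedged Go sketch of the annotation approach; the annotation key and helper names (preExpansionSizeAnnotation, recordPreExpansionSize, initialPVCStatusCapacity) are hypothetical placeholders, not names from any actual implementation.

// A sketch under the assumptions above; annotation key and helpers are hypothetical.
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// Hypothetical annotation recording the volume size before controller expansion.
const preExpansionSizeAnnotation = "volume.kubernetes.io/pre-expansion-size"

// recordPreExpansionSize would run after ControllerExpandVolume succeeds but
// before node expansion completes; the resize controller would later delete
// the annotation once node expansion is confirmed.
func recordPreExpansionSize(pv *v1.PersistentVolume, oldSize resource.Quantity) {
	if pv.Annotations == nil {
		pv.Annotations = map[string]string{}
	}
	pv.Annotations[preExpansionSizeAnnotation] = oldSize.String()
}

// initialPVCStatusCapacity sketches what pv-controller would seed into
// pvc.status.capacity when binding a new PVC to this PV: the annotated
// pre-expansion size, if present, instead of pv.Spec.Capacity, so kubelet
// still sees the skew and calls NodeExpandVolume.
func initialPVCStatusCapacity(pv *v1.PersistentVolume) resource.Quantity {
	if s, ok := pv.Annotations[preExpansionSizeAnnotation]; ok {
		if q, err := resource.ParseQuantity(s); err == nil {
			return q
		}
	}
	return pv.Spec.Capacity[v1.ResourceStorage]
}

func main() {
	pv := &v1.PersistentVolume{Spec: v1.PersistentVolumeSpec{
		Capacity: v1.ResourceList{v1.ResourceStorage: resource.MustParse("3Gi")},
	}}
	recordPreExpansionSize(pv, resource.MustParse("1Gi"))
	seed := initialPVCStatusCapacity(pv)
	fmt.Println(seed.String()) // "1Gi": the re-bound PVC starts below pv.spec.capacity
}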

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 6, 2020
@kk-src

kk-src commented Sep 3, 2020

/assign

@kk-src

kk-src commented Sep 10, 2020

@gnufied @xing-yang - I'm planning to take the annotation approach Hemant mentioned above. I will put up a draft PR with this approach and test it out. We can discuss further there.

@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Oct 10, 2020
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

@k8s-ci-robot
Contributor

@fejta-bot: Closing this issue.

In response to this:

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@gnufied
Member

gnufied commented Jan 14, 2021

/reopen
/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot reopened this Jan 14, 2021
@k8s-ci-robot
Contributor

@gnufied: Reopened this issue.

In response to this:

/reopen
/remove-lifecycle rotten

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jan 14, 2021
@xing-yang
Contributor Author

/triage accepted

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 14, 2021