feat: Add PreferencePolicy as an environment variable option #2122

Merged
merged 1 commit into from
Apr 10, 2025

Conversation

jonathan-innis
Member

@jonathan-innis jonathan-innis commented Apr 4, 2025

Fixes #N/A

Description

This change adds an environment variable option called PreferencePolicy with the following definitions:

  • Respect: Karpenter first treats every preference as a requirement. If it cannot simulate scheduling the pod onto a node under those requirements, it progressively removes preferences from the pod until the pod can be scheduled. Respect can also block consolidation more often because of the order in which preferences are relaxed.
  • Ignore: Karpenter ignores all preferences when it performs its scheduling simulations. This is generally a more efficient way to run Karpenter because it spends less effort trying to satisfy every preference on the application pods. In addition, even if Karpenter respects all of the preferences, there is currently no guarantee that the kube-scheduler will respect them too. In particular, for preferredDuringSchedulingIgnoredDuringExecution anti-affinity preferences and ScheduleAnyway topologySpreadConstraints, whether the kube-scheduler sees the pods in time to respect the preferences is timing-dependent and non-deterministic.
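
The difference between the two policies can be sketched with a toy scheduler. This is a minimal illustration, not Karpenter's implementation: the Pod type, the fits helper, the Schedule function, and the choice to drop the last preference first are all assumptions made for the example. It counts simulation passes to show why relaxing preferences costs more work than ignoring them.

```go
package main

import "fmt"

// PreferencePolicy mirrors the two values named in the description.
// Everything else in this file is hypothetical and is not Karpenter's code.
type PreferencePolicy string

const (
	Respect PreferencePolicy = "Respect"
	Ignore  PreferencePolicy = "Ignore"
)

// Pod is a simplified stand-in: each constraint is a label the node must carry.
type Pod struct {
	Required  []string
	Preferred []string
}

// fits reports whether the node carries every label in constraints.
func fits(nodeLabels map[string]bool, constraints []string) bool {
	for _, c := range constraints {
		if !nodeLabels[c] {
			return false
		}
	}
	return true
}

// Schedule returns whether the pod fits and how many simulation passes it took.
func Schedule(policy PreferencePolicy, pod Pod, nodeLabels map[string]bool) (bool, int) {
	if policy == Ignore {
		// Preferences are never looked at: one pass over the hard requirements.
		return fits(nodeLabels, pod.Required), 1
	}
	// Respect: start with every preference treated as a requirement.
	prefs := append([]string(nil), pod.Preferred...)
	attempts := 0
	for {
		attempts++
		constraints := append(append([]string(nil), pod.Required...), prefs...)
		if fits(nodeLabels, constraints) {
			return true, attempts
		}
		if len(prefs) == 0 {
			return false, attempts
		}
		// Relax by dropping the last preference (an arbitrary order for this sketch).
		prefs = prefs[:len(prefs)-1]
	}
}

func main() {
	node := map[string]bool{"zone-a": true}
	pod := Pod{Required: []string{"zone-a"}, Preferred: []string{"ssd", "gpu"}}
	for _, p := range []PreferencePolicy{Respect, Ignore} {
		ok, attempts := Schedule(p, pod, node)
		fmt.Printf("%s: scheduled=%v passes=%d\n", p, ok, attempts)
	}
}
```

With two unsatisfiable preferences, the Respect path needs three passes before the pod fits, while the Ignore path needs one.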

CI Benchmark Testing Results

============== Generic Pods ==============
1 pods      1 nodes     162.759µs per scheduling      162.759µs per pod
50 pods     10 nodes    38.027579ms per scheduling    760.551µs per pod
100 pods    20 nodes    87.226431ms per scheduling    872.264µs per pod
500 pods    100 nodes   406.788638ms per scheduling   813.577µs per pod
1000 pods   200 nodes   589.950666ms per scheduling   589.95µs per pod
1500 pods   300 nodes   1.902119667s per scheduling   1.268079ms per pod
2000 pods   400 nodes   2.537366s per scheduling      1.268683ms per pod
5000 pods   1000 nodes  5.126445375s per scheduling   1.025289ms per pod
10000 pods  2000 nodes  20.038828125s per scheduling  2.003882ms per pod
20000 pods  4000 nodes  50.439955s per scheduling     2.521997ms per pod
============== Preference Pods ==============
4000 pods  4000 nodes  51.578981334s per scheduling  12.894745ms per pod
4000 pods  6 nodes     711.665062ms per scheduling   177.916µs per pod
scheduled 48151 against 12037 nodes in total in 2m15.715480943s 354.793717 pods/sec
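
The derived columns in this output follow from the totals. As a quick cross-check, the sketch below recomputes the per-pod times for the two Preference Pods rows and the overall throughput from the summary line. The function names PerPod and PodsPerSecond are made up for this example, and the integer division truncates to whole nanoseconds.

```go
package main

import (
	"fmt"
	"time"
)

// PerPod divides a total scheduling time evenly across pods,
// truncating to whole nanoseconds.
func PerPod(total time.Duration, pods int) time.Duration {
	return total / time.Duration(pods)
}

// PodsPerSecond computes throughput from a pod count and a total duration.
func PodsPerSecond(pods int, total time.Duration) float64 {
	return float64(pods) / total.Seconds()
}

func main() {
	// Totals copied from the two Preference Pods rows above.
	manyNodes := time.Duration(51578981334) // 51.578981334s, 4000 nodes
	fewNodes := time.Duration(711665062)    // 711.665062ms, 6 nodes
	fmt.Println("per pod (4000 nodes):", PerPod(manyNodes, 4000))
	fmt.Println("per pod (6 nodes):   ", PerPod(fewNodes, 4000))

	// Pod counts from every benchmark row; they should add up to the summary line.
	counts := []int{1, 50, 100, 500, 1000, 1500, 2000, 5000, 10000, 20000, 4000, 4000}
	sum := 0
	for _, c := range counts {
		sum += c
	}
	total := 2*time.Minute + time.Duration(15715480943) // 2m15.715480943s
	fmt.Printf("%d pods, %.6f pods/sec\n", sum, PodsPerSecond(sum, total))
}
```

The two Preference Pods rows schedule the same 4000 pods, so their per-pod times differ by the same factor as their totals, roughly 72x.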

Scale Testing

I validated this change with a test workload whose preference is unsatisfiable and therefore has to be relaxed under PreferencePolicy=Respect. With PreferencePolicy=Ignore, these preferences are not captured in the cluster topology or considered when adding pods to nodes.
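
Switching between the two runs comes down to a single string value. Below is a minimal sketch of validating such a value before use. It is not the code from this PR: the environment variable name PREFERENCE_POLICY, the ParsePreferencePolicy helper, and the case-sensitive matching are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"os"
)

// PreferencePolicy holds one of the two values named in the description.
type PreferencePolicy string

const (
	Respect PreferencePolicy = "Respect"
	Ignore  PreferencePolicy = "Ignore"
)

// ParsePreferencePolicy is a hypothetical helper that accepts exactly the two
// documented values (case-sensitive in this sketch) and rejects anything else.
func ParsePreferencePolicy(raw string) (PreferencePolicy, error) {
	switch p := PreferencePolicy(raw); p {
	case Respect, Ignore:
		return p, nil
	default:
		return "", fmt.Errorf("invalid preference policy %q: want %q or %q", raw, Respect, Ignore)
	}
}

func main() {
	// envName is an assumption; the PR text does not spell out the variable name.
	const envName = "PREFERENCE_POLICY"
	raw, set := os.LookupEnv(envName)
	if !set {
		fmt.Printf("%s is not set\n", envName)
		return
	}
	policy, err := ParsePreferencePolicy(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("preference policy:", policy)
}
```

Rejecting unknown values up front keeps a typo from silently selecting one of the two behaviors.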

Test Workload

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
  namespace: default
spec:
  replicas: 10000
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      affinity: 
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                - test-zone-a
                - test-zone-b
                - test-zone-c
                - test-zone-d
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  app: inflate
          # Can't satisfy this term
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: inflate
                topologyKey: topology.kubernetes.io/zone
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: inflate
      containers:
        - name: inflate
          image: public.ecr.aws/ubuntu/ubuntu:22.04_stable
          command: ["/bin/sh", "-ec", "while :; do echo '.'; sleep 5 ; done"]
          resources:
            requests:
              memory: 100Mi
              cpu: 1
      terminationGracePeriodSeconds: 0

PreferencePolicy=Respect (19m)

Screenshot 2025-03-25 at 12 00 12 AM

PreferencePolicy=Ignore (7m)

Screenshot 2025-03-26 at 12 25 06 AM

How was this change tested?

make presubmit

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 4, 2025
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 4, 2025
@jonathan-innis jonathan-innis force-pushed the preference-policy branch 3 times, most recently from 2987580 to 6e45b76 Compare April 4, 2025 07:18
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Apr 4, 2025
@coveralls

coveralls commented Apr 4, 2025

Pull Request Test Coverage Report for Build 14392176282

Details

  • 51 of 53 (96.23%) changed or added relevant lines in 6 files are covered.
  • 13 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.07%) to 81.871%

Changes Missing Coverage          Covered Lines  Changed/Added Lines  %
pkg/operator/options/options.go   3              5                    60.0%

Files with Coverage Reduction                            New Missed Lines  %
pkg/controllers/disruption/drift.go                      2                 89.66%
pkg/controllers/disruption/consolidation.go              4                 85.55%
pkg/controllers/provisioning/scheduling/preferences.go   7                 88.76%

Totals (Coverage Status)
Change from base Build 14391983686: -0.07%
Covered Lines: 9881
Relevant Lines: 12069

💛 - Coveralls

@jonathan-innis jonathan-innis force-pushed the preference-policy branch 6 times, most recently from 44ffb28 to 38f54bd Compare April 9, 2025 06:33
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 9, 2025
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 9, 2025
@jonathan-innis jonathan-innis marked this pull request as ready for review April 9, 2025 06:36
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 9, 2025
@jonathan-innis jonathan-innis force-pushed the preference-policy branch 2 times, most recently from 19fa576 to a1143d7 Compare April 9, 2025 07:07
@jonathan-innis jonathan-innis changed the title feat: Add PreferencePolicy as an environment variable option feat: Add PreferencePolicy as an environment variable option Apr 9, 2025
@k8s-ci-robot k8s-ci-robot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 9, 2025
@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Apr 9, 2025
@jonathan-innis jonathan-innis force-pushed the preference-policy branch 11 times, most recently from a2e8196 to 255a4f8 Compare April 9, 2025 18:15
Contributor

@rschalo rschalo left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 10, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jonathan-innis, rschalo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit 2e769bd into kubernetes-sigs:main Apr 10, 2025
16 checks passed
@jonathan-innis jonathan-innis deleted the preference-policy branch April 11, 2025 00:00
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
4 participants