
docs: add consolidation policies rfc #1433

Open
wants to merge 2 commits into main from cmcavoy/consolidation-policies-rfc

Conversation

cnmcavoy

Fixes #N/A

Description

RFC for #1429 and #1430

Let me know what topics or areas of interest need to be included or weren't fully covered. First time writing one of these.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jul 16, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: cnmcavoy
Once this PR has been reviewed and has the lgtm label, please assign jonathan-innis for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jul 16, 2024
@coveralls

coveralls commented Jul 16, 2024

Pull Request Test Coverage Report for Build 10460766566

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 6 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.05%) to 77.766%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| pkg/scheduling/requirements.go | 2 | 98.01% |
| pkg/controllers/disruption/consolidation.go | 4 | 87.25% |

Totals (Coverage Status):
Change from base Build 10407510370: -0.05%
Covered Lines: 8912
Relevant Lines: 11460

💛 - Coveralls

@cnmcavoy cnmcavoy force-pushed the cmcavoy/consolidation-policies-rfc branch from 0d420d0 to 92bbcf2 on July 19, 2024 17:46

github-actions bot commented Aug 3, 2024

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 3, 2024
@github-actions github-actions bot closed this Aug 17, 2024
@jmdeal
Member

jmdeal commented Aug 19, 2024

Sorry to have let this stale out, now that v1.0.0 is out the door we should all have more bandwidth for review. I'm planning on taking a look at this soon.

/reopen

@k8s-ci-robot k8s-ci-robot reopened this Aug 19, 2024
@k8s-ci-robot
Contributor

@jmdeal: Reopened this PR.

In response to this:

> Sorry to have let this stale out, now that v1.0.0 is out the door we should all have more bandwidth for review. I'm planning on taking a look at this soon.
>
> /reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@github-actions github-actions bot removed the lifecycle/closed and lifecycle/stale labels Aug 20, 2024

github-actions bot commented Sep 3, 2024

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 3, 2024

Karpenter provides three forms of consolidation: single-node, multi-node, and emptiness consolidation. Single-node consolidation replaces an expensive node with a cheaper node that satisfies the workloads' requirements. Multi-node consolidation replaces multiple nodes with a single, larger node that can satisfy the workloads' requirements. Emptiness consolidation removes nodes with no workloads that would need to be relocated, reducing costs.

Customers want control over the types of consolidation that can occur within a nodepool. Some nodepools may only be used for job-type workloads (crons, jobs, Spark, etc.), where single-node consolidation is dangerous and wastes completed work when it disrupts pods. Other nodepools may be used mostly by long-running daemons, where DaemonSet costs are more significant and multi-node consolidation to bin-pack into larger nodes would be preferred.
Contributor

Ideally, multi and single node are just collapsed into the same type of consolidation, where users don't need to think about the fact that there are two concepts for consolidation.



Karpenter's consolidation considers the price of instances when deciding to consolidate, but it does not consider the cost of disruption. This can harm cluster stability if the price improvement is small or the disrupted workloads are costly to rebuild to their previous running state (e.g., a long-running job that must restart from scratch).
Contributor

One suggestion I'd make is that we estimate the cost of disruption, and there are some (albeit lesser used) knobs that a user can use to tweak this in their favor.
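
For context, one of the lesser-used knobs this comment may be referring to is the `karpenter.sh/do-not-disrupt` pod annotation, which blocks Karpenter's voluntary disruption (including consolidation) of a node while the annotated pod runs. A minimal sketch:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: long-running-job
  annotations:
    # Opts this pod's node out of Karpenter's voluntary disruption,
    # including consolidation, while the pod is running.
    karpenter.sh/do-not-disrupt: "true"
spec:
  containers:
    - name: worker
      image: busybox
```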

### Option 1: Add price thresholds, merge consolidation controllers
The motivation for the additional controls is the high cost of disrupting certain workloads for small or even negative cost savings. So instead of offering controls, the proposal is to add price-improvement thresholds for consolidation to avoid this scenario.

The proposal is to find a price-improvement threshold that accurately represents the "cost" of disrupting the workloads. This threshold could be an arbitrary fixed value (e.g. a 20% price improvement) or heuristically computed by Karpenter based on the cluster shape (e.g. nodepool TTLs might be a heuristic input).
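
As an illustration only, such a threshold might surface on the NodePool API like this; the `priceImprovementThreshold` name and placement are hypothetical, not something the RFC specifies:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    # Hypothetical field: only consolidate when replacement capacity is at
    # least 20% cheaper than the nodes it replaces.
    priceImprovementThreshold: "20%"
```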
Contributor

Out of curiosity, what do you think a reasonable default could be for this?



The separation of single-node and multi-node consolidation is arbitrary. Single-node consolidation finds the "least costly" eligible node to disrupt and uses that as a solution, even if it is a poor one. However, the least costly node to disrupt is not necessarily the one that yields the most cost savings, but single-node consolidation stops after it has any solution.
Contributor

> but the single-node consolidation stops after it has any solution.

This is the main difference from multi-node today. Multi-node doesn't (and can't) go through all the potential options that single-node can, so single-node finds edge cases of cost savings that multi-node doesn't.



In contrast, multi-node consolidation spends as much time as possible to find a solution. So while price thresholds could be implemented in each consolidation controller, it is simpler to consider merging their responsibilities into one shared controller based on multi-node consolidation. This combined controller could search as long as possible for the most effective consolidation outcome, whether emptiness, multi-node, or single-node replacement.
Contributor

Multi-node has a timeout of 1 minute: https://github.com/kubernetes-sigs/karpenter/blob/main/pkg/controllers/disruption/multinodeconsolidation.go#L35, single node has 3 minutes.

Suggested change
In contrast, multi-node consolidation spends as much time as possible to find a solution. So while price thresholds could be implemented in each consolidation controller, it is simpler to consider merging their responsibilities into one shared controller based on multi-node consolidation. This combined controller could search as long as possible for the most effective consolidation outcome, whether emptiness, multi-node, or single-node replacement.
So while price thresholds could be implemented in each consolidation controller, it is simpler to consider merging their responsibilities into one shared controller based on multi-node consolidation. This combined controller could search as long as possible for the most effective consolidation outcome, whether emptiness, multi-node, or single-node replacement.

Contributor

> This combined consolidation controller could search as long as possible for the most effective consolidation outcome

We've seen this break down on larger clusters with complex scheduling constraints, for instance preferred anti-affinity, where Karpenter has to try to schedule pods multiple times: once with the constraints treated as hard constraints, and once as soft constraints. We may be able to achieve a performance benefit here by hardening our story on preferences, though.
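
For context, a minimal sketch of the kind of preferred (soft) constraint referenced here; Karpenter must simulate scheduling such pods twice, once treating the preference as hard and once as soft:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  affinity:
    podAntiAffinity:
      # Preferred (soft) anti-affinity: the scheduler tries to spread
      # "app: web" pods across hostnames but may co-locate them if it must.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: web
            topologyKey: kubernetes.io/hostname
  containers:
    - name: web
      image: nginx
```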

* 👎 High amount of engineering effort to implement.

### Option 2: Refactor and Expand Consolidation Policies
Another approach is to not make significant changes to consolidation, but to expose controls that allow enabling or disabling the various consolidation controllers. Karpenter already presents binary consolidation options: the `NodePool` resource has a `disruption` field that exposes the `consolidationPolicy` field. This field is an enum with two supported values: `WhenEmpty` and `WhenUnderutilized`.
Contributor

slightly updated since

Suggested change
Another approach is to not make significant changes to consolidation, but to expose controls that allow enabling or disabling the various consolidation controllers. Karpenter already presents binary consolidation options: the `NodePool` resource has a `disruption` field that exposes the `consolidationPolicy` field. This field is an enum with two supported values: `WhenEmpty` and `WhenUnderutilized`.
Another approach is to not make significant changes to consolidation, but to expose controls that allow enabling or disabling the various consolidation controllers. Karpenter already presents binary consolidation options: the `NodePool` resource has a `disruption` field that exposes the `consolidationPolicy` field. This field is an enum with two supported values: `WhenEmpty` and `WhenEmptyOrUnderutilized`.
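
For reference, a minimal sketch of where this field lives on the NodePool today, using the updated enum value from the suggestion above:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  disruption:
    # Current supported values: WhenEmpty | WhenEmptyOrUnderutilized
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m
```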

Comment on lines +38 to +40
And change the semantics of the existing consolidation policy enum value:

* `WhenUnderutilized`
Contributor

Changing the semantics of the existing consolidation policy to resolve to something different is a breaking change, and could definitely surprise people who upgrade without changing their values to the right one. If we're going this route, I'd prefer we keep the existing enum value meaning the same thing, and add another one to portray the mode you're proposing.


The semantics of each enum value match its name, making them straightforward to explain to customers and users. `WhenUnderutilizedOrCheaper` becomes the new default and behaves the same as `WhenUnderutilized` did previously. The semantics of `WhenUnderutilized` change to allow only emptiness or multi-node consolidation for a nodepool. `WhenCheaper` allows only emptiness or single-node consolidation for a nodepool.
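
A sketch of how the proposed values might read on a NodePool; all three names are proposals from this RFC, not part of the current API:

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: batch-jobs
spec:
  disruption:
    # Proposed values: WhenUnderutilizedOrCheaper (new default, today's
    # behavior), WhenUnderutilized (emptiness + multi-node only), and
    # WhenCheaper (emptiness + single-node only).
    consolidationPolicy: WhenUnderutilized
```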

#### Pros/Cons
Contributor

Another con: more effort for users to understand the three different modes.

* 👍 Fine-grained controls for the various consolidation controllers.
* 👎 Complex configuration for the `NodePool` CRD
* 👎 Complex implementation for the various consolidation controllers.
* 👎 Limits Karpenter's ability to change / remove existing consolidation controllers, as config will be tightly coupled.
Contributor

pretty large downside for me

* 👍 Simple implementation for the various consolidation controllers.
* 👎 If Karpenter adds consolidation modes with new purposes, more `consolidationPolicy` enum values will be needed to make them toggleable.

### Option 3: Add Consolidation Configuration to NodePools
Contributor

not sure if we can do this in a backwards-compatible way that wouldn't require a v2 of the API.

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 17, 2024

github-actions bot commented Oct 1, 2024

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 1, 2024