Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: NodeOverlay #1305

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

ellistarn
Copy link
Contributor

@ellistarn ellistarn commented Jun 6, 2024

Fixes:

Description

---
# Reduce on-demand prices to 90%
# https://github.com/aws/karpenter-provider-aws/issues/3860
# https://github.com/aws/karpenter-provider-aws/pull/4697
kind: NodeOverlay
metadata:
  name: discount
spec:
  selector:
    matchLabels:
      karpenter.sh/capacity-type: on-demand
  pricePercent: 90
---
# Support for extended resource types (e.g. smarter-devices/fuse)
# https://github.com/kubernetes-sigs/karpenter/issues/751
# https://github.com/kubernetes-sigs/karpenter/issues/729
kind: NodeOverlay
metadata:
  name: discount
spec:
  selector:
    matchLabels:
      karpenter.sh/capacity-type: on-demand
    matchExpressions:
    - key: node.kubernetes.io/instance-type
      operator: In
      values:
      - m5.large
      - m5.2xlarge
      - m5.4xlarge
      - m5.8xlarge
      - m5.12xlarge
  capacity:
    smarter-devices/fuse: 1
---
# Add memory overhead of 10Mi to all instances with 2Gi memory
# https://github.com/aws/karpenter-provider-aws/issues/5161
kind: NodeOverlay
metadata:
  name: discount
spec:
  selector:
    matchLabels:
      karpenter.k8s.aws/instance-memory: 2048
  overhead:
    memory: 10Mi
---

How was this change tested?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ellistarn ellistarn marked this pull request as draft June 6, 2024 22:18
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jun 6, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ellistarn

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 6, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 6, 2024
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 6, 2024
@ellistarn ellistarn changed the title feat: Implemented NodeOverlay with support for custom pricing feat: Implemented NodeOverlay with support for custom pricing, resources, and labels Jun 6, 2024
@ellistarn ellistarn changed the title feat: Implemented NodeOverlay with support for custom pricing, resources, and labels feat: Implemented NodeOverlay with support for custom pricing, capacity, overhead, and labels Jun 6, 2024
@ellistarn ellistarn changed the title feat: Implemented NodeOverlay with support for custom pricing, capacity, overhead, and labels feat: Implemented NodeOverlay with support for custom pricing, capacity, overhead, and requirements Jun 6, 2024
@sftim
Copy link

sftim commented Jun 10, 2024

(nit) In the PR description, the NodeOverlays all have the same .metadata.name

@sftim
Copy link

sftim commented Jun 10, 2024

Is this an overlay for Nodes, or an overlay for NodePools (maybe more than one NodePool, if we support matching via a selector)?

@jwcesign
Copy link
Contributor

In a real scenario with a saving plan, there is a commitment to a consistent amount of usage. Once the committed usage is exhausted, the price will revert to the on-demand rate. Does this PR take that into consideration?

What I expect is:

  1. When there is remaining committed usage, Karpenter should continue to select on-demand instances(instance type) under the saving plan.
  2. When the committed usage is exhausted, Karpenter should choose spot instances.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 11, 2024
@daimaxiaxie
Copy link
Contributor

@ellistarn We redefined the value of topology.kubernetes.io/zone. However, the current zone value is provided by the CloudProvider. Is it possible to support custom topology.kubernetes.io/zone and other WellKnownLabel values?

@sftim
Copy link

sftim commented Jun 12, 2024

@ellistarn We redefined the value of topology.kubernetes.io/zone. However, the current zone value is provided by the CloudProvider. Is it possible to support custom topology.kubernetes.io/zone and other WellKnownLabel values?

It sounds like you don't agree with the zone naming used in an existing integration. You can:

  • use a different label
    • this is the easy and recommended option
  • replace the cloud provider integration code with your own
    (Karpenter, cloud controller manager, etc)

I don't think this PR would need to change to accommodate the ask.

@GnatorX
Copy link

GnatorX commented Jun 12, 2024

Thanks for trying to tackle this!

I wonder if there is a cleaner way to represent the overlays to reduce sprawl.

Thinking more generally, each dimension can be additive, adding a new custom resource, and/or adjustment, price changes or resource modification. (Assuming we aren't supporting deleting properties at this time).

Price is a slight outlier from resources so we can separate out. Would it make sense to have at the highest level price and resources. Then subsection of adjustment and additions. Adjustment would support +/- percent or values and additions would just be a dict of new custom resources and values?

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 12, 2024
@daimaxiaxie
Copy link
Contributor

@sftim I mean nodeoverlay can provide more overlay options, not just resources. For example, Lables.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 18, 2024
Comment on lines +7 to +9
### Price

Each instance offering comes with a price, which is used as a minimization function in Karpenter's scheduling algorithm. However, Karpenter cannot perfectly understand all factors that influence price, such as an external vendor's fee, a pricing discount, or an external motiviation such as carbon-offset.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth describing the potential interaction of client-side use (and customization) of pricing with provider-specific server-side price-sensitive allocation strategies (like those supported by EC2 Fleet).

@rlindsberg
Copy link

Hi all!
Do we have an estimation of when this feature will be publicly available? We have been waiting for it for over a year now..

Our use case is we run Circle CI runner jobs on a EKS cluster. And we have this limit for buildah to work:

resources:
  limits:
    github.com/fuse: 2

But we got the following error:

{
    "level": "ERROR",
    "logger": "controller.provisioner",
    "message": "Could not schedule pod, incompatible with nodepool \"default\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"120Mi\",\"pods\":\"8\"}, no instance type satisfied resources {\"cpu\":\"2180m\",\"github.com/fuse\":\"1\",\"memory\":\"8312Mi\",\"pods\":\"9\"} and requirements karpenter.k8s.aws/instance-category In [m t], karpenter.k8s.aws/instance-generation Exists >2, karpenter.sh/capacity-type In [on-demand], karpenter.sh/nodepool In [default], kubernetes.io/arch In [amd64], kubernetes.io/os In [linux] (no instance type has enough resources); incompatible with nodepool \"arm64\", daemonset overhead={\"cpu\":\"180m\",\"memory\":\"120Mi\",\"pods\":\"8\"}, no instance type satisfied resources {\"cpu\":\"2180m\",\"github.com/fuse\":\"1\",\"memory\":\"8312Mi\",\"pods\":\"9\"} and requirements karpenter.k8s.aws/instance-category In [c m t], karpenter.k8s.aws/instance-generation Exists >2, karpenter.sh/capacity-type In [on-demand], karpenter.sh/nodepool In [arm64], kubernetes.io/arch In [arm64], kubernetes.io/os In [linux] (no instance type has enough resources)"
}

operator: In
values: ["m5.large", "m5.2xlarge", "m5.4xlarge", "m5.8xlarge", "m5.12xlarge"]
capacity:
smarter-devices/fuse: 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, the "extended resource support" in this design would only support statically applied extended resources on selected nodes, correct? This would not cover cases where dynamic extended resources are needed, for example setting an extended resource for available network bandwidth which varies by EC2 instance type.

Additionally, karpenter still needs to have its provisioning and deprovisioning logic updated to account for arbitrary extended resources other than the gpu resource, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would not cover cases where dynamic extended resources are needed, for example setting an extended resource for available network bandwidth which varies by EC2 instance type.

You could achieve through multiple NodeOverlays, that match different selectors.

e.g. m5.large has 2, m5.2xlarge has 4.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think defining a NodeOverlay for each network bandwidth amount is a reasonable solution for my use case. We use a wide variety of instance types and sizes.

Do you think it could be feasible to incorporate some level of templating for values that can be easily retrieved from the EC2 API?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For NetworkBandwidth, I'd love to just support this one first class.

We could always try to do something that attempts to model this like:

resourcesPerCPU: 
  network: 1Gi

But it's hard to capture all potential relationships without devolving into something like https://github.com/google/cel-spec

@coveralls
Copy link

Pull Request Test Coverage Report for Build 9882748974

Details

  • 0 of 60 (0.0%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.4%) to 77.764%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/apis/v1alpha1/zz_generated.deepcopy.go 0 60 0.0%
Totals Coverage Status
Change from base Build 9880691959: -0.4%
Covered Lines: 8750
Relevant Lines: 11252

💛 - Coveralls

@daverin
Copy link

daverin commented Jul 18, 2024

Hi All

I'm a bit confused about the relationship between kubernetes-sigs/karpenter and aws/karpenter-provider-aws.

The lack of support for custom resource requests/limits is blocking me from autocalling with xilinx devices.

I want to understand how to deploy this branch to EKS.

Any help or understanding would be appreciated.

Copy link

github-actions bot commented Aug 2, 2024

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 2, 2024
@sftim
Copy link

sftim commented Aug 2, 2024

I want to understand how to deploy this branch to EKS.

This branch adds a “request for comments“. @ellistarn should the RFC and implementation land as one commit? Seems unusual.

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 3, 2024
Copy link

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 17, 2024
@github-actions github-actions bot closed this Sep 1, 2024
@zaafar
Copy link

zaafar commented Sep 1, 2024

i think this shouldn't be closed.

@njtran njtran reopened this Sep 4, 2024
@njtran
Copy link
Contributor

njtran commented Sep 4, 2024

/lifecycle frozen

@k8s-ci-robot
Copy link
Contributor

@njtran: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@njtran njtran removed lifecycle/closed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 4, 2024
@daimaxiaxie
Copy link
Contributor

What is the progress now? Why does it seems not to be updated.

Copy link

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 28, 2024
@njtran njtran removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.