
vmMemoryOverheadPercent design is causing over-provisioning #6652

Open
bnetzi opened this issue Aug 5, 2024 · 7 comments
Labels
bug (Something isn't working), needs-triage (Issues that need to be triaged)

Comments


bnetzi commented Aug 5, 2024

Description

I created a new issue instead of re-opening #6611, as I don't have permission to re-open it.

The reason I want to re-open it is documented in the comments there, but I will state it here as well:

I think the fact that karpenter subtracts 7.25% from a machine's memory when estimating its allocatable memory is a bug, or at least a huge design flaw. The fact that karpenter can't know the allocatable memory upfront does not mean it should assume a constant relative overhead for all instance types and all node pools, just as it does not do so for CPU.
At the very least, the factor should depend on instance size: a 7% memory overhead is acceptable for small instances, but for large instances it creates enormous waste, as the sketch below illustrates.
In my opinion, the bare minimum would be to allow this to be configured per node pool.
A node overlay seems like an ok solution too, but it would be harder to configure. The node pool seems like a better place (although I do get why it is harder to implement it there).
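
To make the scale problem concrete, here is a rough sketch (plain Go, not Karpenter's code; the instance sizes are just examples) of how much memory a flat percentage withholds as instances grow:

```go
package main

import "fmt"

func main() {
	// The flat discount discussed above: 7.25% of instance memory.
	const overheadPct = 0.0725

	// Illustrative instance memory sizes in GiB.
	for _, memGiB := range []float64{8, 64, 256, 768} {
		reserved := memGiB * overheadPct
		fmt.Printf("%4.0f GiB instance -> %6.2f GiB withheld from scheduling\n",
			memGiB, reserved)
	}
}
```

The absolute waste grows linearly with instance size, while the real hypervisor overhead does not.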

Also, I still don't understand why karpenter needs to know this upfront. It could launch an instance, check whether the pods fit, and if not, kill it and launch another instance. Another option is to cache the allocatable memory for each instance type from each AMI used.

The bottom line: in our environment this setting created 34% memory over-provisioning. There is no way this should be acceptable as the default behavior, and I don't think most users understand the implications of this setting, because over-provisioning is really hard to measure, especially at scale (one way to measure it is sketched below).
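
A rough way to measure it with client-go: compare the catalog-advertised memory of each instance type with the capacity the node actually reports, which is exactly the gap vmMemoryOverheadPercent tries to predict. The advertised sizes below are hard-coded placeholders (in practice they would come from the EC2 DescribeInstanceTypes API), and this assumes in-cluster credentials:

```go
package main

import (
	"context"
	"fmt"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// Catalog-advertised memory in bytes for a couple of instance types.
// Placeholder values; in practice these would come from DescribeInstanceTypes.
var advertised = map[string]int64{
	"m5.large":    8 << 30,
	"r5.24xlarge": 768 << 30,
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes.Items {
		itype := n.Labels[v1.LabelInstanceTypeStable]
		adv, ok := advertised[itype]
		if !ok {
			continue // instance type not in our placeholder catalog
		}
		capMem := n.Status.Capacity[v1.ResourceMemory]
		gap := float64(adv-capMem.Value()) / float64(adv)
		fmt.Printf("%s (%s): real VM memory overhead %.2f%%\n", n.Name, itype, gap*100)
	}
}
```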

bnetzi added the bug and needs-triage labels Aug 5, 2024

Balraj06 commented Aug 6, 2024

@bnetzi Setting vmMemoryOverheadPercent to zero can solve the issue of overestimating the overhead, right? Did you give it a try?


bnetzi commented Aug 6, 2024

@Balraj06 Hi, I certainly tried it, and it helps for the specific use case I presented.
Having said that, it is still a bug, for the following reasons:

  1. It is a deployment-level variable, and in a mixed environment with a variety of nodePools, pods, and instance sizes, too large a value wastes memory that is never allocated (over-provisioning), while too small a value can leave pods pending forever because karpenter miscalculates the allocatable memory.
    My whole point is that it should be controllable per node size, per nodePool, or per nodeClass (or via the future node overlay feature suggested in the comments of my previous issue); see the sketch after this list.

  2. This behavior, although it can be controlled to some degree, is not well documented, and most people using large instances (ones with more than 100 GiB of memory) probably have over-allocation without even realizing it, because it is a hard thing to measure. Only advanced metrics, or a deep dive into the logs, events, and pod and node specs, reveal it.
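
To illustrate point 1, here is a purely hypothetical sketch of such a knob; nothing like this field exists in Karpenter's NodePool API today:

```go
package v1beta1

// Hypothetical extension of Karpenter's NodePool spec, shown only to
// illustrate the per-pool override requested in point 1 above.
type NodePoolSpec struct {
	// ... existing NodePool fields elided ...

	// VMMemoryOverheadPercent, if set, would override the deployment-wide
	// vmMemoryOverheadPercent for nodes launched from this pool.
	// A nil value would fall back to the global setting.
	VMMemoryOverheadPercent *float64 `json:"vmMemoryOverheadPercent,omitempty"`
}
```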


Balraj06 commented Aug 7, 2024

@bnetzi Thank you for the brief explanation. Can you point me to the node overlay solution?


bnetzi commented Aug 7, 2024


jukie commented Aug 8, 2024

I'd prefer something like VM_MEMORY_OVERHEAD_MB that supports setting a specific value rather than a percentage.
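
Roughly, the two policies diverge like this (a sketch; VM_MEMORY_OVERHEAD_MB is only my proposed name, not an existing setting, and both constants are illustrative):

```go
package main

import "fmt"

func main() {
	const pct = 0.0725      // current percentage-style discount (from above)
	const fixedMiB = 1024.0 // hypothetical fixed discount, e.g. VM_MEMORY_OVERHEAD_MB=1024

	// A fixed value stays flat as instances grow; a percentage scales with them.
	for _, memGiB := range []float64{8, 64, 256, 768} {
		byPct := memGiB * 1024 * pct
		fmt.Printf("%4.0f GiB: percentage withholds %8.0f MiB, fixed withholds %6.0f MiB\n",
			memGiB, byPct, fixedMiB)
	}
}
```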


bnetzi commented Aug 8, 2024

@jukie good point

njtran (Contributor) commented Aug 12, 2024

> Another option is to cache the allocatable memory for each instance type from each AMI used.

The solutions for this problem are covered by #5161. Is it possible to discuss this there?
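
For reference, the caching idea quoted above in sketch form (names and types are illustrative, not Karpenter's internals): remember the capacity observed for each (instance type, AMI) pair once a node registers, and prefer it to the percentage estimate afterward.

```go
package cache

import "sync"

// nodeKey identifies the combination whose real memory capacity we remember.
type nodeKey struct {
	InstanceType string
	AMIID        string
}

// CapacityCache stores the memory capacity (bytes) the kubelet reported for
// each (instance type, AMI) pair, so later launches can skip the estimate.
type CapacityCache struct {
	mu   sync.RWMutex
	seen map[nodeKey]int64
}

// Lookup returns the cached capacity, if any, for the given pair.
func (c *CapacityCache) Lookup(instanceType, amiID string) (int64, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.seen[nodeKey{instanceType, amiID}]
	return v, ok
}

// Record stores the capacity observed when a node of this pair registered.
func (c *CapacityCache) Record(instanceType, amiID string, memBytes int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.seen == nil {
		c.seen = map[nodeKey]int64{}
	}
	c.seen[nodeKey{instanceType, amiID}] = memBytes
}
```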
