DRAFT: Propose an alternative to the types in capacity_types.go #8
base: main
Conversation
Signed-off-by: Kevin Klues <[email protected]>
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: klueska

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.
This is what kubernetes/kubernetes#123699 depends upon: kubelet knows that drivers may access `ResourceSlice` and `ResourceClaim` through the REST proxy and denies access to everything else. When combining kubelet from 1.31 with a future DRA driver which wants to publish a "SomeFutureType" instead of a `ResourceSlice`, the driver's request will get denied. I'm working on a KEP update for this issue and have a prototype implementation ready that I'll share together with it.
There's one additional indirection ("namedResources"), right? I don't find it hard to ignore that - but perhaps I am simply used to it. 🤷
I think where it got awkward was when I was trying to adopt @johnbelamaric's model to fit into this:

These 5 structs turn into one (with multiple instances of the last as top-level objects).
I'm not saying we necessarily want to do this, but couldn't the kubelet be updated to whitelist any new model objects we define (same as the scheduler needing to know to pull them in too)?
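For illustration only (this is not the actual kubelet code): the kind of explicit allowlist being discussed might look roughly like the sketch below, where every new model type has to be added by hand, in lockstep with the scheduler.

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

// allowedResources is a hypothetical allowlist of the group/resources a
// DRA driver may touch through the kubelet's REST proxy. Requests for
// anything else would be denied, which is why a future "SomeFutureType"
// needs an explicit entry here before drivers can publish it.
var allowedResources = map[schema.GroupResource]bool{
	{Group: "resource.k8s.io", Resource: "resourceslices"}: true,
	{Group: "resource.k8s.io", Resource: "resourceclaims"}: true,
	// A future model type would have to be added explicitly, e.g.:
	// {Group: "resource.k8s.io", Resource: "devicepools"}: true,
}

func isAllowed(gr schema.GroupResource) bool {
	return allowedResources[gr]
}

func main() {
	gr := schema.GroupResource{Group: "resource.k8s.io", Resource: "devicepools"}
	fmt.Println(isAllowed(gr)) // false until the allowlist is extended
}
```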
Let's not forget we have versioning in the API. So, we start with v1 without the indirection. One option then is to add a new type; another option is to create `DevicePool` v2 which in turn has the indirection but round-trips to v1. Kubelet can use an …

A couple of questions as you think through that design:
kubelet still needs to check whether a request for some unknown resource should be allowed. Let's discuss in kubernetes/enhancements#4615
This doesn't work without the one-of-many concept already being in v1. A new field can be added to v1 to make the round-trip possible, but a client built against v1 without that field will never see it. We have to define in advance how such a client can detect that it doesn't have some information that it should have.
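One possible shape for that "detect what you can't see" requirement, sketched purely as an illustration (the field names are hypothetical, not part of the proposal): give v1 a discriminator that names the populated one-of member, even members that only exist in newer versions.

```go
package api

// Hypothetical v1 fragment, not the proposed API: a discriminator plus
// one pointer field per model known to this version.
type ResourceModel struct {
	// Model names the member that is populated, including members that
	// only exist in newer API versions. A v1 client that sees a Model
	// value for which every pointer it knows about is nil can conclude
	// it is missing information, rather than silently treating the
	// object as empty.
	Model string `json:"model"`

	NamedResources *NamedResourcesModel `json:"namedResources,omitempty"`
	// Members added in later versions round-trip through v1 as unknown
	// Model values with no corresponding pointer set.
}

// NamedResourcesModel is elided here; see capacity_types.go.
type NamedResourcesModel struct{}
```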
*Claim Model*

- `DeviceType` is gone. Instead, this version takes Tim's suggestion:
  - `DeviceClass` is mandatory for claims.
  - `DeviceClass` can refer either to the criteria that defines a set of devices (an individual class), OR to a label selector.
  - The label selector will be used to identify a set of classes. So, essentially, a class is now either a "leaf" class, or it is a set of classes.
  - If a claim specifies a "non-leaf" class, then the scheduler must flatten that to a list of leaf classes, and can consider any of those classes as containing possible solutions. This is more expressive than `DeviceType` and fits better into the API conventions.
- `DeviceClaimInstance.DeviceClaimDetail` is now a pointer so clients can check whether it is nil, in order to differentiate between that and the `OneOf` form of `DeviceClaimInstance`. An alternative would be to add `AllOf` as a peer to `OneOf`. This is just a bit more awkward, with an additional YAML stanza with a single entry for the expected most common case. But it may match better with API conventions.
- Moved `AdminAccess` from class to claim. This is necessary because if we are to control it by quota, it's not clear that we can look up the `AdminAccess` boolean that is in the *class* when we apply quota to the *claim*. This still seems like the wrong way to do this, but it does avoid a separate top-level type. For `AdminAccess`, make it clear that only the `Constraints` and `Driver` matter; the rest of the claim details do not. Additionally, ALL devices matching the `Driver` and `Constraints` will be associated with the claim and able to be added to a container.
- Changed `requests.count` to `requests.devices` to be more explicit. Defaults to 1.
- Removed `Limits`. Instead, we now have `MaxDevices`, which defaults to the value of `requests.devices`. Thus, to get exactly N devices, you only need to specify `requests.devices = N` in the claim. If you want 1-N devices, you only need to specify `maxDevices = N`. If you want N-M devices, you would specify `requests.devices = N, maxDevices = M` (see the sketch after this list).
- `DeviceClaimStatus` now has an atomic listMap for allocations, owned by the scheduler, and a separate listMap with entries owned by the drivers to capture the results of the actuation of that allocation.

*Capacity Model*

- `DevicePool.Resources` => `DevicePool.SharedResources` (as in kubernetes-sigs#8)
- `DevicePoolStatus` changes from kubernetes-sigs#8
- `Device.Requests` => `Device.ConsumesSharedResources` (similar to kubernetes-sigs#8)
- `Device.Resources` => `Device.ProvidesClaimResources`

*NVIDIA DGX A100 Example*

- Added `memory` to the `Device.ProvidesClaimResources`

*PodSpec Examples*

- Still working on these. There are still some open questions that I will summarize in a comment.
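To make the request semantics above concrete, here is a rough sketch of the claim-side fields as Go types. This is illustrative only; the names follow the description above and are not the file under review:

```go
package api

// DeviceClaimDetail is a sketch of a single request for devices.
type DeviceClaimDetail struct {
	// DeviceClass is mandatory. It refers either to an individual
	// ("leaf") class or, via a label selector, to a set of classes
	// that the scheduler flattens to leaf classes.
	DeviceClass string `json:"deviceClass"`

	// Requests.Devices defaults to 1.
	Requests DeviceRequests `json:"requests,omitempty"`

	// MaxDevices defaults to Requests.Devices:
	//   exactly N devices: requests.devices = N
	//   1..N devices:      maxDevices = N
	//   N..M devices:      requests.devices = N, maxDevices = M
	MaxDevices *int64 `json:"maxDevices,omitempty"`

	// AdminAccess lives on the claim rather than the class so that
	// quota can be applied to the claim directly. For such claims only
	// the driver and constraints (not shown here) are consulted.
	AdminAccess bool `json:"adminAccess,omitempty"`
}

// DeviceRequests holds the requested device count.
type DeviceRequests struct {
	Devices int64 `json:"devices,omitempty"`
}
```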
```go
// ... different types of resources this struct can represent.
type DeviceResource struct {
	Name                 string `json:"name"`
	*DeviceResourceValue `json:",inline"`
}
```
I've noticed that this indirection (and the others like it) still haven't made it into the prototype or been discussed much. I really feel like limiting ourselves to just a quantity is going to bite us in the future.
I agree, this is probably needed. We'll need to convince @thockin that the complexity is worth it. I think that, given that the "capacity" model is primarily for driver developers, a little more complexity is OK. Users will have to be able to understand it for troubleshooting, though.

Can you make sure this is covered in the "open questions" doc?
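For concreteness, the kind of one-of indirection being argued for here might look like the sketch below. This assumes `Quantity` is merely the first of several value kinds; it is not the prototype's actual definition:

```go
package api

import "k8s.io/apimachinery/pkg/api/resource"

// DeviceResourceValue is a one-of: exactly one member is set. It starts
// with only a quantity, but leaves room to add peer members (numeric
// ranges, sets of discrete values, ...) without changing DeviceResource.
type DeviceResourceValue struct {
	// Quantity is a scalar amount, e.g. bytes of memory or a count.
	Quantity *resource.Quantity `json:"quantity,omitempty"`

	// Future value kinds would be added here as peer pointer fields.
}
```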
I'm not sure I understand what this is?
The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed

You can:

- Mark this PR as fresh with `/remove-lifecycle stale`
- Close this PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the PR is closed

You can:

- Mark this PR as fresh with `/remove-lifecycle rotten`
- Close this PR with `/close`
- Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
I spent some time today going through the types in the file `pkg/api/capacity_types.go` in detail. I have proposed my changes to it as an entirely new file (instead of a diff to the original) so that we can more easily comment on it line-by-line without the diff getting in the way.

The biggest thing to note is the following:

I'm coming around to the idea that `DevicePool` (or whatever future resource models we come up with) could be top-level objects, rather than wrapping them in multiple levels of indirection like we have today to support the `NamedResources` model:

Having all of these levels of indirection doesn't actually add much value. The original idea was that the scheduler (or whoever else cares about `ResourceSlices`) only has one object type to subscribe to, and we can simply extend its embedded `ResourceModel` with more types over time.

This actually makes it awkward, however, when we need to split our `Instances` at the lowest level across multiple API server objects. Each object needs to be wrapped in multiple levels of indirection, making it somewhat hard to reason about when trying to reconstruct all objects from a given driver.

On the contrary, by making `DevicePool` (or whatever future resource models we come up with) a top-level API server object, we have a natural way of breaking things apart. Each pool is its own self-contained object, so there is no ambiguity about where the object boundaries should be. After thinking about it for a while, this actually feels like the more natural and "Kubernetes-like" way of abstracting these objects.

The major difference is that now the scheduler can't just subscribe to a single `ResourceSlices` object and have all supported model implementations from all drivers follow from that. It potentially has to subscribe to multiple top-level objects representing different models and react to them accordingly. This seems like a reasonable trade-off, since one would always have to write code to be aware of the different models anyway.

The second biggest thing to note is the following:

I have (slightly) renamed and "future-proofed" all of the types that represent named resources and their requests. Whether we want to do this is still being debated, but I wanted to get a concrete proposal for what it would look like as a diff to @johnbelamaric's prototype, rather than continuing to talk in "what-ifs".

The third biggest thing to note is the following:

We can now have a meaningful `Status` attached to the top-level type. In this case I have expanded it to include all allocated devices and the claims they have been allocated to.
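As a rough illustration of the shape this implies (a sketch only, not the contents of the proposed file), a top-level `DevicePool` with a meaningful status might look something like:

```go
package api

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// DevicePool sketched as a top-level API object: each pool is its own
// self-contained unit, so a driver's inventory splits naturally across
// multiple API server objects without extra levels of indirection.
type DevicePool struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   DevicePoolSpec   `json:"spec"`
	Status DevicePoolStatus `json:"status,omitempty"`
}

// DevicePoolSpec is elided except for the fields needed to identify the
// pool; the devices themselves would live here.
type DevicePoolSpec struct {
	NodeName string `json:"nodeName,omitempty"`
	Driver   string `json:"driver"`
}

// DevicePoolStatus is where a top-level type pays off: it can record
// every allocated device and the claim it was allocated to.
type DevicePoolStatus struct {
	AllocatedDevices []AllocatedDevice `json:"allocatedDevices,omitempty"`
}

// AllocatedDevice ties one device in the pool to the claim holding it.
type AllocatedDevice struct {
	Device   string `json:"device"`
	ClaimUID string `json:"claimUID"`
}
```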