Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ObservedObjectCollection API type #217

Merged
merged 5 commits into from
Apr 8, 2024

Conversation

pedjak
Copy link
Contributor

@pedjak pedjak commented Mar 28, 2024

Description of your changes

Objects in the collection are defined by:

  • GVK
  • optional namespace
  • label selector
apiVersion: kubernetes.crossplane.io/v1alpha1
kind: ObservedObjectCollection
metadata:
  name: jobs
  labels:
    foo: bar
spec:
  observeObjects:
    apiVersion: batch/v1
    kind: Job
    namespace: my-ns
    selector:
      matchLabels:
          foo: bar
  providerConfigRef:
     name: kubernetes-provider
  objectTemplate:
    metadata:
      labels:
        foo: bar

The provider reconcile the collection by fetching objects matching the selector and creates counterpart "observe-only" Objects in the local cluster. Additional labels and annotations can be added to them by specifying objectTemplate.metadata on the collection resource.

apiVersion: kubernetes.crossplane.io/v1alpha2
kind: Object
metadata:
  name: 00647af6-a82f-4344-98b3-4e5dcbbbfa8d
  labels:
    kubernetes.crossplane.io/owned-by-collection: jobs
    foo:bar
spec:
  managementPolicies: ["Observe"]
  forProvider:
    manifest:
      apiVersion: batch/v1
      kind: Job
      metadata:
        name: job1
        namespace: my-ns
  providerConfigRef:
    name: kubernetes-provider
apiVersion: kubernetes.crossplane.io/v1alpha2
kind: Object
metadata:
  name: ae2d702b-e468-41fa-b427-f96ee03928aa
  labels:
    kubernetes.crossplane.io/owned-by-collection: jobs
    foo: bar
spec:
  managementPolicies: ["Observe"]
  forProvider:
    manifest:
      apiVersion: batch/v1
      kind: Job
      metadata:
        name: job2
        namespace: my-ns
  providerConfigRef:
    name: kubernetes-provider
  • The created objects are owned by the collection resource and reconciled as usual by the provider.
  • Each created object is accompanied with kubernetes.crossplane.io/owned-by-collection label uniquely identifying the collection object, so that clients can use label to obtain all members of the given collection
  • That label is published on collection as well, under status.membershipLabel
apiVersion: kubernetes.crossplane.io/v1alpha1
kind: ObservedObjectCollection
metadata:
  name: jobs
spec:
  observeObjects:
    apiVersion: batch/v1
    kind: Job
    namespace: my-ns
    selector:
      matchLabels:
          foo: bar
  providerConfigRef:
     name: kubernetes-provider
status:
  membershipLabel:
    kubernetes.crossplane.io/owned-by-collection: jobs
flowchart LR;
collection["ObservedObjectCollection"]
provider{{provider-kubernetes}}
cluster((k8s cluster))
object1["Object

managementPolicies: Observe"]
object2["Object

managementPolicies: Observe"]
object3["Object

managementPolicies: Observe"]

provider-->|reconciles|collection
provider-->|list objects matching selector|cluster
provider-->|creates|object1
provider-->|creates|object2
provider-->|creates|object3
collection-->|refers|object1
collection-->|refers|object2
collection-->|refers|object3
object1-->|owned by|collection
object2-->|owned by|collection
object3-->|owned by|collection
Loading

NOTE: The new controller is implemented as a standard controller-runtime controller, which is the depart from managed.Reconciler pattern used across provider ecosystem. Given that the new API does not create any external resources, we were able to produce simpler code. The approach was discussed and approved by @negz and @turkenh.

Fixes #209

I have:

  • Read and followed Crossplane's [contribution process].
  • Run make reviewable test to ensure this PR is ready for review.

How has this code been tested

  • Added new unit tests
  • deploy a test resource with kubectl -f apply examples/collection/collection.yaml

apis/v1alpha1/types.go Outdated Show resolved Hide resolved
apis/v1alpha1/types.go Outdated Show resolved Hide resolved
apis/v1alpha1/types.go Outdated Show resolved Hide resolved
internal/clients/client.go Show resolved Hide resolved
internal/clients/client.go Outdated Show resolved Hide resolved
internal/controller/collection/reconciler.go Outdated Show resolved Hide resolved
internal/controller/collection/reconciler.go Outdated Show resolved Hide resolved
internal/controller/collection/reconciler.go Outdated Show resolved Hide resolved
internal/controller/collection/reconciler.go Outdated Show resolved Hide resolved
internal/controller/collection/reconciler.go Outdated Show resolved Hide resolved
}

// Remove objects that either do not exist anymore or are no match
oldRefs := sets.New[v1alpha1.ObservedObjectReference](collection.Status.Objects...)
Copy link
Collaborator

@turkenh turkenh Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we have two distinct sources (status.objects and objects owned by collection on etcd), and they could diverge for some reason.

I believe it would be more reliable if we lean on whatever on the API instead of using references in the status (also relying on status has its downsides). I mean, we could add labels to the created observe-only object resources (e.g. kubernetes.crossplane.io/owned-by-collection: jobs) and list them with labels to see what a collection owns and clean them if they no longer needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we introduce such label, then we implicitly limit collection names to 63 chars. Let me know if this is an important trade to be made. Also, ObservedObjectCollection instances are gonna be created mostly from within a composition, where their name is built from composite name and a prefix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it would be more reliable if we lean on whatever on the API instead of using references in the status

Having refs in the status might be helpful for some, similar to how we have refs in the composite object.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, agree having refs in the status makes sense for visibility. But we should also add labels so that functions can avoid having to first get the collection and then requesting objects if they already know the collection, they should be able to request resources by label directly instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if they already know the collection, they should be able to request resources by label directly instead.

Users can do that already by adding an arbitrary number of labels through .spec.template.metadata.labels on ObserverdObjectCollection.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we introduce such label, then we implicitly limit collection names to 63 chars.

Also, ObservedObjectCollection instances are gonna be created mostly from within a composition, where their name is built from composite name and a prefix.

This is a good point. However, the same limitation holds for composites, doesn't it? They put crossplane.io/composite: <name-of-the-composite> to all resources they compose and they could be created from within other compositions as well.

I believe the reason why we maintain references there is that the composed resources could be of any apiVersion/kind and it is not possible to list all composed objects with a label filter. The situation is different here, we know all owned resources are Objects and we can list/filter them with labels.

I am totally fine with keeping them in status for visibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have introduced kubernetes.crossplane.io/owned-by-collection label that contains the name of the collection, if the name length is less than 63 chars. If greater, md5 hash of the name is used instead.

The label and its value is published under status.membershipLabel so that clients can know what label to use to obtain all items belonging to the given collection.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If greater, md5 hash of the name is used instead.

FWIW there's a potential collision here:

  • One collection has a long enough name that its objects refer to it as some-md5-hash
  • Another collection is actually named some-md5-hash

This probably doesn't matter - it would never happen outside of a collection actively trying to cause a collision.

Is it so bad to just limit collection names to 63 characters? I prefer predictable and human-readable labels to labels that are sometimes human readable, and other times an md5 hash. I think this would also allow us to avoid having to write a label to the status of the collection just to know what label to query for (e.g. hash or name).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, the same limitation holds for composites, doesn't it?

The limitation also holds for:

  • All packages - we label package revisions with package names (see here)
  • All compositions - we label composition revisions with composition names (see here)
  • All claims - we label XRs with claim names (see here)

Given that we have this 63 character limitation in so many places already, and that AFAIK it hasn't ever been a problem in practice, I don't think we should do anything different here.

If we did want to address this potential problem in a way that's backward compatible and consistent with the existing places where we label children with the parent's name we could:

  • Always write the parent's name to one label, and truncate it if longer than 63 characters
  • Write a hash1 of the parent's name that will always be 63 characters or less to another label

This way you get a human-readable label that's easily traced back to the parent resource 99% of the time, supported by a machine-readable label that can be used to list child resources 100% of the time.

Footnotes

  1. I think we should use sha256 unless there's a specific reason to prefer md5. It's what we've used to generate unique short names in other places, e.g. package revision and composition revision names.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would agree with @negz and have a similar comment here: #217 (comment)

apis/v1alpha1/types.go Outdated Show resolved Hide resolved
@pedjak
Copy link
Contributor Author

pedjak commented Apr 2, 2024

@turkenh I have addressed the comments, ptal.

internal/controller/observedobjectcollection/reconciler.go Outdated Show resolved Hide resolved
internal/controller/observedobjectcollection/reconciler.go Outdated Show resolved Hide resolved
return collection.GetName(), nil
}
// Otherwise, compute md5 hash of it, to reduce it length.
h := md5.New() //#nosec G401 -- used only for unique name generation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked how Crossplane doing it with crossplane.io/composite label: https://github.com/crossplane/crossplane/blob/d35ecef8b3d9094261dbda64e72498cda1520797/internal/controller/apiextensions/composite/api.go#L401

Looks like we don't have special handling for long names rather try to add it as label, it fails and we return the error to the user so that they can adjust their naming. I remember I observed this while applying some composites (of platform-refs) and used shorter names with my compositions.

I would agree that this is not the best experience, but still would follow upstream implementation here as well. To keep the logic here simple and not have cryptic owner labels when they are long.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine if we do not want to backup solution for longer names. Thus, I have limited the collection names to 63 chars max and added CRD validation rule for that.

FWIW user should not expect any particular semantic behind the value of the label that is used for selecting all collection members. It is just important that it is unique per collection, and discoverable by users (published at the moment under status.memebershipLabel

internal/controller/observedobjectcollection/reconciler.go Outdated Show resolved Hide resolved
Objects in the collection are defined by:
* GVK
* optional namespace
* label selector

The objects are fetched using the specified provider config
and for the matched objects the provider creates counterpart
observe-only objects in the local cluster.

The created objects are owned by the collection resource and
reconciled as usual by the provider. They are also referenced
in collection's `.status.objects` field.

Signed-off-by: Predrag Knezevic <[email protected]>
Signed-off-by: Predrag Knezevic <[email protected]>
* limited collection name to 63 chars max, allowing us to that name as the memebership label,
  without need to handle names longer than 63 chars
* remove `.status.objectRefs` fields
* addressed nits

Signed-off-by: Predrag Knezevic <[email protected]>
The names are computed as follows:

`COLLECTION_NAME+FIRST_56_BITS_SHA256_HASH_OF(OBJECT_GVK,NAMESPACE,NAME)`

Signed-off-by: Predrag Knezevic <[email protected]>
Copy link
Collaborator

@turkenh turkenh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @pedjak 🙌

// MembershipLabel is the label set on each member of this collection
// and can be used for fetching them.
// +optional
MembershipLabel map[string]string `json:"membershipLabel,omitempty"`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need this field in the status? AFAIU, it is always deterministic and we don't have to store this in status.

Copy link
Contributor Author

@pedjak pedjak Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see #217 (comment)

IMHO, instead of relying on a convention, we can make the label discoverable by users.

@pedjak pedjak merged commit 753d884 into crossplane-contrib:main Apr 8, 2024
7 of 8 checks passed
@lnattrass
Copy link

lnattrass commented Apr 17, 2024

This is a weird question, but how is this intended to be used? 😅

Found this discussion..

mad01 pushed a commit to mad01/provider-kubernetes that referenced this pull request May 23, 2024
* Add `ObservedObjectCollection` API type

Objects in the collection are defined by:
* GVK
* optional namespace
* label selector

The objects are fetched using the specified provider config
and for the matched objects the provider creates counterpart
observe-only objects in the local cluster.

The created objects are owned by the collection resource and
reconciled as usual by the provider. They are labeled with a common label, so that they can be fetched easily.
The label is discoverable by reading  `.status.membershipLabel` field of `ObservedObjectCollection`.

Signed-off-by: Predrag Knezevic <[email protected]>
Signed-off-by: Alexander Brandstedt <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support observing child resources
8 participants