Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(authz): Authorino for Service Mesh #784

Merged
merged 37 commits into from
Feb 19, 2024

Conversation

israel-hdez
Copy link
Contributor

@israel-hdez israel-hdez commented Dec 11, 2023

Description

This first iteration is to cover authentication needs for KServe

  • Add templates to install Authorino
  • Add templates to configure Service Mesh to use Authorino to delegate Authorization
  • Add KServe-specific templates add ability to secure KServe Inference Services
  • Add relevant fields to DSCInitialization resource
  • Code for proper cleanup, in case of uninstalling

Most (if not all) of this code comes from pull request #605. Attribution to original authors: @bartoszmajsak, @aslakknutsen, @cam-garrison, et. al.

Related opendatahub-io/kserve#128
Fixes https://issues.redhat.com/browse/RHOAIENG-1856

How Has This Been Tested?

TODO

Merge criteria:

  • The commits are squashed in a cohesive manner and have meaningful messages.
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has manually tested the changes and verified that the changes work

@zdtsw
Copy link
Member

zdtsw commented Dec 12, 2023

are we gonna have 605 for UI+workbench and this for kserve? for the overlapping part, should we have 605 in first then rebase for this one?

@israel-hdez
Copy link
Contributor Author

@zdtsw

are we gonna have 605 for UI+workbench and this for kserve? for the overlapping part, should we have 605 in first then rebase for this one?

The plan was to merge this first, and rebase 605. However, we are reviewing the direction. Hold-on for now.

pkg/deploy/setup.go Outdated Show resolved Hide resolved
@israel-hdez israel-hdez force-pushed the authorino-setup branch 2 times, most recently from 51c11a9 to 506c1c4 Compare January 16, 2024 19:14
controllers/dscinitialization/servicemesh_setup.go Outdated Show resolved Hide resolved
pkg/feature/builder.go Outdated Show resolved Hide resolved
Copy link
Contributor

@bartoszmajsak bartoszmajsak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for skillfully refining #605 and contributing those parts to incubation, @israel-hdez! 😊

I've reflected on the (mostly mine) code from a broader time perspective and shared a few insights. It would be fantastic if you could help tidy up the areas where I've left a bit of a mess behind.

components/kserve/kserve.go Outdated Show resolved Hide resolved
infrastructure/v1/servicemesh_types.go Show resolved Hide resolved
pkg/deploy/setup.go Outdated Show resolved Hide resolved
pkg/feature/builder.go Outdated Show resolved Hide resolved
pkg/feature/feature.go Outdated Show resolved Hide resolved
pkg/feature/servicemesh/cleanup.go Outdated Show resolved Hide resolved
pkg/feature/servicemesh/cleanup.go Outdated Show resolved Hide resolved
pkg/feature/types.go Outdated Show resolved Hide resolved
pkg/feature/types.go Outdated Show resolved Hide resolved
@bartoszmajsak bartoszmajsak requested review from aslakknutsen and removed request for etirelli January 17, 2024 17:21
@bartoszmajsak
Copy link
Contributor

Added @aslakknutsen as a reviewer so he can double check if service mesh resources for kserve are in the right state.

@bartoszmajsak
Copy link
Contributor

@zdtsw this PR depends on opendatahub-io/odh-model-controller#130

@israel-hdez
Copy link
Contributor Author

@bartoszmajsak Ready for another review round.

@@ -0,0 +1,18 @@
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ReToCode @skonto So, this AuthorizationPolicy will allow to protect InferenceServices behind auth (per user desire). The pod(s) of the ISVC will be protected, with the exception of the two paths mentioned below.

Would this be OK for KNative? I saw the queue-proxy container in the pod and was wondering if there would be any issues with it, or something else. On my naive testings, everything worked fine, but I wanted to check with you about it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this only going to run on the main containerPorts traffic (default 8080) or also on the other ones?

For example we also have traffic on:

  • admin port 8022
  • http2 on 8013
  • metrics ports 9090/9091
  • /debug/* (if you enable profiling)

Anything I've forgotten @skonto?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be intercepting everything.

So, I would infer that the default port of 8080 would be used only if the main container does not specify a port, right? From KNative upstream docs, I see that you can set a containerPort in the PodSpec... May that user-defined port be used transparently?? However, I see you mention 8013 for http2. Are apps required to use that for http2??? Just asking for my knowledge.

In any case, if the user can choose the port, that would be the motivation for the catch-all.

AFAIK, we can whitelist ports -- @aslakknutsen may confirm. That said, this is cluster-wide config. So, to minimize blackholes, we want to protect everything but ports/paths that would always work for infrastructure means.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ports I mentioned are on the Queue-Proxy sidecar. Knative will call the ports depending on the traffic type. So the actual user container will run as 8080 (or as defined by the user), but we'll send traffic to Queue-Proxy ports first. But if the rules are not port-related, we need to focus on the list in #784 (comment)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but we'll send traffic to Queue-Proxy ports first.

You mean that traffic will arrive first to queue-proxy's port 8080 for HTTP1, or 8013 for HTTP2; and later is forwarded to the user container to whatever port it is defined in its spec? Asking just for my knowledge.

I just updated the AuthorizationPolicy (as noted in my other comment). Please, check if that would be OK for you :-)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean that traffic will arrive first to queue-proxy's port 8080 for HTTP1, or 8013 for HTTP2; and later is forwarded to the user container to whatever port it is defined in its spec? Asking just for my knowledge.

Exactly.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything I've forgotten @skonto?

That is fine. To add some details.

        // BackendHTTPPort is the backend, i.e. `targetPort` that we setup for HTTP/1 services.
	BackendHTTPPort = 8012

	// BackendHTTP2Port is the backend, i.e. `targetPort` that we setup for HTTP/2 services.
	BackendHTTP2Port = 8013

	// BackendHTTPSPort is the backend. i.e. `targetPort` that we setup for HTTPS services.
	BackendHTTPSPort = 8112

	// QueueAdminPort specifies the port number for
	// health check and lifecycle hooks for queue-proxy.
	QueueAdminPort = 8022

	// AutoscalingQueueMetricsPort specifies the port number for metrics emitted
	// by queue-proxy for autoscaler.
	AutoscalingQueueMetricsPort = 9090

	// UserQueueMetricsPort specifies the port number for metrics emitted
	// by queue-proxy for end user.
	UserQueueMetricsPort = 9091

BackendHTTPSPort is not being used with Istio, it is meant for internal encryption with Kourier.

For profiling if enabled:

	// ProfilingPort specifies the default port where profiling data is available when profiling is enabled
	ProfilingPort = 8008

@israel-hdez
Copy link
Contributor Author

are we gonna have 605 for UI+workbench and this for kserve? for the overlapping part, should we have 605 in first then rebase for this one?

The plan was to merge this first, and rebase 605. However, we are reviewing the direction. Hold-on for now.

BTW @zdtsw, this is alive again.
The commitment is to have this PR ready and merged by 2.7 timeline. It is desired to also deliver it to 2.7, but if timing is too tight, we may delay delivery to 2.8 -- regardless of the delay, the code would still be expected to be merged by 2.7 timeline.

AFAIK, there are still no plans to merge #605 within these timelines -- although @bartoszmajsak would correct me if I'm wrong.

Copy link
Contributor

@bartoszmajsak bartoszmajsak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few smaller things, but other than that /lgtm. Sorry for round-tripping with the naming, but it just occurred to me.

I will think of what kind of tests we should write, but we can work on this together as a separate PR.

components/kserve/kserve.go Outdated Show resolved Hide resolved
@bartoszmajsak
Copy link
Contributor

AFAIK, there are still no plans to merge #605 within these timelines -- although @bartoszmajsak would correct me if I'm wrong.

@israel-hdez we keep it as an organ donor and a rebase torture for now ;)

@israel-hdez
Copy link
Contributor Author

@israel-hdez could you add jira ticket into PR description for tracking ?

@zdtsw I just added it.
I hope we can catch the upcoming ODH community release 😉

controllers/dscinitialization/servicemesh_setup.go Outdated Show resolved Hide resolved
controllers/dscinitialization/servicemesh_setup.go Outdated Show resolved Hide resolved
components/kserve/servicemesh_setup.go Outdated Show resolved Hide resolved
return k.removeServiceMeshConfigurations(dscispec)
}

func (k *Kserve) removeServiceMeshConfigurations(dscispec *dsciv1.DSCInitializationSpec) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i did a test: to set servicemesh to Removed, but the FT "configure-kserve-for-external-authz" did not get deleted. is this correct?
till i set kserve to Removed, FT are gone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is correct nor incorrect, but the opposite to that 😄

i did a test: to set servicemesh to Removed

If there is no mesh, my expectation would be that the FT and its associated resources must be removed. I think we've been OK with dealing with the DSCI and the DSC separately. However, this is introducing some cohesion and, now, changes to the DSCI should trigger a reconcile to the DSC.

This is declared in the setup of the manager. I haven't reviewed that code.

You may do the following test: set ServiceMesh to Removed and, instead of setting KServe to Removed, simply add a dummy annotation to the DSC (to force its reconciliation). Most likely, you'll see that the FT will vanish.


Analyzing deeper, I think we shouldn't let the user to freely flip settings. E.g. setting the Mesh to Removed while setting Serverless to Managed should not be a valid state. With Vedant's PR #864 coming, the number of invalid states (and needed guards) increases.

I think we should be more proactive on helping the user and start thinking on adding a validating webhook, so that the user gets feedback on invalid configs when doing kubectl apply, rather than waiting for a reconciliation to happen and looking at the events/status of the DSCI/DSC. If you think this is worth to do, I can volunteer on doing the coding, since most (or all?) the things that require validation are on serving side.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Vedant's PR #864 coming, the number of invalid states (and needed guards) increases.

We really need to bring validating webhook to handle such cases there instead. We could have a set of rules there and not even allow to save invalid DSC/DSCI on the cluster.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Vedant's PR #864 coming, the number of invalid states (and needed guards) increases.

We really need to bring validating webhook to handle such cases there instead. We could have a set of rules there and not even allow to save invalid DSC/DSCI on the cluster.

webhook with one simple check has been in review for a while and you even looked in it (#711)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

webhook with one simple check has been in review for a while and you even looked in it (#711)

Ah, this is good.
So, once it is merged, I just need to add serving-specific validations.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

webhook with one simple check has been in review for a while and you even looked in it (#711)

Ah, this is good. So, once it is merged, I just need to add serving-specific validations.

Join the review ;)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#711 is now merged

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, too late to join the review 😅

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Vedant's PR #864 coming, the number of invalid states (and needed guards) increases.

We really need to bring validating webhook to handle such cases there instead. We could have a set of rules there and not even allow to save invalid DSC/DSCI on the cluster.

webhook with one simple check has been in review for a while and you even looked in it (#711)

I know I did :) it's my subtle attempt to make it shipped to incubation faster. It seems it worked.

Copy link
Contributor

@bartoszmajsak bartoszmajsak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few small nits :) Thanks for your patience @israel-hdez.

Comment on lines +18 to +20
if dscispec.ServiceMesh.ManagementState == operatorv1.Unmanaged && k.GetManagementState() == operatorv1.Managed {
return nil
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this be an unsupported state worth reporting somehow?

Copy link
Contributor Author

@israel-hdez israel-hdez Feb 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale was for the case when the user started with a Managed Mesh. Then, the user wanted to flip from Managed to Unmanaged so that odh-operator steps down and the user takes over on the setup.

Although, thinking more, probably this should also be applicable for the case when the DSCI is created with an unmanaged mesh from the beginning (i.e. they bring their own mesh). If this is the case, I think the user would be responsible for doing the setup of Mesh+Authorino+the filters involved here.

name: activator-host-header
namespace: {{ .ControlPlane.Namespace }}
labels:
app.opendatahub.io/kserve: "true"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this required? IIRC this is to mark shared resources managed by the operator, but in this case I do not think there is more than one owner for this Fitler. cc @VaishnaviHire

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. I simply looked at KServe installation, and saw it has this label in the resources.

It seems to be injected at deployment here:

if err := plugins.ApplyAddLabelsPlugin(componentName, resMap); err != nil {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this template is only used for servicemesh/kserve, if it is not gonna expanded for multiple component, we could skip this label here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that label is added using ApplyAddLabelsPlugin to the manifests and as bartosz pointed out is useful when a resource is a shared resource.

@@ -21,6 +22,15 @@ const (
duration = 5 * time.Minute
)

func EnsureAuthNamespaceExists(f *feature.Feature) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@israel-hdez there is .TargetNamespace coming to the builder in #867 which will work for you, so depending which PR gets merged first, we should remember to clean it up

/cc @cam-garrison

type AuthSpec struct {
// Namespace where it is deployed. If not provided, the default is to
// use '-auth-provider' suffix on the ApplicationsNamespace of the DSCI (e.g. opendatahub-auth-provider).
// The '-applications' suffix is removed from ApplicationsNamespace.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm leaning towards the view that the additional lines of code introduced by ResolveAuthNamespace function might not justify the benefits, especially so if it comes down to personal preference. So if there's no requirement nor any actual problem we are solving with this maybe it's better to just add a suffix? Less code, less reasons to worry :)

"github.com/opendatahub-io/opendatahub-operator/v2/pkg/gvr"
)

var log = ctrlLog.Log.WithName("features")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have log in the feature itself now.

Suggested change
var log = ctrlLog.Log.WithName("features")

return err
}
if !found {
log.Info("no extension providers found", "feature", f.Name, "control-plane", mesh.Name, "namespace", mesh.Namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.Info("no extension providers found", "feature", f.Name, "control-plane", mesh.Name, "namespace", mesh.Namespace)
f.Log.Info("no extension providers found", "control-plane", mesh.Name, "namespace", mesh.Namespace)

for i, v := range extensionProviders {
extensionProvider, ok := v.(map[string]interface{})
if !ok {
log.Info("WARN: Unexpected type for extensionProvider will not be removed")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.Info("WARN: Unexpected type for extensionProvider will not be removed")
f.Log.Info("WARN: Unexpected type for extensionProvider will not be removed")

Comment on lines 26 to 33
if err != nil {
if k8serrors.IsNotFound(err) {
// Since the configuration of the extension provider is a patch, it could happen that
// the SMCP is already gone, and there will be nothing to unpatch.
return nil
}
return err
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! So I will pay back for fixing my mistake with a little simplification :)

Suggested change
if err != nil {
if k8serrors.IsNotFound(err) {
// Since the configuration of the extension provider is a patch, it could happen that
// the SMCP is already gone, and there will be nothing to unpatch.
return nil
}
return err
}
if err != nil {
// Since the configuration of the extension provider is a patch, it could happen that
// the SMCP is already gone, and there will be nothing to unpatch.
return client.IgnoreNotFound(err)
}

Comment on lines 61 to 63
_, err = f.DynamicClient.Resource(gvr.SMCP).
Namespace(mesh.Namespace).
Update(context.TODO(), smcp, metav1.UpdateOptions{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For transparency, this will have to change #868, but I've got your back here.

@bartoszmajsak
Copy link
Contributor

@zdtsw @VaishnaviHire there's one thing which is holding this PR from merging downstream. We haven't figured out yet how we want to install dependent Authorino Operator. This needs to be resolved in tandem. If we are fine with putting it in incubation and document how to use Community Operator somewhere in README here, then it's good to go.

@VaishnaviHire
Copy link
Member

@zdtsw @VaishnaviHire there's one thing which is holding this PR from merging downstream. We haven't figured out yet how we want to install dependent Authorino Operator. This needs to be resolved in tandem. If we are fine with putting it in incubation and document how to use Community Operator somewhere in README here, then it's good to go.

Yes that works. Maybe we can add a README on installation of dependent operators.

* Use feature logger
* Don't trim -applications suffix on ResolveAuthNamespace

Signed-off-by: Edgar Hernández <[email protected]>
@israel-hdez
Copy link
Contributor Author

@zdtsw @bartoszmajsak I fixed your feedback.

I'm leaning towards the view that the additional lines of code introduced by ResolveAuthNamespace function might not justify the benefits, especially so if it comes down to personal preference. So if there's no requirement nor any actual problem we are solving with this maybe it's better to just add a suffix? Less code, less reasons to worry :)

@bartoszmajsak OK, I removed the trim to the suffix.

@israel-hdez
Copy link
Contributor Author

/retest

Copy link

openshift-ci bot commented Feb 19, 2024

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: zdtsw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot removed the lgtm label Feb 19, 2024
Copy link

openshift-ci bot commented Feb 19, 2024

New changes are detected. LGTM label has been removed.

@VaishnaviHire VaishnaviHire merged commit e32a7c2 into opendatahub-io:incubation Feb 19, 2024
6 of 7 checks passed
@israel-hdez israel-hdez deleted the authorino-setup branch February 19, 2024 23:24
VaishnaviHire pushed a commit to VaishnaviHire/opendatahub-operator that referenced this pull request Mar 11, 2024
* feat(authz): Authorino for Service Mesh

This first iteration is to cover authentication needs for KServe

* Add templates to install Authorino
* Add templates to configure Service Mesh to use Authorino to delegate Authorization
* Add KServe-specific templates add ability to secure KServe Inference Services
* Add relevant fields to DSCInitialization resource
* Code for proper cleanup, in case of uninstalling

Most (if not all) of this code comes from pull request opendatahub-io#605. Attribution to original authors: @bartoszmajsak, @aslakknutsen, @cam-garrison, et. al.

Related opendatahub-io/kserve#128

Signed-off-by: Edgar Hernández <[email protected]>

* Fix linter issues

Signed-off-by: Edgar Hernández <[email protected]>

* Resolve feedback: Bartosz

Signed-off-by: Edgar Hernández <[email protected]>

* fix: Remove port from the authorization policy

Also, add `/metrics` to the ignored paths for auth.

Signed-off-by: Edgar Hernández <[email protected]>

* Fix feedback: Bartosz

Signed-off-by: Edgar Hernández <[email protected]>

* More feedback: Bartosz

Co-authored-by: Bartosz Majsak <[email protected]>

* Fix feedback: Reto - Adjust AuthorizationPolicy

Signed-off-by: Edgar Hernández <[email protected]>

* Fix more feedback: Bartosz

- Remove Authorino namespace field from DSCI.
- Move around some code in kserve.go to servicemesh_setup.go

Signed-off-by: Edgar Hernández <[email protected]>

* chore: adds sec. prefix to authorino label selector

* fix: adds base dir to manifest sources

* chore: uses security instead of sec as a prefix in authorino label

* fix: /healthz is called by _something_, skipp

* fix: adopt ODH-ADR-0006 for clean up label

* fix: uses correct CRD name for authconfigs

Co-authored-by: Cameron Garrison <[email protected]>

* Remove left-over file

Signed-off-by: Edgar Hernández <[email protected]>

* Feedback: remove auth-refs ConfigMap

Signed-off-by: Edgar Hernández <[email protected]>

* Add missing role.yaml changes

Signed-off-by: Edgar Hernández <[email protected]>

* Go back to installing Authorino on its own namespace

Signed-off-by: Edgar Hernández <[email protected]>

* Feedback: Add clean-up for KServe/OSSM-auth

Signed-off-by: Edgar Hernández <[email protected]>

* Feedback: Simplify namings

Signed-off-by: Edgar Hernández <[email protected]>

* fix: add auth-refs cm

* Feedback: adjust labels and a log message

Signed-off-by: Edgar Hernández <[email protected]>

* Bugfix: Extension provider terminating with error when SMCP is gone

Signed-off-by: Edgar Hernández <[email protected]>

* Fix: add missing RBAC for ConfigMaps func

Signed-off-by: Edgar Hernández <[email protected]>

* Fix: Run `make bundle` and commit resulting changes

Signed-off-by: Edgar Hernández <[email protected]>

* Feedback: Wen - Better feature namings

Signed-off-by: Edgar Hernández <[email protected]>

* Feedback: Bartosz

* Use feature logger
* Don't trim -applications suffix on ResolveAuthNamespace

Signed-off-by: Edgar Hernández <[email protected]>

* Feedback: Wen - revert image placeholder was replaced

Signed-off-by: Edgar Hernández <[email protected]>

---------

Signed-off-by: Edgar Hernández <[email protected]>
Co-authored-by: Bartosz Majsak <[email protected]>
Co-authored-by: Aslak Knutsen <[email protected]>
Co-authored-by: Cameron Garrison <[email protected]>
(cherry picked from commit e32a7c2)
zdtsw added a commit to red-hat-data-services/rhods-operator that referenced this pull request Mar 12, 2024
* Update bundle

* feat(authz): Authorino for Service Mesh (opendatahub-io#784)

* feat(authz): Authorino for Service Mesh

This first iteration is to cover authentication needs for KServe

* Add templates to install Authorino
* Add templates to configure Service Mesh to use Authorino to delegate Authorization
* Add KServe-specific templates add ability to secure KServe Inference Services
* Add relevant fields to DSCInitialization resource
* Code for proper cleanup, in case of uninstalling

Most (if not all) of this code comes from pull request opendatahub-io#605. Attribution to original authors: @bartoszmajsak, @aslakknutsen, @cam-garrison, et. al.

Related opendatahub-io/kserve#128

Signed-off-by: Edgar Hernández <[email protected]>

* Fix linter issues

Signed-off-by: Edgar Hernández <[email protected]>

* Resolve feedback: Bartosz

Signed-off-by: Edgar Hernández <[email protected]>

* fix: Remove port from the authorization policy

Also, add `/metrics` to the ignored paths for auth.

Signed-off-by: Edgar Hernández <[email protected]>

* Fix feedback: Bartosz

Signed-off-by: Edgar Hernández <[email protected]>

* More feedback: Bartosz

Co-authored-by: Bartosz Majsak <[email protected]>

* Fix feedback: Reto - Adjust AuthorizationPolicy

Signed-off-by: Edgar Hernández <[email protected]>

* Fix more feedback: Bartosz

- Remove Authorino namespace field from DSCI.
- Move around some code in kserve.go to servicemesh_setup.go

Signed-off-by: Edgar Hernández <[email protected]>

* chore: adds sec. prefix to authorino label selector

* fix: adds base dir to manifest sources

* chore: uses security instead of sec as a prefix in authorino label

* fix: /healthz is called by _something_, skipp

* fix: adopt ODH-ADR-0006 for clean up label

* fix: uses correct CRD name for authconfigs

Co-authored-by: Cameron Garrison <[email protected]>

* Remove left-over file

Signed-off-by: Edgar Hernández <[email protected]>

* Feedback: remove auth-refs ConfigMap

Signed-off-by: Edgar Hernández <[email protected]>

* Add missing role.yaml changes

Signed-off-by: Edgar Hernández <[email protected]>

* Go back to installing Authorino on its own namespace

Signed-off-by: Edgar Hernández <[email protected]>

* Feedback: Add clean-up for KServe/OSSM-auth

Signed-off-by: Edgar Hernández <[email protected]>

* Feedback: Simplify namings

Signed-off-by: Edgar Hernández <[email protected]>

* fix: add auth-refs cm

* Feedback: adjust labels and a log message

Signed-off-by: Edgar Hernández <[email protected]>

* Bugfix: Extension provider terminating with error when SMCP is gone

Signed-off-by: Edgar Hernández <[email protected]>

* Fix: add missing RBAC for ConfigMaps func

Signed-off-by: Edgar Hernández <[email protected]>

* Fix: Run `make bundle` and commit resulting changes

Signed-off-by: Edgar Hernández <[email protected]>

* Feedback: Wen - Better feature namings

Signed-off-by: Edgar Hernández <[email protected]>

* Feedback: Bartosz

* Use feature logger
* Don't trim -applications suffix on ResolveAuthNamespace

Signed-off-by: Edgar Hernández <[email protected]>

* Feedback: Wen - revert image placeholder was replaced

Signed-off-by: Edgar Hernández <[email protected]>

---------

Signed-off-by: Edgar Hernández <[email protected]>
Co-authored-by: Bartosz Majsak <[email protected]>
Co-authored-by: Aslak Knutsen <[email protected]>
Co-authored-by: Cameron Garrison <[email protected]>
(cherry picked from commit e32a7c2)

* fix(authz): Fix broken external auth configuration

There are two misconfigurations being fixed:
* In the SMCP, the service hostname of Authorino was coded with `-authorization` suffix, but the right suffix is `-authorino-authorization`.
* In the `kserve-predictor` AuthorizationPolicy, the hardcoded `opendatahub-odh-auth-provider` provider name was used, but it should have been the template `{{ .AppNamespace }}-auth-provider`.

In `pkg/feature/feature.go` the patch manifests (i.e. the ones containing `.patch` in the filename) are always applied. Thus, the first bullet is solved by fixing the patch file that adds the `extensionProvider` to the SMCP.

For the second bullet, the faulty AuthorizationPolicy is created with a regular manifest template which is only applied if the resource does not exist. Thus, a patch manifest is added to properly fix the faulty policy (including operator upgrades).

Signed-off-by: Edgar Hernández <[email protected]>
(cherry picked from commit e4252a0)

* fix: Rework operator precondition checks (opendatahub-io#899)

* init commit

* tmp: switch to subsciption

* tmp

* fix up testing

* linter on import

* minor self nits

* add bracket, make

* use found,err for checking subscription

Co-authored-by: Bartosz Majsak <[email protected]>

* fix import + test error expected outputs

* directly return errs rather than log and ret

Co-authored-by: Bartosz Majsak <[email protected]>

* remove unused log var from condiitons

* move const fixtures to separate package

* move creating op subscription to function

* rename noop features in testing

* remove redundant comments

Co-authored-by: Bartosz Majsak <[email protected]>

* move CreateSubscription to fixtures

---------

Co-authored-by: Bartosz Majsak <[email protected]>
(cherry picked from commit f44528e)

* chore: follow up review comments from previous PR (opendatahub-io#858)

* update: follow up comments

- cleanup commented out code
- rename function
- cleanup unnecessary sleep

Signed-off-by: Wen Zhou <[email protected]>

* update: add check on return err + remove apierrs.IsNotFound check

Signed-off-by: Wen Zhou <[email protected]>

* Update pkg/deploy/deploy.go

Co-authored-by: Bartosz Majsak <[email protected]>

* update(review): create new function DeleteSubscription

Signed-off-by: Wen Zhou <[email protected]>

* update: return for get and delete subscription

- get: return 'sub, nil' or 'nil, err' here error can be real one or
notfound

Signed-off-by: Wen Zhou <[email protected]>

* Update pkg/deploy/deploy.go

Co-authored-by: Bartosz Majsak <[email protected]>

* fix(linter)

Signed-off-by: Wen Zhou <[email protected]>

---------

Signed-off-by: Wen Zhou <[email protected]>
Co-authored-by: Bartosz Majsak <[email protected]>
(cherry picked from commit a81a3da)

* fix(authz): ensures extauthz provider is removed from control plane during cleanup (opendatahub-io#905)

### Renames migration folder
The reason for this is to have a simple naming convention instead of suggesting storing migration patches in dedicated folders named after tickets.

Additionally, the feature explicitly orders files instead of assuming that the underlying fsys implementation fulfills such a contract.

### Ports opendatahub-io#605 test for extension provider

This test ensures the addition of an extension provider for external authorization and that it is removed from the control plane properly using a custom cleanup function.
We have missed it in the original work.

### Fix: aligns provider name between template and cleanup logic

This is short-term fix for the existing codebase. In the long term (which is actively worked on) we need to improve the way of how we are storing config information to limit cases where we rely on pre/suffixes. Cases like this should be kept as its own thing instead, as it represents the concept in the infrastructure/authz setup.

* chore: indentation

Signed-off-by: Wen Zhou <[email protected]>

* fix: use old package path till we cherry-pick refactor commit

Signed-off-by: Wen Zhou <[email protected]>

---------

Signed-off-by: Wen Zhou <[email protected]>
Co-authored-by: Edgar Hernández <[email protected]>
Co-authored-by: Edgar Hernández <[email protected]>
Co-authored-by: Cameron Garrison <[email protected]>
Co-authored-by: Wen Zhou <[email protected]>
Co-authored-by: Bartosz Majsak <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.