Faulty RemoteJwks first try will never be retried #7803
Comments
This might be a dupe
We have a KCS article on zendesk for this. It is probably the "dupe" ;)
Similar issue: #7528
Additional context: https://solo-io-corp.slack.com/archives/C02LQ0JCNLF/p1694578006362599
Can still reproduce this on GE 1.15.14. The problem seems to be that when the ExtAuth server loads with an AuthConfig that points to a JWKS endpoint that is not available/reachable, the config does not get loaded. And when the config does not get loaded, the
Note that when you bring down Keycloak after the AuthConfig has already been loaded, the ExtAuth server starts reporting errors that it can't refresh, but once you bring Keycloak back up, it is able to refresh again. The main question seems to be whether we want to accept AuthConfigs that point to non-reachable endpoints. We can't really determine whether the AuthConfig is incorrect or whether there is an issue with the target endpoint.
Reproducer: https://github.com/DuncanDoyle/ge-gloo-7803
We should have a first-time startup mode for extauth that forces AuthConfigs to keep retrying, rather than failing as they normally would when new configuration is applied.
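The "keep retrying at startup" idea can be sketched roughly as follows. This is a minimal illustration in Python, not the ExtAuth implementation: `fetch_with_retry`, its parameters, and the backoff policy are all hypothetical names and choices made for this example.

```python
import time
from typing import Callable, Dict, Optional


def fetch_with_retry(
    fetch: Callable[[], Dict],
    max_attempts: Optional[int] = None,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call fetch() until it succeeds, doubling the wait after each failure.

    All names here are hypothetical. max_attempts=None retries forever,
    which models a startup mode that never gives up on the JWKS endpoint.
    """
    attempt = 0
    delay = base_delay
    while True:
        attempt += 1
        try:
            return fetch()
        except Exception:
            # Give up only if the caller set an explicit attempt limit.
            if max_attempts is not None and attempt >= max_attempts:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

Passing a custom `sleep` makes the backoff observable without waiting, which is convenient when simulating an endpoint that is down for the first few attempts.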
➤ Hanh Vu commented: ETA of 4/12 for design review.
Outcome of design review was that the ideal approach for handling the situation where the auth service is updated with a non-responding JWKs URL is to keep the new AuthService in a pending state until it can retrieve the URLs. This requires changes in how we generate/translate/communicate the new AuthConfigs. The plan is to implement these structural changes in a separate PR and then add the JWKs specific changes on top of that. This will require 3 rounds of PRs in the main branches:
@DuncanDoyle - The changes needed to implement this are the type of structural changes that we usually don't like to implement in backports. In this case we are making non-trivial modifications to the ExtAuth pod's xds event loop, and the alternative would involve breaking changes to the exported Generator or Translator interfaces that would normally only accompany a major version update. How big of an ask would it be to make these changes 1.17 only? @kcbabo - tagging you too while Duncan is on vacation.
➤ Nathan F Solo commented: As there are some interstitial prs hence we are pushing the final due Keith Babo
First solo-projects PR (SP1 from #7803 (comment)) merged, EXT1 in review
The Ext Auth changes have been merged and the functional changes for the last PR are in place; now spiffing up the e2e tests.
This has been merged to |
Since this materially changes our extauth service's behavior, this has been merged to main and will not be backported to 1.15.
Gloo Edge Version
1.13.x (latest stable)
Kubernetes Version
None
Describe the bug
If the `RemoteJwks` url is not reachable when the `extauth` service launches, it throws a `failed to fetch JWKS` error but never retries fetching the JWKS, even if `refreshInterval` is set.
Then every call through a route using the `AuthConfig` gets a `403` error.
On `ext-auth` logs:
On `gateway-proxy` logs:
Steps to reproduce the bug
`AuthConfig` as-is:
And a VirtualService that is using it:
`failed to fetch JWKS` error on ext-auth logs
`refreshInterval`
`UAEX` errors on the `gateway-proxy` pod and `Auth Server does not contain auth configuration with the given ID` errors on the ext-auth one
(6. If you make keycloak available again and then restart the `ext-auth` pod, it will fix the issue)
Expected Behavior
Ext-auth pod should retry fetching the JWKS based on the `refreshInterval` value so we can get through the authentication process and end up with a `200` without having to restart the ext-auth pod.
When the first try is faulty, it looks like the "refresh loop" is not launched at all.
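To make the expected behavior concrete, here is a small sketch of a refresh loop that keeps running even when the very first fetch fails. It is written in Python purely for illustration and does not reflect the actual ext-auth code; `JwksCache`, `run_refresh_loop`, and the bounded `iterations` parameter are assumptions introduced for this example.

```python
import time
from typing import Callable, Dict, Optional


class JwksCache:
    """Hypothetical holder for the last successfully fetched key set."""

    def __init__(self, fetch: Callable[[], Dict]):
        self._fetch = fetch
        self.keys: Optional[Dict] = None  # None until the first success
        self.last_error: Optional[Exception] = None

    def refresh(self) -> bool:
        """Try one fetch; on failure keep the previous keys and record the error."""
        try:
            self.keys = self._fetch()
            self.last_error = None
            return True
        except Exception as err:
            self.last_error = err
            return False


def run_refresh_loop(
    cache: JwksCache,
    refresh_interval: float,
    iterations: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Refresh on every tick; a failed attempt (even the first) never stops the loop."""
    for i in range(iterations):
        cache.refresh()
        if i < iterations - 1:
            sleep(refresh_interval)
```

The point of the sketch is that the loop is started unconditionally, so an identity provider that comes up after the service does is picked up on a later tick instead of requiring a pod restart.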
Additional Context
No response
Related Issues