Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(delta): force push EDS once CDS sent for Ads #981

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

lukidzi
Copy link

@lukidzi lukidzi commented Aug 6, 2024

This is a widely known issue (#583). In the case of SOTW, it can be fixed by using a custom callback to force a version change. However, in the case of the delta, it's not possible because the previous version is stored in the streamState and the current version is set based on the resource. Since there were no endpoint changes simple version overriding doesn't force an update.

I'm looking for feedback on this approach.

@lukidzi lukidzi changed the title fix(delta): force push EDS once CDS is sent but EDS didn't fix(delta): force push EDS once CDS is changed but not EDS Aug 7, 2024
Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi lukidzi marked this pull request as ready for review August 29, 2024 19:47
@lukidzi lukidzi changed the title fix(delta): force push EDS once CDS is changed but not EDS fix(delta): force push EDS/RDS once CDS/LDS sent for Ads Aug 29, 2024
Signed-off-by: Lukasz Dziedziak <[email protected]>
@lukidzi
Copy link
Author

lukidzi commented Aug 29, 2024

@valerian-roche Hi, would you have some time to review this approach? I'm looking for feedback, and if you think there’s a better way to do it, I’d be happy to make changes.

@valerian-roche
Copy link
Contributor

@valerian-roche Hi, would you have some time to review this approach? I'm looking for feedback, and if you think there’s a better way to do it, I’d be happy to make changes.

Hey, this issue for EDS has been addressed in envoy now (though behind a runtime flag). Our private fork of go-controlplane had a workaround for it we were able to remove with it.
I'll take a look, but I don't think repushing all endpoints each time is realistic for some users (e.g. we have gateways with 1k+ clusters and 10s of thousands of endpoints). Our implementation on delta would only resend on impacting clusters. For routes I believe it would be less of an issue.
Can you assess if you still need the EDS side of it or if the fix from @adisuissa in this PR is sufficient (just noticed it got enabled by default for the next release)? For RDS I'd need to go back to envoy code to assess if it is still valid as some code was reworked since this issue

@lukidzi
Copy link
Author

lukidzi commented Aug 29, 2024

@valerian-roche Hi, would you have some time to review this approach? I'm looking for feedback, and if you think there’s a better way to do it, I’d be happy to make changes.

Hey, this issue for EDS has been addressed in envoy now (though behind a runtime flag). Our private fork of go-controlplane had a workaround for it we were able to remove with it. I'll take a look, but I don't think repushing all endpoints each time is realistic for some users (e.g. we have gateways with 1k+ clusters and 10s of thousands of endpoints). Our implementation on delta would only resend on impacting clusters. For routes I believe it would be less of an issue. Can you assess if you still need the EDS side of it or if the fix from @adisuissa in this PR is sufficient (just noticed it got enabled by default for the next release)? For RDS I'd need to go back to envoy code to assess if it is still valid as some code was reworked since this issue

Thanks for the response. I’m going to test the Envoy change and see if it helps. If it doesn’t resolve the issue, I can update this PR to send only impacted clusters.

@lukidzi lukidzi changed the title fix(delta): force push EDS/RDS once CDS/LDS sent for Ads fix(delta): force push EDS once CDS sent for Ads Sep 3, 2024
@lukidzi
Copy link
Author

lukidzi commented Sep 3, 2024

I've added a change to push only relevant endpoints. It seems the Envoy change fixes the issue, but only when initial_fetch_timeout is set. If this value is 0 or too long, your cluster might stay in the warming phase for an extended period. If you could take a look @valerian-roche

@lukidzi
Copy link
Author

lukidzi commented Sep 9, 2024

@valerian-roche I've applied a suggested change with sending only changed endpoints if you can take a look would be great :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants