Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stability: Resolve Envoy Test Service Data Race #8468

Merged
merged 4 commits into from
Jul 14, 2023

Conversation

sam-heilbron
Copy link
Contributor

@sam-heilbron sam-heilbron commented Jul 14, 2023

Description

Resolve data race in test service used in e2e tests

Context

The envoy.Factory is responsible for producing an envoy.Instance with necessary configuration. Historically, it also tracked each instance that was created, and when the Factory was cleaned up, it would go to each instance it was aware of and call instance.Clean on that instance. Cleaning an Envoy instance amounts to tearing down the process that is running the service.

However, when an Envoy instance is Started, it is provided a context, and when that context is cancelled the service is auto-cleaned up (https://github.com/solo-io/gloo/blob/main/test/services/envoy/instance.go#L136). This way, we only have to worry about the test context, and all other resource lifecycles are tied to it.

The race occurred because the instances cleans itself up (when the test context is cancelled) AND the factory tries to call clean on the instance when the test suite is complete. Having both of these is actually unnecessary, so I just removed the implementation of the factory cleaning up the instances, since we expect instances to clean themselves up.

Checklist:

  • I included a concise, user-facing changelog (for details, see https://github.com/solo-io/go-utils/tree/main/changelogutils) which references the issue that is resolved.
  • If I updated APIs (our protos) or helm values, I ran make -B install-go-tools generated-code to ensure there will be no code diff
  • I followed guidelines laid out in the Gloo Edge contribution guide
  • I opened a draft PR or added the work in progress label if my PR is not ready for review
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works

BOT NOTES:
resolves https://github.com/solo-io/solo-projects/issues/5177

@solo-changelog-bot
Copy link

Issues linked to changelog:
https://github.com/solo-io/solo-projects/issues/5177

@github-actions github-actions bot added the keep pr updated signals bulldozer to keep pr up to date with base branch label Jul 14, 2023
@nfuden
Copy link
Contributor

nfuden commented Jul 14, 2023

While I like the general idea it seems correct that the manager would have some way to clean up its instances.
I guess this comes down to whether this is a simple factory that fires and forgets its envoy instances or a manager of them.

Is that something we are fine with?
if so cool, if not perhaps we just add another signal or some way for it to validate that it's subinstances are completely gone.

@github-actions
Copy link

Visit the preview URL for this PR (updated for commit 2e5ff81):

https://gloo-edge--pr8468-stability-envoy-data-b5lgslmg.web.app

(expires Fri, 21 Jul 2023 14:05:26 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

Sign: 77c2b86e287749579b7ff9cadb81e099042ef677

Copy link
Contributor

@nfuden nfuden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently our test factories implicitly only manage a building pattern. Ie the factories have no lifecycle management and their responsibility ends once they create an instance that the test can use. Given this it seems this change is good and this also makes instance creation safe to run in parallel and for clean to be called multiple times

@soloio-bulldozer soloio-bulldozer bot merged commit 49949af into main Jul 14, 2023
11 checks passed
@soloio-bulldozer soloio-bulldozer bot deleted the stability/envoy-data-race branch July 14, 2023 14:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keep pr updated signals bulldozer to keep pr up to date with base branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants