Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Api availability measurement #1162

Merged

Conversation

vamossagar12
Copy link
Contributor

This PR is for #1096 which created a new measurement to track the availability of api-server by probing the /healthz end point

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Apr 7, 2020
@k8s-ci-robot
Copy link
Contributor

Hi @vamossagar12. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 7, 2020
@mm4tt
Copy link
Contributor

mm4tt commented Apr 7, 2020

/ok-to-test

Thanks, @vamossagar12!

I'll try to take a look tomorrow. Adding Wojtek in case he has capacity to take a look today
/assign @wojtek-t

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 7, 2020
@oxddr
Copy link
Contributor

oxddr commented Apr 9, 2020

/uncc

@k8s-ci-robot k8s-ci-robot removed the request for review from oxddr April 9, 2020 11:20
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 24, 2020
@vamossagar12 vamossagar12 force-pushed the api-availability-measurement branch 2 times, most recently from beacee0 to c54da97 Compare April 24, 2020 16:12
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 24, 2020
@wojtek-t
Copy link
Member

@vamossagar12 - sorry for delay, I'm pretty overloaded. I will try to look into it today or tomorrow

Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vamossagar12 - made a pass on it, sorry for long delay

@vamossagar12
Copy link
Contributor Author

Thanks @wojtek-t I have more or less implemented the changes you suggested. Just had a couple of questions post which I can push the changes.

@vamossagar12
Copy link
Contributor Author

@wojtek-t i made the changes but not able to push them. Keep getting this error after rebasing with the latest from master:

.git/hooks/pre-push: line 11: ./verify/verify-boilerplate.sh: No such file or directory

I see some changes have been made recently in the /verify folder but not able to figure out what the problem is.

@wojtek-t
Copy link
Member

@vamossagar12 - the verify-boilerplace.sh script still exists and wasn't touched:
https://github.com/kubernetes/perf-tests/blob/master/verify/verify-boilerplate.sh

BTW - the hooks must be something in your local environment - I don't have them set up.

@vamossagar12
Copy link
Contributor Author

@vamossagar12 - the verify-boilerplace.sh script still exists and wasn't touched:
https://github.com/kubernetes/perf-tests/blob/master/verify/verify-boilerplate.sh

BTW - the hooks must be something in your local environment - I don't have them set up.

Well, actually I set the hooks up since they were mentioned in the README of perf-tests. If it isn't needed, I can remove it. Would it have any implications though?

@wojtek-t
Copy link
Member

Well, actually I set the hooks up since they were mentioned in the README of perf-tests. If it isn't needed, I can remove it. Would it have any implications though?

We have presubmits anyway - so these should be caught by them anyway.

@k8s-ci-robot k8s-ci-robot removed the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label May 21, 2020
Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please squash commits

@vamossagar12
Copy link
Contributor Author

Also please squash commits

Done. Had a couple of questions.

Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for delay - I was on vacation for last 2.5 weeks.

I added some comments, but i'm mostly fine with what is now. Please change the PR out of Draft and let's proceed with it.

@vamossagar12 vamossagar12 marked this pull request as ready for review July 21, 2020 12:12
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 21, 2020
@vamossagar12
Copy link
Contributor Author

Sorry for delay - I was on vacation for last 2.5 weeks.

I added some comments, but i'm mostly fine with what is now. Please change the PR out of Draft and let's proceed with it.

No worries :) I have made the changes and also removed the PR out of draft.

content, err := util.PrettyPrintJSON(hostAvailabilityMetricsOutput)
if err != nil {
createMasterAvailabilitySummaryErrors.Append(err)
} else{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran gofmt -s for these 2 files but this error still got thrown. Is that the right command?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gofmt -w -s

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you.. ran the command and pushed again.

a.wg.Wait()

errList := errors.NewErrorList()
err := a.createClusterAvailabilitySummary(probeFrequency)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - I should have caught that earlier.

Actually I would really prefer to have a single summary rather than N summaries for every master and cluster separately (see my comment below).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't yet addressed.

consecutiveFailedProbes int
}

type apiAvailabilityOutput struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I would prefer the output to be:

type apiAvailabilitySummary struct {
  IP                                           string
  AvailabilityPercentage           float32
  LongestUnavailableDuration time.Duration
}

type apiAvailabilityOutput struct {
  ClusterMetrics apiAvailabilitySummary
  MasterMetrics []apiAvailabilitySummary
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @wojtek-t , slight confusion here. The final output for MasterMetrics is an array here. If you see the output of the measurement that would be returned, all outputs would be consolidated into 1.

https://github.com/kubernetes/perf-tests/pull/1162/files#diff-769e85d1bdfa456cbbe72288f2ba2ab5R54-R55

Do you think this isn't similar to what you prefer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I care about 'availability' summary as a whole. Yes - I want a single summary, instead of N separate ones.
Alternatively, we may consider two measurements: one for the cluster, second for all masters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I might want to make some tweaks to the apiAvailabilityMeasurement struct in that case. Currently it looks like this:

type apiAvailabilityMeasurement struct {
	isRunning           bool
	stopCh              chan struct{}
	hosts               []string
	summaries           []measurement.Summary
	clusterLevelMetrics *apiAvailabilityMetrics
	hostLevelMetrics    map[string]*apiAvailabilityMetrics
	wg                  sync.WaitGroup
}

So, if I understand correctly, instead of an array of summaries, you want 1 cluster level summary and array of MasterLevel summaries in a single object. I would also need to remove the clusterLevelMetrics and hostLevelMetrics from the above struct.

Regarding 2 separate measurements, do we expose them as separate measurements- which seems slightly confusing to me. With a unified measurement, a user would be able to get all availability metrics in one summary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - I meant "two summaries" - one for the cluster, second for all masters.
But, I think having a single summary is cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. so in that case, i will make the code changes to have one summary in the format you explained above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wojtek-t I have made the code changes to expose only 1 summary for both ClusterLevel and MasterMetrics. Plz review when you get the chance..

@vamossagar12 vamossagar12 force-pushed the api-availability-measurement branch 2 times, most recently from 55c58b3 to 1aca309 Compare July 27, 2020 15:15
a.wg.Wait()

errList := errors.NewErrorList()
err := a.createClusterAvailabilitySummary(probeFrequency)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't yet addressed.

consecutiveFailedProbes int
}

type apiAvailabilityOutput struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I care about 'availability' summary as a whole. Yes - I want a single summary, instead of N separate ones.
Alternatively, we may consider two measurements: one for the cluster, second for all masters.

Copy link
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vamossagar12 - thanks, it's pretty close now; added two more comments

@vamossagar12
Copy link
Contributor Author

Thanks @wojtek-t , I made those changes as well.

@wojtek-t
Copy link
Member

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 17, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vamossagar12, wojtek-t

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 17, 2020
@k8s-ci-robot k8s-ci-robot merged commit a089530 into kubernetes:master Aug 17, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants