Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

koordlet: add libpfm4&perf group #1554

Merged
merged 8 commits into from
Sep 14, 2023

Conversation

bowen-intel
Copy link
Contributor

@bowen-intel bowen-intel commented Aug 17, 2023

Ⅰ. Describe what this PR does

This pull request enhances the precision of CPI calculation through the utilization of the perf group. Additionally, it introduces the integration of libpfm4 as an event parser, thereby expanding the potential to incorporate additional TMA analyses in subsequent developments.

Ⅱ. Does this pull request fix one issue?

fixes #1476 #1470

Ⅲ. Describe how to verify it

Use Prometheus.

Ⅳ. Special notes for reviews

This pr uses unsafe.Pointer to mock syscall.Syscall6 in unit test, so I have to add unsafeptr tag for go vet tool and ignore github.com/koordinator-sh/koordinator/pkg/koordlet/util/perf package in test since it use -race tag.
Any comments is welcome.

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@eahydra eahydra changed the title feat: add libpfm4&perf group koordlet: add libpfm4&perf group Aug 17, 2023
@bowen-intel bowen-intel force-pushed the libpfm4 branch 3 times, most recently from cb88d52 to 6afb121 Compare August 17, 2023 07:42
@codecov
Copy link

codecov bot commented Aug 17, 2023

Codecov Report

Patch coverage: 55.83% and project coverage change: +0.53% 🎉

Comparison is base (3fcac00) 65.19% compared to head (b24b776) 65.72%.
Report is 53 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1554      +/-   ##
==========================================
+ Coverage   65.19%   65.72%   +0.53%     
==========================================
  Files         352      376      +24     
  Lines       36326    39842    +3516     
==========================================
+ Hits        23681    26186    +2505     
- Misses      10901    11745     +844     
- Partials     1744     1911     +167     
Flag Coverage Δ
unittests 65.72% <55.83%> (+0.53%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Changed Coverage Δ
pkg/features/koordlet_features.go 100.00% <ø> (ø)
pkg/koordlet/util/perf/perf_linux.go 46.73% <0.00%> (ø)
...lectors/performance/performance_collector_linux.go 52.83% <21.42%> (-3.17%) ⬇️
pkg/koordlet/util/stat.go 52.85% <45.00%> (-22.15%) ⬇️
pkg/koordlet/util/perf_group/perf_group_linux.go 62.84% <62.84%> (ø)

... and 99 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@bowen-intel bowen-intel force-pushed the libpfm4 branch 3 times, most recently from cdd92e7 to 8c74079 Compare August 18, 2023 09:40
Signed-off-by: bowen-intel <[email protected]>
Signed-off-by: bowen-intel <[email protected]>
@bowen-intel
Copy link
Contributor Author

bowen-intel commented Aug 28, 2023

Make a A/B test for two method:

workload

100 nginx

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
  namespace: nginx
  labels:
    run: my-nginx
spec:
  replicas: 100
  selector:
    matchLabels:
      run: my-nginx
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      nodeName: inspur-icx-1
      containers:
        - name: my-nginx
          image: nginx:latest
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
            limits:
              cpu: "1"
              memory: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  namespace: nginx
  labels:
    run: my-nginx
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    run: my-nginx

environment

CPU Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
Memory 186Gi
Ubuntu 22.04
containerd github.com/containerd/containerd v1.6.8 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
runc version 1.1.4
koordinator(based on v1.3.0)
kubernetes v1.27.2

libpfm4

image

perf-util

image

@bowen-intel
Copy link
Contributor Author

Any comments and questions is welcome! 😀 @zwzhang0107 @songtao98

Copy link
Contributor

@songtao98 songtao98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Signed-off-by: bowen-intel <[email protected]>
Makefile Outdated Show resolved Hide resolved
Makefile Outdated Show resolved Hide resolved
docker/koordlet.dockerfile Outdated Show resolved Hide resolved
docker/koordlet.dockerfile Outdated Show resolved Hide resolved
pkg/koordlet/util/perf/perf_linux.go Outdated Show resolved Hide resolved
pkg/koordlet/util/perf/perf_linux.go Outdated Show resolved Hide resolved
pkg/koordlet/util/perf/perf_linux.go Outdated Show resolved Hide resolved
pkg/koordlet/util/perf/perf_linux.go Outdated Show resolved Hide resolved
pkg/koordlet/util/perf/perf_linux.go Outdated Show resolved Hide resolved
@zwzhang0107
Copy link
Contributor

Make a A/B test for two method:

workload

100 nginx

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
  namespace: nginx
  labels:
    run: my-nginx
spec:
  replicas: 100
  selector:
    matchLabels:
      run: my-nginx
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      nodeName: inspur-icx-1
      containers:
        - name: my-nginx
          image: nginx:latest
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
            limits:
              cpu: "1"
              memory: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  namespace: nginx
  labels:
    run: my-nginx
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    run: my-nginx

environment

CPU Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz Memory 186Gi Ubuntu 22.04 containerd github.com/containerd/containerd v1.6.8 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6 runc version 1.1.4 koordinator(based on v1.3.0) kubernetes v1.27.2

libpfm4

image

perf-util

image

this looks great!

it will be better if we have the comparison on accuracy between perf and libpfm4

Signed-off-by: bowen-intel <[email protected]>
Signed-off-by: bowen-intel <[email protected]>
@bowen-intel
Copy link
Contributor Author

Make a A/B test for two method:

workload

100 nginx

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
  namespace: nginx
  labels:
    run: my-nginx
spec:
  replicas: 100
  selector:
    matchLabels:
      run: my-nginx
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      nodeName: inspur-icx-1
      containers:
        - name: my-nginx
          image: nginx:latest
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "1"
              memory: 1Gi
            limits:
              cpu: "1"
              memory: 1Gi
---
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  namespace: nginx
  labels:
    run: my-nginx
spec:
  ports:
    - port: 80
      protocol: TCP
      targetPort: 80
  selector:
    run: my-nginx

environment

CPU Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz Memory 186Gi Ubuntu 22.04 containerd github.com/containerd/containerd v1.6.8 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6 runc version 1.1.4 koordinator(based on v1.3.0) kubernetes v1.27.2

libpfm4

image

perf-util

image

this looks great!

it will be better if we have the comparison on accuracy between perf and libpfm4

libpfm4

image cycles: 9694773088 instructions: 2817277063

perf

perf stat -e '{cycles,instructions}' -G xxx.scope -a sleep 10
image

Signed-off-by: bowen-intel <[email protected]>
@zwzhang0107
Copy link
Contributor

/lgtm

@koordinator-bot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hormes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@koordinator-bot koordinator-bot bot merged commit 876e878 into koordinator-sh:main Sep 14, 2023
19 checks passed
ls-2018 pushed a commit to ls-2018/koordinator that referenced this pull request Mar 25, 2024
ls-2018 pushed a commit to ls-2018/koordinator that referenced this pull request Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[proposal] Use libpfm4 in perf collector
5 participants