Migrated metrics to prometheus #434

nastassia-dailidava · 2024-09-27T17:56:14Z

No description provided.

docs/deployment/observability.md

kozjan · 2024-10-01T06:12:37Z

...re/src/main/kotlin/pl/allegro/tech/servicemesh/envoycontrol/snapshot/EnvoySnapshotFactory.kt

-        sample.stop(meterRegistry.timer("snapshot-factory.new-snapshot.time"))
+        sample.stop(
+            meterRegistry.timer(
+                "snapshot-factory.seconds",


Maybe snapshot_factory, to be consistent with other metrics?

Yeah, I looked at some of our other metrics, but this way seems better, so I changed them to dots everywhere.

kozjan · 2024-10-01T06:13:04Z

...re/src/main/kotlin/pl/allegro/tech/servicemesh/envoycontrol/snapshot/EnvoySnapshotFactory.kt

-        groupSample.stop(meterRegistry.timer("snapshot-factory.get-snapshot-for-group.time"))
+        groupSample.stop(
+            meterRegistry.timer(
+                "snapshot-factory.seconds",


Same here: snapshot_factory for consistency.

kozjan · 2024-10-01T06:17:56Z

...ol-core/src/main/kotlin/pl/allegro/tech/servicemesh/envoycontrol/snapshot/SnapshotUpdater.kt

            .checkpoint("snapshot-updater-groups-published")
-            .name("snapshot-updater-groups-published").metrics()
+            .name("snapshot-updater.count.total")


snapshot_updater / snapshot.updater: "." is mapped to "_", right?

"." is, "-" isn't.

kozjan · 2024-10-01T06:22:17Z

...main/kotlin/pl/allegro/tech/servicemesh/envoycontrol/consul/services/ConsulServiceChanges.kt

                            .record(
                                stopTimer - startTimer,
-                                TimeUnit.MILLISECONDS
+                                TimeUnit.SECONDS


Why change the unit here?

Prometheus has naming conventions and recommended base units that should be followed for consistency, and according to them it's better to use seconds.
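One thing worth checking in this hunk: a Micrometer Timer records an amount together with its TimeUnit and converts it to the registry's base unit itself (seconds for the Prometheus registry), so the unit argument should describe the value passed in, not the desired export unit. If startTimer and stopTimer come from System.currentTimeMillis(), as the warmup hunk further down suggests, their difference is in milliseconds. The sketch below uses a hypothetical SecondsTimer class (illustrative only, not Micrometer's API) to show the effect with java.util.concurrent.TimeUnit:

```kotlin
import java.util.concurrent.TimeUnit

// Hypothetical stand-in for a registry timer: it normalizes every recording
// to nanoseconds and reports the total in seconds.
class SecondsTimer {
    private var totalNanos = 0L

    // The unit describes the amount passed in; conversion happens here.
    fun record(amount: Long, unit: TimeUnit) {
        totalNanos += unit.toNanos(amount)
    }

    fun totalSeconds(): Double = totalNanos / 1_000_000_000.0
}

fun main() {
    // Hypothetical System.currentTimeMillis() readings, 1.5 s apart.
    val startTimer = 1_000L
    val stopTimer = 2_500L

    val correct = SecondsTimer()
    correct.record(stopTimer - startTimer, TimeUnit.MILLISECONDS)

    val mislabeled = SecondsTimer()
    mislabeled.record(stopTimer - startTimer, TimeUnit.SECONDS)

    println(correct.totalSeconds())    // 1.5
    println(mislabeled.totalSeconds()) // 1500.0
}
```

Under that assumption, keeping TimeUnit.MILLISECONDS would still produce seconds on the Prometheus side, while TimeUnit.SECONDS would inflate the recorded durations by a factor of 1000.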

...pl/allegro/tech/servicemesh/envoycontrol/server/callbacks/MetricsDiscoveryServerCallbacks.kt

Ferdudas97 · 2024-10-01T06:26:51Z

...ol-core/src/main/kotlin/pl/allegro/tech/servicemesh/envoycontrol/snapshot/SnapshotUpdater.kt

-            .name("snapshot-updater-merged").metrics()
+            .name("snapshot.updater.count.total")
+            .tag("status", "merged")
+            .tag("type", "global")


What does this type mean?

There used to be separate snapshot-updater metrics for groups and for the global snapshot, so I introduced a type label to distinguish them; it's essentially a "snapshot type".
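Moving the groups/global distinction out of the metric name and into a label means Prometheus sees one metric family with several series, which can be summed or filtered with a label matcher. A minimal sketch with a hypothetical in-memory registry (LabeledCounters is illustrative, not Micrometer's API) that renders series in the Prometheus text exposition style; the "published"/"groups" label values are assumptions based on the groups/global split described above:

```kotlin
// Hypothetical registry: one counter per (metric name, label set).
class LabeledCounters {
    private val series = sortedMapOf<String, Double>()

    fun increment(name: String, vararg labels: Pair<String, String>) {
        // Sort labels so the same label set always maps to the same series key.
        val rendered = labels.sortedBy { it.first }
            .joinToString(",") { (k, v) -> "$k=\"$v\"" }
        val key = "$name{$rendered}"
        series[key] = (series[key] ?: 0.0) + 1.0
    }

    // One line per series: name{labels} value
    fun scrape(): List<String> = series.map { (key, value) -> "$key $value" }
}

fun main() {
    val counters = LabeledCounters()
    counters.increment("snapshot_updater_count_total", "status" to "merged", "type" to "global")
    counters.increment("snapshot_updater_count_total", "status" to "merged", "type" to "global")
    counters.increment("snapshot_updater_count_total", "status" to "published", "type" to "groups")
    counters.scrape().forEach(::println)
    // snapshot_updater_count_total{status="merged",type="global"} 2.0
    // snapshot_updater_count_total{status="published",type="groups"} 1.0
}
```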


KSmigielski · 2024-10-01T10:47:44Z

...rc/main/kotlin/pl/allegro/tech/servicemesh/envoycontrol/infrastructure/ControlPlaneConfig.kt

+            meterRegistry.gauge(metricName, Tags.of("status", "instance-changed"), it.instanceChanges)
+            meterRegistry.gauge(metricName, Tags.of("status", "snapshot-changed"), it.snapshotChanges)
+            meterRegistry.gauge("cache.groups.count", it.cacheGroupsCount)
+            it.meterRegistry.more().counter("services.watch.errors.total", listOf(), it.errorWatchingServices)


I think it would be better to have consistent names: if we have watched-services, then this one should be watched-services.errors.total.

KSmigielski · 2024-10-01T10:50:36Z

...e/src/main/kotlin/pl/allegro/tech/servicemesh/envoycontrol/synchronization/RemoteServices.kt

            }, 0, interval, TimeUnit.SECONDS)
        }, FluxSink.OverflowStrategy.LATEST)
        return aclFlux.doOnCancel {
-            meterRegistry.counter("cross-dc-synchronization.cancelled").increment()
+            meterRegistry.counter("cross.dc.synchronization.cancelled").increment()


Why is there no "total" at the end of the metric name?
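As far as I know, Micrometer's Prometheus naming convention appends _total to counter names that don't already end with it, and _seconds to timer names, so the exported name can carry the suffix even when the Micrometer name omits it; the exact rules vary between Micrometer versions. The sketch below is a simplified, hypothetical model (prometheusName and MeterKind are illustrative names, not Micrometer's code), and it deliberately ignores characters such as "-":

```kotlin
enum class MeterKind { COUNTER, TIMER, GAUGE }

// Simplified model: dots become underscores, then a type-specific suffix is
// appended unless the name already ends with it.
fun prometheusName(name: String, kind: MeterKind): String {
    val snake = name.replace('.', '_')
    return when (kind) {
        MeterKind.COUNTER -> if (snake.endsWith("_total")) snake else "${snake}_total"
        MeterKind.TIMER -> if (snake.endsWith("_seconds")) snake else "${snake}_seconds"
        MeterKind.GAUGE -> snake
    }
}

fun main() {
    println(prometheusName("cross.dc.synchronization.cancelled", MeterKind.COUNTER))
    // cross_dc_synchronization_cancelled_total
    println(prometheusName("services.watch.errors.total", MeterKind.COUNTER))
    // services_watch_errors_total
    println(prometheusName("cache.groups.count", MeterKind.GAUGE))
    // cache_groups_count
}
```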

KSmigielski · 2024-10-01T10:56:07Z

...main/kotlin/pl/allegro/tech/servicemesh/envoycontrol/consul/services/ConsulServiceChanges.kt

@@ -226,10 +226,10 @@ class ConsulServiceChanges(
                    if (ready) {
                        val stopTimer = System.currentTimeMillis()
                        readinessStateHandler.ready()
-                        metrics.meterRegistry.timer("envoy-control.warmup.time")
+                        metrics.meterRegistry.timer("envoy-control.warmup.seconds")


Both "-" and "." will be replaced by "_" in the final metric name, so I suggest using "_" as the separator in all metric names instead of "-" and ".".
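Prometheus metric names must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*, so neither "." nor "-" can appear in an exported name, and the registry's naming convention has to replace them. A minimal sketch of that sanitization step, loosely modeled on what Micrometer's Prometheus naming convention does but not its actual code (sanitizeMetricName is a hypothetical helper, and prefixing "_" to names that start with a digit is this sketch's own choice):

```kotlin
// Characters outside Prometheus' allowed metric-name alphabet.
val invalidChars = Regex("[^a-zA-Z0-9_:]")

// Hypothetical helper: replace every disallowed character ("." and "-"
// included) with "_", then make sure the name does not start with a digit.
fun sanitizeMetricName(name: String): String {
    val replaced = invalidChars.replace(name, "_")
    return if (replaced.firstOrNull()?.isDigit() == true) "_$replaced" else replaced
}

fun main() {
    val names = listOf(
        "envoy-control.warmup.seconds",
        "snapshot-updater.count.total",
        "snapshot.updater.count.total"
    )
    for (name in names) {
        println("$name -> ${sanitizeMetricName(name)}")
    }
}
```

Because "snapshot-updater.count.total" and "snapshot.updater.count.total" collapse to the same exported name here, writing names with "_" directly keeps the source identical to what shows up in Prometheus.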

nastassia-dailidava added 4 commits September 27, 2024 18:00

allegro-internal/flex-roadmap#819 Migrated metrics to prometheus · df11439
allegro-internal/flex-roadmap#819 Migrated metrics to prometheus · e678791
allegro-internal/flex-roadmap#819 Migrated metrics to prometheus · 2516f18
allegro-internal/flex-roadmap#819 Migrated metrics to prometheus · 22123bb

kozjan reviewed Oct 1, 2024

Review comment on docs/deployment/observability.md (outdated, resolved)

kozjan reviewed Oct 1, 2024

Ferdudas97 requested changes Oct 1, 2024

nastassia-dailidava and others added 6 commits October 1, 2024 11:35

Update docs/deployment/observability.md (co-authored by kozjan) · 638ec83
allegro-internal/flex-roadmap#819 Migrated metrics to prometheus (review fixes) · 3c93684
allegro-internal/flex-roadmap#819 Migrated metrics to prometheus (review fixes) · e5efd7b
allegro-internal/flex-roadmap#819 Migrated metrics to prometheus (lint) · c435189
allegro-internal/flex-roadmap#819 Migrated metrics to prometheus (lint) · 4416446
allegro-internal/flex-roadmap#819 Migrated metrics to prometheus (lint) · 5edcb15

KSmigielski reviewed Oct 1, 2024

KSmigielski deployed to ci October 1, 2024 11:03 with GitHub Actions (Active)
