Add alerts for testnet services

2025-04-04 18:06:53 +05:30 · fb0138e975
commit fb0138e975
parent 27412519b4
2 changed files with 82 additions and 2 deletions
--- a/stack_orchestrator/data/config/monitoring/testnet-alert-rules.yml
+++ b/stack_orchestrator/data/config/monitoring/testnet-alert-rules.yml
@@ -0,0 +1,64 @@
+apiVersion: 1
+
+groups:
+  - orgId: 1
+    name: testnet
+    folder: TestnetAlerts
+    interval: 30s
+    rules:
+      - uid: endpoint_down
+        title: endpoint_down
+        condition: condition
+        data:
+          - refId: probe_success
+            relativeTimeRange:
+              from: 600
+              to: 0
+            datasourceUid: PBFA97CFB590B2093
+            model:
+              datasource:
+                type: prometheus
+                uid: PBFA97CFB590B2093
+              editorMode: code
+              expr: probe_success{job="blackbox"}
+              instant: true
+              intervalMs: 1000
+              legendFormat: __auto
+              maxDataPoints: 43200
+              range: false
+              refId: probe_success
+          - refId: condition
+            relativeTimeRange:
+              from: 600
+              to: 0
+            datasourceUid: __expr__
+            model:
+              conditions:
+                - evaluator:
+                    params:
+                      - 0
+                      - 0
+                    type: eq
+                  operator:
+                    type: and
+                  query:
+                    params: []
+                  reducer:
+                    params: []
+                    type: avg
+                  type: query
+              datasource:
+                name: Expression
+                type: __expr__
+                uid: __expr__
+              expression: ${probe_success} == 0
+              intervalMs: 1000
+              maxDataPoints: 43200
+              refId: condition
+              type: math
+        noDataState: Alerting
+        execErrState: Alerting
+        for: 5m
+        annotations:
+          summary: Endpoint {{ $labels.instance }} is down
+        isPaused: false
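
The `endpoint_down` rule above is evaluated every 30s and alerts once `probe_success{job="blackbox"}` has been 0 for 5m (missing data and query errors are also treated as alerting). To check by hand which endpoints currently match, the same expression can be sent to the Prometheus HTTP API. This is a minimal sketch, not part of the commit: it assumes Prometheus is reachable at `http://localhost:9090` (an assumption; set `PROM_URL` to the port mapped in the deployment) and that `curl` is installed. The function name `query_down_endpoints` is hypothetical.

```bash
#!/usr/bin/env bash
# Assumption: Prometheus listens on localhost:9090; override via PROM_URL.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Same condition the alert rule checks: blackbox targets whose last probe failed.
QUERY='probe_success{job="blackbox"} == 0'

# Hypothetical helper: run an instant query against the Prometheus HTTP API.
# --get with --data-urlencode sends the PromQL as a URL-encoded query parameter.
query_down_endpoints() {
  curl --silent --show-error --get \
    --data-urlencode "query=${QUERY}" \
    "${PROM_URL}/api/v1/query"
}
```

After sourcing the snippet, calling `query_down_endpoints` prints the JSON response. An empty `data.result` array means no probed endpoint is currently reporting `probe_success` as 0; the alert itself still waits for the `for: 5m` window before firing.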
--- a/stack_orchestrator/data/stacks/monitoring/monitoring-testnet.md
+++ b/stack_orchestrator/data/stacks/monitoring/monitoring-testnet.md
@@ -4,7 +4,7 @@ Instructions to setup and run monitoring stack for testnet services

 ## Create a deployment

-After completing [setup](./README.md#setup), create a spec file for the deployment, which will map the stack's ports and volumes to the host:
+Create a spec file for the deployment, which will map the stack's ports and volumes to the host:

 ```bash
 laconic-so --stack monitoring deploy init --output monitoring-testnet-spec.yml
@@ -37,7 +37,7 @@ laconic-so --stack monitoring deploy create --spec-file monitoring-testnet-spec.

 ### Prometheus scrape config

-Add the following scrape configs to prometheus config file (`monitoring-testnet-deployment/config/monitoring/prometheus/prometheus.yml`) in the deployment folder:
+- Set up the following scrape configs in the Prometheus config file (`monitoring-testnet-deployment/config/monitoring/prometheus/prometheus.yml`) in the deployment folder:

  ```yml
  ...
@@ -62,6 +62,22 @@ Add the following scrape configs to prometheus config file (`monitoring-testnet-
      # Example: 'host.docker.internal:3317'
  ```

+- Remove the Docker Compose services that are not required from `monitoring-testnet-deployment/compose/docker-compose-prom-server.yml`:
+  - `ethereum-chain-head-exporter`
+  - `filecoin-chain-head-exporter`
+  - `graph-node-upstream-head-exporter`
+  - `postgres-exporter`
+
+### Grafana dashboards
+
+Remove the existing dashboards that are not required for testnet monitoring:
+```bash
+cd monitoring-testnet-deployment/config/monitoring/grafana/dashboards
+rm postgres-dashboard.json subgraphs-dashboard.json watcher-dashboard.json
+cd -
+```
+<!-- TODO: Check node-exporter-full.json, nodejs-app-dashboard.json -->
+
 ### Grafana alerts config

 Place the pre-configured alerts rules in Grafana provisioning directory: