diff --git a/stack_orchestrator/data/config/monitoring/testnet-alert-rules.yml b/stack_orchestrator/data/config/monitoring/testnet-alert-rules.yml
new file mode 100644
index 00000000..60d77bd1
--- /dev/null
+++ b/stack_orchestrator/data/config/monitoring/testnet-alert-rules.yml
@@ -0,0 +1,64 @@
+apiVersion: 1
+
+groups:
+  - orgId: 1
+    name: testnet
+    folder: TestnetAlerts
+    interval: 30s
+    rules:
+      - uid: endpoint_down
+        title: endpoint_down
+        condition: condition
+        data:
+          - refId: probe_success
+            relativeTimeRange:
+              from: 600
+              to: 0
+            datasourceUid: PBFA97CFB590B2093
+            model:
+              datasource:
+                type: prometheus
+                uid: PBFA97CFB590B2093
+              editorMode: code
+              expr: probe_success{job="blackbox"}
+              instant: true
+              intervalMs: 1000
+              legendFormat: __auto
+              maxDataPoints: 43200
+              range: false
+              refId: probe_success
+          - refId: condition
+            relativeTimeRange:
+              from: 600
+              to: 0
+            datasourceUid: __expr__
+            model:
+              conditions:
+                - evaluator:
+                    params:
+                      - 0
+                      - 0
+                    type: eq
+                  operator:
+                    type: and
+                  query:
+                    params: []
+                  reducer:
+                    params: []
+                    type: avg
+                  type: query
+              datasource:
+                name: Expression
+                type: __expr__
+                uid: __expr__
+              expression: ${probe_success} == 0
+              intervalMs: 1000
+              maxDataPoints: 43200
+              refId: condition
+              type: math
+        noDataState: Alerting
+        execErrState: Alerting
+        for: 5m
+        annotations:
+          summary: Endpoint {{ $labels.instance }} is down
+        isPaused: false
diff --git a/stack_orchestrator/data/stacks/monitoring/monitoring-testnet.md b/stack_orchestrator/data/stacks/monitoring/monitoring-testnet.md
index e20fab13..64399995 100644
--- a/stack_orchestrator/data/stacks/monitoring/monitoring-testnet.md
+++ b/stack_orchestrator/data/stacks/monitoring/monitoring-testnet.md
@@ -4,7 +4,7 @@ Instructions to setup and run monitoring stack for testnet services
 
 ## Create a deployment
 
-After completing [setup](./README.md#setup), create a spec file for the deployment, which will map the stack's ports and volumes to the host:
+Create a spec file for the deployment, which will map the stack's ports and volumes to the host:
 
 ```bash
 laconic-so --stack monitoring deploy init --output monitoring-testnet-spec.yml
@@ -37,7 +37,7 @@ laconic-so --stack monitoring deploy create --spec-file monitoring-testnet-spec.
 
 ### Prometheus scrape config
 
-Add the following scrape configs to prometheus config file (`monitoring-testnet-deployment/config/monitoring/prometheus/prometheus.yml`) in the deployment folder:
+- Setup the following scrape configs in prometheus config file (`monitoring-testnet-deployment/config/monitoring/prometheus/prometheus.yml`) in the deployment folder:
 
 ```yml
 ...
@@ -62,6 +62,22 @@ Add the following scrape configs to prometheus config file (`monitoring-testnet-
 # Example: 'host.docker.internal:3317'
 ```
 
+- Remove docker compose services which are not required in `monitoring-testnet-deployment/compose/docker-compose-prom-server.yml`
+  - `ethereum-chain-head-exporter`
+  - `filecoin-chain-head-exporter`
+  - `graph-node-upstream-head-exporter`
+  - `postgres-exporter`
+
+### Grafana dashboards
+
+Remove some of the existing dashboards which are not required in monitoring testnet
+```
+cd monitoring-testnet-deployment/config/monitoring/grafana/dashboards
+rm postgres-dashboard.json subgraphs-dashboard.json watcher-dashboard.json
+cd -
+```
+
+
 ### Grafana alerts config
 
 Place the pre-configured alerts rules in Grafana provisioning directory: