Add alerts for graph-node subgraphs (#821)
Some checks failed
Lint Checks / Run linter (push) Successful in 45s
Publish / Build and publish (push) Successful in 1m27s
Webapp Test / Run webapp test suite (push) Successful in 3m50s
Deploy Test / Run deploy test suite (push) Successful in 5m23s
Smoke Test / Run basic test suite (push) Successful in 4m40s
Database Test / Run database hosting test on kind/k8s (push) Successful in 12m11s
Container Registry Test / Run contaier registry hosting test on kind/k8s (push) Successful in 4m34s
External Stack Test / Run external stack test suite (push) Successful in 4m51s
Fixturenet-Laconicd-Test / Run Laconicd fixturenet and Laconic CLI tests (push) Successful in 15m20s
K8s Deploy Test / Run deploy test suite on kind/k8s (push) Successful in 11m10s
Fixturenet-Eth-Plugeth-Arm-Test / Run an Ethereum plugeth fixturenet test (push) Failing after 3h12m58s
Fixturenet-Eth-Plugeth-Test / Run an Ethereum plugeth fixturenet test (push) Failing after 3h13m0s

Part of [Deploy v2 and updated v3 sushiswap subgraphs](https://www.notion.so/Deploy-v2-and-updated-v3-sushiswap-subgraphs-e331945fdeea487c890706fc22c6cc94)

Reviewed-on: #821
Co-authored-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
Co-committed-by: Prathamesh Musale <prathamesh.musale0@gmail.com>
This commit is contained in:
Prathamesh Musale 2024-05-17 04:01:41 +00:00 committed by ashwin
parent 254f95e59f
commit 8f2da38183
4 changed files with 93 additions and 3 deletions

View File

@ -6,10 +6,16 @@ services:
restart: always
environment:
GF_SERVER_ROOT_URL: ${GF_SERVER_ROOT_URL}
CERC_GRAFANA_ALERTS_SUBGRAPH_IDS: ${CERC_GRAFANA_ALERTS_SUBGRAPH_IDS}
volumes:
- ../config/monitoring/grafana/provisioning:/etc/grafana/provisioning
- ../config/monitoring/grafana/dashboards:/etc/grafana/dashboards
- ../config/monitoring/update-grafana-alerts-config.sh:/update-grafana-alerts-config.sh
- grafana_storage:/var/lib/grafana
user: root
entrypoint: ["bash", "-c"]
command: |
"/update-grafana-alerts-config.sh && /run.sh"
ports:
- "3000"
healthcheck:

View File

@ -0,0 +1,64 @@
apiVersion: 1
groups:
- orgId: 1
name: subgraph
folder: SubgraphAlerts
interval: 30s
rules:
- uid: b2a9144b-6104-46fc-92b5-352f4e643c4c
title: subgraph_head_tracking
condition: condition
data:
- refId: diff
relativeTimeRange:
from: 600
to: 0
datasourceUid: PBFA97CFB590B2093
model:
datasource:
type: prometheus
uid: PBFA97CFB590B2093
editorMode: code
expr: ethereum_chain_head_number - on(network) group_right deployment_head{deployment=~"REPLACE_WITH_SUBGRAPH_IDS"}
instant: true
intervalMs: 1000
legendFormat: __auto
maxDataPoints: 43200
range: false
refId: diff
- refId: condition
relativeTimeRange:
from: 600
to: 0
datasourceUid: __expr__
model:
conditions:
- evaluator:
params:
- 15
- 0
type: gt
operator:
type: and
query:
params: []
reducer:
params: []
type: avg
type: query
datasource:
name: Expression
type: __expr__
uid: __expr__
expression: diff
intervalMs: 1000
maxDataPoints: 43200
refId: condition
type: threshold
noDataState: OK
execErrState: Alerting
for: 5m
annotations:
summary: Subgraph deployment {{ index $labels "deployment" }} is falling behind head by {{ index $values "diff" }}
labels: {}
isPaused: false

View File

@ -0,0 +1,7 @@
#!/bin/bash
echo Using CERC_GRAFANA_ALERTS_SUBGRAPH_IDS ${CERC_GRAFANA_ALERTS_SUBGRAPH_IDS}
# Replace subgraph ids in subgraph alerting config
# Note: Requires the grafana container to be run with user root
sed -i "s/REPLACE_WITH_SUBGRAPH_IDS/$CERC_GRAFANA_ALERTS_SUBGRAPH_IDS/g" /etc/grafana/provisioning/alerting/subgraph-alert-rules.yml

View File

@ -113,21 +113,31 @@ Add the following scrape configs to prometheus config file (`monitoring-watchers
labels:
instance: 'ajna'
chain: 'filecoin'
- job_name: graph-node
metrics_path: /metrics
scrape_interval: 30s
static_configs:
- targets: ['GRAPH_NODE_HOST:GRAPH_NODE_HOST_METRICS_PORT']
```
Add scrape config as done above for any additional watcher to add it to the Watchers dashboard.
### Grafana alerts config
Place the pre-configured watcher alerts rules in Grafana provisioning directory:
Place the pre-configured alerts rules in Grafana provisioning directory:
```bash
# watcher alert rules
cp monitoring-watchers-deployment/config/monitoring/watcher-alert-rules.yml monitoring-watchers-deployment/config/monitoring/grafana/provisioning/alerting/
# subgraph alert rules
cp monitoring-watchers-deployment/config/monitoring/subgraph-alert-rules.yml monitoring-watchers-deployment/config/monitoring/grafana/provisioning/alerting/
```
Update the alerting contact points config (`monitoring-watchers-deployment/config/monitoring/grafana/provisioning/alerting/contactpoints.yml`) with desired contact points
Add corresponding routes to the notification policies config (`monitoring-watchers-deployment/monitoring/grafana/provisioning/alerting/policies.yaml`) with appropriate object-matchers:
Add corresponding routes to the notification policies config (`monitoring-watchers-deployment/config/monitoring/grafana/provisioning/alerting/policies.yml`) with appropriate object-matchers:
```yml
...
@ -135,7 +145,7 @@ Add corresponding routes to the notification policies config (`monitoring-watche
- receiver: SlackNotifier
object_matchers:
# Add matchers below
- ['grafana_folder', '=', 'WatcherAlerts']
- ['grafana_folder', '=~', 'WatcherAlerts|SubgraphAlerts']
```
### Env
@ -149,6 +159,9 @@ Set the following env variables in the deployment env config file (`monitoring-w
# Grafana server host URL to be used
# (Optional, default: http://localhost:3000)
GF_SERVER_ROOT_URL=
# List of subgraph ids to configure alerts for (separated by |)
CERC_GRAFANA_ALERTS_SUBGRAPH_IDS=
```
## Start the stack