
monitoring

  • Instructions to set up and run a Prometheus server and a Grafana dashboard
  • Comes with built-in exporters / dashboards, including the node, Blackbox, Postgres, and chain-head exporters configured in the sections below
  • See monitoring-watchers.md for an example usage of the stack with pre-configured dashboards for watchers

Setup

Clone required repositories:

laconic-so --stack monitoring setup-repositories --git-ssh --pull

Build the container images:

laconic-so --stack monitoring build-containers

Create a deployment

First, create a spec file for the deployment, which will map the stack's ports and volumes to the host:

laconic-so --stack monitoring deploy init --output monitoring-spec.yml
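
The spec file is plain YAML. A trimmed sketch of its general shape is shown below; the exact keys and defaults depend on the stack-orchestrator version, so treat it as illustrative:

stack: monitoring
deploy-to: compose
network:
  ports:
    ...
volumes:
  ...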

Ports

Edit the network section of the spec file to map the container ports to the same ports on the host:

...
network:
  ports:
    prometheus:
      - '9090:9090'
    grafana:
      - '3000:3000'
...

Data volumes

Container data volumes are bind-mounted to specified paths in the host filesystem. The default setup, generated by laconic-so deploy init, places the volumes in the ./data subdirectory of the deployment directory; these default mappings can be customized by editing the generated spec file.
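
For example, the volumes section of the spec file might look like the following (the volume names here are illustrative; keep the names emitted by deploy init):

volumes:
  prometheus_data: ./data/prometheus_data
  grafana_storage: ./data/grafana_storage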


Once you've made any needed changes to the spec file, create a deployment from it:

laconic-so --stack monitoring deploy create --spec-file monitoring-spec.yml --deployment-dir monitoring-deployment
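
The deployment directory gathers the config files edited in the next section; a partial sketch showing only the paths referenced in this document:

monitoring-deployment/
├── config.env                        # env variables (see Env below)
├── config/monitoring/
│   ├── prometheus/prometheus.yml     # Prometheus scrape configs
│   ├── postgres-exporter.yml         # postgres-exporter credentials
│   └── grafana/dashboards/           # dashboard JSON files
└── data/                             # bind-mounted container data volumes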

Configure

Prometheus Config

  • Add desired scrape configs to the Prometheus config file (monitoring-deployment/config/monitoring/prometheus/prometheus.yml) in the deployment folder; for example:

    ...
    - job_name: <JOB_NAME>
      metrics_path: /metrics/path
      scheme: http
      static_configs:
        - targets: ['<METRICS_ENDPOINT_HOST>:<METRICS_ENDPOINT_PORT>']
    
  • Node exporter: update the node job to add any node-exporter targets to be monitored:

    ...
    - job_name: 'node'
      ...
      static_configs:
        # Add node-exporter targets to be monitored below
        - targets: ['example-host:9100']
          labels:
            instance: 'my-host'
    
  • Blackbox (in-stack exporter): update the blackbox job to add any endpoints to be monitored on the Blackbox dashboard:

    ...
    - job_name: 'blackbox'
      ...
      static_configs:
        # Add URLs to be monitored below
        - targets:
          - <HTTP_ENDPOINT_1>
          - <HTTP_ENDPOINT_2>
          - <LACONICD_GQL_ENDPOINT>
    
  • Postgres (in-stack exporter):

    • Update the postgres job to add Postgres db targets to be monitored:

      ...
      - job_name: 'postgres'
        ...
        static_configs:
          # Add DB targets below
          - targets: ['example-server:5432']
            labels:
              instance: 'example-db'
      
    • Add database credentials to be used in auth_modules in the postgres-exporter config file (monitoring-deployment/config/monitoring/postgres-exporter.yml); see the example after the note below

  • laconicd: update the laconicd job with a laconicd node's REST endpoint host and port:

    ...
    - job_name: laconicd
      static_configs:
        - targets: ['example-host:1317']
    ...
    

Note: Use host.docker.internal as the host to access ports on the host machine from within the stack's containers
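
As noted above, database credentials for the Postgres exporter go in auth_modules in monitoring-deployment/config/monitoring/postgres-exporter.yml. A minimal sketch following the upstream postgres_exporter config format; the module name, credentials, and sslmode option are placeholders:

auth_modules:
  example-db:
    type: userpass
    userpass:
      username: <DB_USER>
      password: <DB_PASSWORD>
    options:
      sslmode: disable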

Grafana Config

Place the dashboard JSON files in the Grafana dashboards config directory (monitoring-deployment/config/monitoring/grafana/dashboards) in the deployment folder
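
For example (the dashboard filename is hypothetical):

cp my-dashboard.json monitoring-deployment/config/monitoring/grafana/dashboards/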

Env

Set the following env variables in the deployment env config file (monitoring-deployment/config.env):

# For chain-head exporter

# External ETH RPC endpoint (ethereum)
# (Optional, default: https://mainnet.infura.io/v3)
CERC_ETH_RPC_ENDPOINT=

# Infura key to be used
# (Optional, used with CERC_ETH_RPC_ENDPOINT if provided)
CERC_INFURA_KEY=

# External ETH RPC endpoint (filecoin)
# (Optional, default: https://api.node.glif.io/rpc/v1)
CERC_FIL_RPC_ENDPOINT=

# Grafana server host URL (used in various links in alerts, etc.)
# (Optional, default: http://localhost:3000)
GF_SERVER_ROOT_URL=
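
For reference, a filled-in config.env might look like this; the endpoint values below are just the documented defaults, and the Infura key is a placeholder:

CERC_ETH_RPC_ENDPOINT=https://mainnet.infura.io/v3
CERC_INFURA_KEY=<your-infura-key>
CERC_FIL_RPC_ENDPOINT=https://api.node.glif.io/rpc/v1
GF_SERVER_ROOT_URL=http://localhost:3000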

Start the stack

Start the deployment:

laconic-so deployment --dir monitoring-deployment start

  • List and check the health status of all the containers using docker ps and wait for them to be healthy (see the example below this list)

  • Grafana should now be visible at http://localhost:3000 with configured dashboards
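
For example, to watch container status and confirm that Prometheus sees its scrape targets (standard docker and Prometheus HTTP API calls, shown as a convenience):

# Show container names and their health status
docker ps --format 'table {{.Names}}\t{{.Status}}'

# List Prometheus scrape targets and their states
curl http://localhost:9090/api/v1/targets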

Clean up

To stop the monitoring services running in the background while preserving data:

# Only stop the docker containers
laconic-so deployment --dir monitoring-deployment stop

# Run 'start' to restart the deployment

To stop monitoring services and also delete data:

# Stop the docker containers
laconic-so deployment --dir monitoring-deployment stop --delete-volumes

# Remove deployment directory (deployment will have to be recreated for a re-run)
rm -rf monitoring-deployment