Add instructions for setting up Grafana+Prometheus

This PR also adds a location for our Grafana dashboards,
which we should maintain in the repo.
Fridrik Asmundsson 2023-09-19 16:00:12 +00:00
parent 8aaa8de975
commit 26e9703548
4 changed files with 378 additions and 0 deletions

@ -7,6 +7,7 @@
## Improvements
- fix: Add time slicing to splitstore purging step during compaction to reduce lock congestion [filecoin-project/lotus#11269](https://github.com/filecoin-project/lotus/pull/11269)
- feat: Added instructions on how to set up Prometheus/Grafana for monitoring a local Lotus node [filecoin-project/lotus#11276](https://github.com/filecoin-project/lotus/pull/11276)
# v1.23.3 / 2023-08-01

metrics/README.md Normal file

@ -0,0 +1,128 @@
# Setting Up Prometheus and Grafana
Lotus supports exporting a wide range of metrics, enabling users to gain insights into its behavior and effectively analyze performance issues. These metrics can be conveniently utilized with aggregation and visualization tools for in-depth analysis. In this document, we show how you can set up Prometheus and Grafana for monitoring and visualizing these metrics:
- **Prometheus**: Prometheus is an open-source monitoring and alerting toolkit designed for collecting and storing time-series data from various systems and applications. It provides a robust querying language (PromQL) and a web-based interface for analyzing and visualizing metrics.
- **Grafana**: Grafana is an open-source platform for creating, sharing, and visualizing interactive dashboards and graphs. It integrates with various data sources, including Prometheus, to help users create meaningful visual representations of their data and set up alerting based on specific conditions.
## Prerequisites
- You have a Linux- or Mac-based system.
- You have root access to install software.
- You have a Lotus node already running.
## Install and start Prometheus
### On Ubuntu:
```
# install prometheus
sudo apt-get install prometheus
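# copy the prometheus.yml config to the correct directory (root-owned, hence sudo)
sudo cp lotus/metrics/prometheus.yml /etc/prometheus/prometheus.yml
# restart prometheus so it picks up the new config (this also starts it if it is not running)
sudo systemctl restart prometheus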
# enable prometheus on boot (optional)
sudo systemctl enable prometheus
```
### On Mac:
```
# install prometheus
brew install prometheus
# start prometheus
prometheus --config.file=lotus/metrics/prometheus.yml
```
## Install and start Grafana
### On Ubuntu:
```
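# install grafana (assumes the Grafana APT repository has already been added;
# the package is not in the default Ubuntu repositories)
sudo apt-get install grafana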
# start grafana
sudo systemctl start grafana-server
# start grafana on boot (optional)
sudo systemctl enable grafana-server
```
### On Mac:
```
brew install grafana
brew services start grafana
```
You should now have Prometheus and Grafana running on your machine, with Prometheus already collecting metrics from your running Lotus node and saving them to its database.
You can confirm everything is set up correctly by visiting:
- Prometheus (http://localhost:9090): You can open the metric explorer and view any of the aggregated metrics scraped from Lotus
- Grafana (http://localhost:3000): Default username/password is admin/admin, remember to change it after login.
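The same checks can be scripted from a terminal. The following is a minimal sketch, assuming `curl` is installed and the default ports used in this guide; `check_endpoint` is a helper name made up for this example. `/-/healthy` and `/api/health` are the health endpoints of Prometheus and Grafana respectively, and the Lotus URL is the scrape target from `metrics/prometheus.yml`.

```shell
# check_endpoint NAME URL: report whether URL answers with a successful HTTP status.
# (check_endpoint is a helper defined here for illustration, not part of Lotus.)
check_endpoint() {
  if curl --silent --fail --max-time 5 --output /dev/null "$2"; then
    echo "OK: $1 ($2)"
  else
    echo "FAIL: $1 ($2)"
    return 1
  fi
}

# Example calls, assuming the default ports from this guide:
#   check_endpoint Prometheus http://localhost:9090/-/healthy
#   check_endpoint Grafana    http://localhost:3000/api/health
#   check_endpoint Lotus      http://localhost:1234/debug/metrics
```

A `FAIL` line for Lotus usually means the node is not running or is listening on a different API port than the one in the Prometheus config.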
## Add Prometheus as datasource in Grafana
1. Log in to Grafana using the web interface.
2. Navigate to "Home" > "Connections" > "Data Sources."
3. Click "Add data source."
4. Choose "Prometheus."
5. In the "HTTP" section, set the URL to http://localhost:9090.
6. Click "Save & Test" to verify the connection.
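If you prefer not to click through the UI, the data source can likely also be created through Grafana's HTTP API (`POST /api/datasources`). This is a sketch rather than a tested procedure: it assumes the default `admin/admin` credentials are still in place, and `prometheus_datasource_json` is a hypothetical helper that only prints the request body.

```shell
# prometheus_datasource_json URL: print the JSON body describing a Prometheus
# data source that points at URL. (Helper name is made up for this example.)
prometheus_datasource_json() {
  printf '{"name":"Prometheus","type":"prometheus","access":"proxy","url":"%s","isDefault":true}\n' "$1"
}

# Example call, assuming Grafana on localhost:3000 with admin/admin:
#   prometheus_datasource_json http://localhost:9090 |
#     curl --silent --user admin:admin \
#          --header 'Content-Type: application/json' \
#          --data @- http://localhost:3000/api/datasources
```

Keeping the payload in a function makes it easy to inspect before anything is sent to Grafana.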
## Import one of the existing dashboards in lotus/metrics/grafana
1. Log in to Grafana using the web interface.
2. Navigate to "Home" > "Dashboards" > Click the drop down menu in the "New" button and select "Import"
3. Paste the JSON of one of the dashboards in lotus/metrics/grafana into the "Import via panel json" panel.
4. Click "Load"
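If an imported panel stays empty, it can help to run the panel's query against Prometheus directly. The sketch below rebuilds the p99 expression used by the "Lotus Message Execution" dashboard; `applyblocks_p99_expr` is a hypothetical helper, and the fixed `1m` default window is an assumption standing in for Grafana's `$__rate_interval` variable, which Prometheus itself does not understand.

```shell
# applyblocks_p99_expr STEP [WINDOW]: print the p99 PromQL expression for one
# ApplyBlocks histogram. STEP is one of: total_ms, cron, early, flush, messages.
# WINDOW defaults to 1m (an assumption; Grafana normally fills this in).
applyblocks_p99_expr() {
  printf 'histogram_quantile(0.99, sum by(le) (rate(lotus_vm_applyblocks_%s_bucket[%s])))\n' "$1" "${2:-1m}"
}

# Example call against a local Prometheus, using its instant-query API:
#   curl --silent --get http://localhost:9090/api/v1/query \
#        --data-urlencode "query=$(applyblocks_p99_expr cron)"
```

An empty `result` array in the response suggests Prometheus has not scraped that metric yet, rather than a problem with the dashboard itself.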
# Collect system metrics using node_exporter
Although Lotus includes many useful metrics, it does not include system metrics such as CPU, memory, disk, and network usage. If you are investigating an issue and have Lotus metrics available, it's often very useful to correlate certain events or behaviour with general system metrics.
## Install node_exporter
If you have followed this guide so far and have Prometheus and Grafana already running, you can run the following commands to also aggregate the system metrics:
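Ubuntu:
```
# install node_exporter (packaged as prometheus-node-exporter on Ubuntu)
sudo apt-get install prometheus-node-exporter
# start node_exporter
sudo systemctl start prometheus-node-exporter
```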
Mac:
```
# install node_exporter
brew install node_exporter
# run node_exporter
node_exporter
```
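To confirm node_exporter is serving data before wiring it into Prometheus, you can look at its metrics endpoint, which listens on port 9100 by default. A small sketch, where `node_metric_names` is a helper name invented for this example:

```shell
# node_metric_names: read Prometheus exposition text on stdin and print the
# distinct metric names that start with "node_", one per line.
node_metric_names() {
  grep -o '^node_[A-Za-z0-9_]*' | sort -u
}

# Example call, assuming node_exporter is running locally:
#   curl --silent http://localhost:9100/metrics | node_metric_names | head
```

Comment lines (`# HELP`, `# TYPE`) and metrics from other namespaces are skipped, so the output is a quick inventory of the system metrics that are available.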
## Update prometheus config to include node_exporter
Add the following under `scrape_configs` in the Prometheus config and then restart Prometheus:
```
  - job_name: node_exporter
    static_configs:
      - targets: ['localhost:9100']
```
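For reference, the combined config with both scrape jobs might look like the output of the sketch below. `write_prometheus_config` is a hypothetical helper, and the scratch path in the example is an assumption; `promtool` ships with Prometheus and can validate the file before you copy it into place.

```shell
# write_prometheus_config FILE: write a config containing both the Lotus and the
# node_exporter scrape jobs to FILE. (Helper name is made up for this example.)
write_prometheus_config() {
  cat > "$1" <<'EOF'
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'lotus'
    metrics_path: '/debug/metrics'
    static_configs:
      - targets: ['localhost:1234']
  - job_name: node_exporter
    static_configs:
      - targets: ['localhost:9100']
EOF
}

# Example: write to a scratch file and validate it before installing it.
#   write_prometheus_config /tmp/prometheus.yml
#   promtool check config /tmp/prometheus.yml
```

Indentation matters here: each job must be a list item under `scrape_configs`, otherwise Prometheus will refuse to start with the new file.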
## Import system dashboard
1. Download the most recent dashboard from https://grafana.com/grafana/dashboards/1860-node-exporter-full/
2. Log in to Grafana (http://localhost:3000) using the web interface.
3. Navigate to "Home" > "Dashboards" > Click the drop down menu in the "New" button and select "Import"
4. Paste the JSON of the dashboard you downloaded in step 1 into the "Import via panel json" panel.
5. Click "Load"
6. Select the Prometheus datasource you created earlier
7. Click "Import"


@ -0,0 +1,241 @@
{
"__inputs": [
{
"name": "DS_PROMETHEUS",
"label": "Prometheus",
"description": "",
"type": "datasource",
"pluginId": "prometheus",
"pluginName": "Prometheus"
}
],
"__elements": {},
"__requires": [
{
"type": "grafana",
"id": "grafana",
"name": "Grafana",
"version": "10.1.1"
},
{
"type": "datasource",
"id": "prometheus",
"name": "Prometheus",
"version": "1.0.0"
},
{
"type": "panel",
"id": "timeseries",
"name": "Time series",
"version": ""
}
],
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 0,
"id": null,
"links": [],
"liveNow": false,
"panels": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"description": "Understand where time is spent in ApplyBlocks, which is executed as part of ExecuteTipSet. Its metrics include:\n\n- applyblocks_total_ms (total): The total time spent in ApplyBlocks\n- applyblocks_cron (cron): Time spent in cron\n- applyblocks_early (early): Time spent in early apply-blocks (null cron, upgrades)\n- applyblocks_flush (flush): Time spent flushing vm state\n- applyblocks_messages (apply messages): Time spent applying block messages\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "Time in MS",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "smooth",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": 60000,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 12,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"disableTextWrap": false,
"editorMode": "builder",
"expr": "histogram_quantile(0.99, sum by(le) (rate(lotus_vm_applyblocks_total_ms_bucket[$__rate_interval])))",
"fullMetaSearch": false,
"includeNullMetadata": false,
"instant": false,
"legendFormat": "Total",
"range": true,
"refId": "A",
"useBackend": false
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"disableTextWrap": false,
"editorMode": "builder",
"expr": "histogram_quantile(0.99, sum by(le) (rate(lotus_vm_applyblocks_cron_bucket[$__rate_interval])))",
"fullMetaSearch": false,
"hide": false,
"includeNullMetadata": false,
"instant": false,
"legendFormat": "Cron",
"range": true,
"refId": "B",
"useBackend": false
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"disableTextWrap": false,
"editorMode": "builder",
"expr": "histogram_quantile(0.99, sum by(le) (rate(lotus_vm_applyblocks_early_bucket[$__rate_interval])))",
"fullMetaSearch": false,
"hide": false,
"includeNullMetadata": false,
"instant": false,
"legendFormat": "Early",
"range": true,
"refId": "C",
"useBackend": false
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"disableTextWrap": false,
"editorMode": "builder",
"expr": "histogram_quantile(0.99, sum by(le) (rate(lotus_vm_applyblocks_flush_bucket[$__rate_interval])))",
"fullMetaSearch": false,
"hide": false,
"includeNullMetadata": false,
"instant": false,
"legendFormat": "Flush",
"range": true,
"refId": "D",
"useBackend": false
},
{
"datasource": {
"type": "prometheus",
"uid": "${DS_PROMETHEUS}"
},
"disableTextWrap": false,
"editorMode": "builder",
"expr": "histogram_quantile(0.99, sum by(le) (rate(lotus_vm_applyblocks_messages_bucket[$__rate_interval])))",
"fullMetaSearch": false,
"hide": false,
"includeNullMetadata": false,
"instant": false,
"legendFormat": "Apply messages",
"range": true,
"refId": "E",
"useBackend": false
}
],
"title": "ApplyBlocks (ms)",
"type": "timeseries"
}
],
"refresh": "",
"schemaVersion": 38,
"style": "dark",
"tags": [],
"templating": {
"list": []
},
"time": {
"from": "now-5m",
"to": "now"
},
"timepicker": {},
"timezone": "",
"title": "Lotus Message Execution",
"uid": "a7bacd0e-f7a1-418f-98e5-3469c5e0b6ea",
"version": 5,
"weekStart": ""
}

metrics/prometheus.yml Normal file

@ -0,0 +1,8 @@
global:
  scrape_interval: 10s
scrape_configs:
  - job_name: 'lotus'
    metrics_path: '/debug/metrics'
    static_configs:
      - targets: ['localhost:1234']