lighthouse/beacon_node/http_metrics/src/metrics.rs
Paul Hauner 456b313665 Tune GNU malloc (#2299)
## Issue Addressed

NA

## Proposed Changes

Modify the configuration of [GNU malloc](https://www.gnu.org/software/libc/manual/html_node/The-GNU-Allocator.html) to reduce memory footprint.

- Set `M_ARENA_MAX` to 4.
    - This reduces memory fragmentation at the cost of contention between threads.
- Set `M_MMAP_THRESHOLD` to 2 MB
    - This means that any allocation >= 2 MB is allocated via an anonymous mmap instead of on the heap/arena. This reduces memory fragmentation since we don't need to keep growing the heap to find big contiguous slabs of free memory (see the sketch after this list).
- ~~Run `malloc_trim` every 60 seconds.~~
    - ~~This shaves unused memory from the top of the heap, preventing the heap from constantly growing.~~
    - Removed, see: https://github.com/sigp/lighthouse/pull/2299#issuecomment-825322646

*Note: this only provides memory savings on the Linux (glibc) platform.*
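
As a rough illustration (not the code from this PR), the settings above could be applied on Linux/glibc via `libc::mallopt`; the function name and the exact threshold value below are assumptions for the sketch, and the real implementation lives in Lighthouse's `malloc_utils` crate:

```rust
// Illustrative sketch only: apply the malloc tuning described above on Linux/glibc.
// The actual Lighthouse implementation may differ in names, values and error handling.
#[cfg(all(target_os = "linux", target_env = "gnu"))]
pub fn configure_glibc_malloc() -> Result<(), String> {
    use libc::{mallopt, M_ARENA_MAX, M_MMAP_THRESHOLD};

    // Cap the number of malloc arenas; fewer arenas means less fragmentation
    // at the cost of more cross-thread contention. `mallopt` returns 1 on success.
    if unsafe { mallopt(M_ARENA_MAX, 4) } != 1 {
        return Err("mallopt(M_ARENA_MAX) failed".into());
    }

    // Serve allocations at or above ~2 MB via anonymous mmap rather than the
    // heap/arena, so the heap doesn't grow just to find large contiguous slabs.
    if unsafe { mallopt(M_MMAP_THRESHOLD, 2 * 1_024 * 1_024) } != 1 {
        return Err("mallopt(M_MMAP_THRESHOLD) failed".into());
    }

    Ok(())
}
```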
    
## Additional Info

I'm going to close #2288 in favor of this for the following reasons:

- I've managed to get the memory footprint *smaller* here than with jemalloc.
- This PR seems to be less of a dramatic change than bringing in the jemalloc dep.
- The changes in this PR are strictly runtime changes, so we can create CLI flags which disable them completely. Since this change is wide-reaching and complex, it's nice to have an easy "escape hatch" if there are undesired consequences.

## TODO

- [x] Allow configuration via CLI flags
- [x] Test on Mac
- [x] Test on RasPi.
- [x] Determine if GNU malloc is present?
    - I'm not quite sure how to detect glibc. This issue suggests we can't really: https://github.com/rust-lang/rust/issues/33244
- [x] Make a clear argument regarding the effect of this on CPU utilization.
- [x] Test with higher `M_ARENA_MAX` values.
- [x] Test with longer trim intervals
- [x] Add some stats about memory savings
- [x] Remove `malloc_trim` calls & code
2021-05-28 05:59:45 +00:00

use crate::Context;
use beacon_chain::BeaconChainTypes;
use lighthouse_metrics::{Encoder, TextEncoder};
use malloc_utils::scrape_allocator_metrics;

pub use lighthouse_metrics::*;

pub fn gather_prometheus_metrics<T: BeaconChainTypes>(
    ctx: &Context<T>,
) -> std::result::Result<String, String> {
    let mut buffer = vec![];
    let encoder = TextEncoder::new();

    // There are two categories of metrics:
    //
    // - Dynamically updated: things like histograms and event counters that are updated on the
    //   fly.
    // - Statically updated: things which are only updated at the time of the scrape (used where
    //   we can avoid cluttering up code with metrics calls).
    //
    // The `lighthouse_metrics` crate has a `DEFAULT_REGISTRY` global singleton (via `lazy_static`)
    // which keeps the state of all the metrics. Dynamically updated things will already be
    // up-to-date in the registry (because they update themselves) however statically updated
    // things need to be "scraped".
    //
    // We proceed by, first, updating all the static metrics using `scrape_for_metrics(..)`. Then,
    // using `lighthouse_metrics::gather(..)` to collect the global `DEFAULT_REGISTRY` metrics into
    // a string that can be returned via HTTP.

    if let Some(beacon_chain) = ctx.chain.as_ref() {
        slot_clock::scrape_for_metrics::<T::EthSpec, T::SlotClock>(&beacon_chain.slot_clock);
        beacon_chain::scrape_for_metrics(beacon_chain);
    }

    if let (Some(db_path), Some(freezer_db_path)) =
        (ctx.db_path.as_ref(), ctx.freezer_db_path.as_ref())
    {
        store::scrape_for_metrics(db_path, freezer_db_path);
    }

    eth2_libp2p::scrape_discovery_metrics();
    warp_utils::metrics::scrape_health_metrics();

    // It's important to ensure these metrics are explicitly enabled in the case that users aren't
    // using glibc and this function causes panics.
    if ctx.config.allocator_metrics_enabled {
        scrape_allocator_metrics();
    }

    encoder
        .encode(&lighthouse_metrics::gather(), &mut buffer)
        .unwrap();

    String::from_utf8(buffer).map_err(|e| format!("Failed to encode prometheus info: {:?}", e))
}
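
To make the "dynamically updated" vs "statically updated" distinction in the comment above concrete, here is a hedged sketch (not part of this file) in the helper style that `lighthouse_metrics` re-exports; the metric names are invented for illustration and the `lazy_static` crate is assumed as a dependency:

```rust
use lighthouse_metrics::*;

lazy_static::lazy_static! {
    // Dynamically updated: incremented at the call site, so it is already
    // current whenever the registry is gathered.
    pub static ref EXAMPLE_REQUESTS: Result<IntCounter> =
        try_create_int_counter("example_requests_total", "Count of example requests");

    // Statically updated: only refreshed by a `scrape_for_metrics`-style
    // function immediately before gathering.
    pub static ref EXAMPLE_QUEUE_LENGTH: Result<IntGauge> =
        try_create_int_gauge("example_queue_length", "Length of an example queue");
}

pub fn handle_example_request() {
    // Updated on the fly, wherever the event happens.
    inc_counter(&EXAMPLE_REQUESTS);
}

pub fn scrape_example_metrics(queue_len: usize) {
    // Updated only at scrape time, just before `lighthouse_metrics::gather(..)`.
    set_gauge(&EXAMPLE_QUEUE_LENGTH, queue_len as i64);
}
```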