This change switches to use the smaller influxdata/influxdb1-client package instead of depending on the whole infuxdb package. The new smaller client is very similar to the influxdb-v2 client, which made it possible to refactor the two reporters to reuse code a lot more.
This PR adds counter metrics for the CPU system and the Geth process.
Currently the only metrics available for these items are gauges. Gauges are
fine when the consumer scrapes metrics data at the same interval as Geth
produces new values (every 3 seconds), but it is likely that most consumers
will not scrape that often. Intervals of 10, 15, or maybe even 30 seconds
are probably more common.
So the problem is, how does the consumer estimate what the CPU was doing in
between scrapes. With a counter, it's easy ... you just subtract two
successive values and divide by the time to get a nice, accurate average.
But with a gauge, you can't do that. A gauge reading is an instantaneous
picture of what was happening at that moment, but it gives you no idea
about what was going on between scrapes. Taking an average of values is
meaningless.
This changes how we read performance metrics from the Go runtime. Instead
of using runtime.ReadMemStats, we now rely on the API provided by package
runtime/metrics.
runtime/metrics provides more accurate information. For example, the new
interface has better reporting of memory use. In my testing, the reported
value of held memory more accurately reflects the usage reported by the OS.
The semantics of metrics system/memory/allocs and system/memory/frees have
changed to report amounts in bytes. ReadMemStats only reported the count of
allocations in number-of-objects. This is imprecise: 'tiny objects' are not
counted because the runtime allocates them in batches; and certain
improvements in allocation behavior, such as struct size optimizations,
will be less visible when the number of allocs doesn't change.
Changing allocation reports to be in bytes makes it appear in graphs that
lots more is being allocated. I don't think that's a problem because this
metric is primarily interesting for geth developers.
The metric system/memory/pauses has been changed to report statistical
values from the histogram provided by the runtime. Its name in influxdb has
changed from geth.system/memory/pauses.meter to
geth.system/memory/pauses.histogram.
We also have a new histogram metric, system/cpu/schedlatency, reporting the
Go scheduler latency.
This enables the following linters
- typecheck
- unused
- staticcheck
- bidichk
- durationcheck
- exportloopref
- gosec
WIth a few exceptions.
- We use a deprecated protobuf in trezor. I didn't want to mess with that, since I cannot meaningfully test any changes there.
- The deprecated TypeMux is used in a few places still, so the warning for it is silenced for now.
- Using string type in context.WithValue is apparently wrong, one should use a custom type, to prevent collisions between different places in the hierarchy of callers. That should be fixed at some point, but may require some attention.
- The warnings for using weak random generator are squashed, since we use a lot of random without need for cryptographic guarantees.