Commit Graph

89 Commits

Author SHA1 Message Date
Péter Szilágyi
65a17c00c7
metrics: add support for enabling metrics from env vars (#28118) 2023-09-14 13:56:06 +03:00
Martin Holst Swende
8b6cf128af
metrics: refactor metrics (#28035)
This change includes a lot of things, listed below. 

### Split up interfaces, write vs read

The interfaces have been split up into one write-interface and one read-interface, with `Snapshot` being the gateway from write to read. This simplifies the semantics _a lot_. 

Example of splitting up an interface into one readonly 'snapshot' part, and one updatable writeonly part: 

```golang
type MeterSnapshot interface {
	Count() int64
	Rate1() float64
	Rate5() float64
	Rate15() float64
	RateMean() float64
}

// Meters count events to produce exponentially-weighted moving average rates
// at one-, five-, and fifteen-minutes and a mean rate.
type Meter interface {
	Mark(int64)
	Snapshot() MeterSnapshot
	Stop()
}
```
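
As a rough illustration of the same split on a simpler type, here is a minimal sketch of a counter with a write side and a read-only snapshot. The names `Counter`, `CounterSnapshot` and `standardCounter` are illustrative and are not claimed to match the package's actual definitions.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// CounterSnapshot is the read-only side: a frozen value.
type CounterSnapshot interface {
	Count() int64
}

// Counter is the write side; Snapshot is the only gateway to reading.
type Counter interface {
	Inc(int64)
	Snapshot() CounterSnapshot
}

// counterSnapshot is an immutable copy of the counter value.
type counterSnapshot int64

func (c counterSnapshot) Count() int64 { return int64(c) }

// standardCounter is a hypothetical implementation backed by an atomic.
type standardCounter struct {
	value atomic.Int64
}

func (c *standardCounter) Inc(n int64) { c.value.Add(n) }

func (c *standardCounter) Snapshot() CounterSnapshot {
	return counterSnapshot(c.value.Load())
}

func main() {
	var c Counter = new(standardCounter)
	c.Inc(3)
	snap := c.Snapshot()
	c.Inc(4) // does not affect the snapshot taken above
	fmt.Println(snap.Count(), c.Snapshot().Count()) // prints "3 7"
}
```

Because the snapshot is a plain copied value, later updates to the counter cannot leak into it.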

### A note about concurrency

This PR makes the concurrency model clearer. We have actual meters and snapshots of meters. The `meter` is the thing which can be accessed from the registry, and updates can be made to it. 

- For all `meters` (`Gauge`, `Timer`, etc.), it is assumed that they are accessed by different threads making updates. Therefore, all `meter` update methods (`Inc`, `Add`, `Update`, `Clear`, etc.) need to be concurrency-safe. 
- All `meters` have a `Snapshot()` method. This method is _usually_ called from one thread, a backend-exporter. But it's fully possible to have several exporters simultaneously: therefore this method should also be concurrency-safe. 

TLDR: `meter`s are accessible via registry, all their methods must be concurrency-safe. 

For all `Snapshot`s, it is assumed that an individual exporter-thread has obtained a `meter` from the registry, and called the `Snapshot` method to obtain a readonly snapshot. This snapshot is _not_ guaranteed to be concurrency-safe. There's no need for a snapshot to be concurrency-safe, since exporters should not share snapshots. 

Note, though, that by happenstance a lot of the snapshots _are_ concurrency-safe, being immutable minimal representations of a value. Only the more complex ones are _not_ threadsafe: those that lazily calculate things like `Variance()` and `Mean()`.

Example of how a background exporter typically works, obtaining the snapshot and sequentially accessing the non-threadsafe methods in it: 
```golang
ms := metric.Snapshot()
// ...
fields := map[string]interface{}{
	"count":    ms.Count(),
	"max":      ms.Max(),
	"mean":     ms.Mean(),
	"min":      ms.Min(),
	"stddev":   ms.StdDev(),
	"variance": ms.Variance(),
	// ...
}
```

TLDR: `snapshots` are not guaranteed to be concurrency-safe (but often are).

### Sample changes

I also changed the `Sample` type: previously, it iterated the samples fully every time `Mean()`,`Sum()`, `Min()` or `Max()` was invoked. Since we now have readonly base data, we can just iterate it once, in the constructor, and set all four values at once. 

The same thing has been done for runtimehistogram. 
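
A minimal sketch of the idea, assuming a hypothetical `sampleSnapshot` type (not the package's real one): since the data is read-only after construction, sum, min, max and mean can all be computed in a single pass in the constructor.

```go
package main

import "fmt"

// sampleSnapshot is a hypothetical read-only sample. All aggregate
// values are computed once, at construction time.
type sampleSnapshot struct {
	values        []int64
	sum, min, max int64
	mean          float64
}

// newSampleSnapshot iterates the values exactly once and sets all
// four aggregates in the same loop.
func newSampleSnapshot(values []int64) *sampleSnapshot {
	s := &sampleSnapshot{values: values}
	if len(values) == 0 {
		return s
	}
	s.min, s.max = values[0], values[0]
	for _, v := range values {
		s.sum += v
		if v < s.min {
			s.min = v
		}
		if v > s.max {
			s.max = v
		}
	}
	s.mean = float64(s.sum) / float64(len(values))
	return s
}

// The getters are constant-time lookups of the precomputed values.
func (s *sampleSnapshot) Sum() int64    { return s.sum }
func (s *sampleSnapshot) Min() int64    { return s.min }
func (s *sampleSnapshot) Max() int64    { return s.max }
func (s *sampleSnapshot) Mean() float64 { return s.mean }

func main() {
	s := newSampleSnapshot([]int64{4, 1, 7})
	fmt.Println(s.Sum(), s.Min(), s.Max(), s.Mean()) // prints "12 1 7 4"
}
```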

### ResettingTimer API

Back when ResettingTimer was implemented, as part of https://github.com/ethereum/go-ethereum/pull/15910, Anton implemented a `Percentiles` on the new type. However, the method did not conform to the other existing types which also had a `Percentiles`. 

1. The existing ones, on input, took `0.5` to mean `50%`. Anton used `50` to mean `50%`. 
2. The existing ones returned `float64` outputs, thus interpolating between values. A value-set of `0, 10`, at `50%` would return `5`, whereas Anton's would return either `0` or `10`. 

This PR removes the 'new' version, and uses only the 'legacy' percentiles, also for the ResettingTimer type. 
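
To make the 'legacy' semantics concrete, here is a minimal sketch of an interpolating percentile that takes `0.5` to mean `50%`. The function name `percentile` and the rank formula `p * (n+1)` are assumptions made for illustration; the package's own implementation may differ in detail.

```go
package main

import (
	"fmt"
	"sort"
)

// percentile returns the p-th percentile (p in [0, 1]) of values,
// linearly interpolating between the two neighbouring ranks.
func percentile(values []int64, p float64) float64 {
	n := len(values)
	if n == 0 {
		return 0
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]int64(nil), values...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	pos := p * float64(n+1) // 1-based fractional rank (assumed formula)
	switch {
	case pos < 1:
		return float64(sorted[0])
	case pos >= float64(n):
		return float64(sorted[n-1])
	}
	lower := float64(sorted[int(pos)-1])
	upper := float64(sorted[int(pos)])
	return lower + (pos-float64(int(pos)))*(upper-lower)
}

func main() {
	fmt.Println(percentile([]int64{0, 10}, 0.5)) // prints "5"
}
```

With the value-set `0, 10`, the result at `0.5` falls halfway between the two values rather than snapping to either of them.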

The resetting timer snapshot was also defined so that it would expose the internal values. This has been removed, and getters for `Max, Min, Mean` have been added instead. 

### Unexport types

A lot of types were exported, but do not need to be. This PR unexports quite a lot of them.
2023-09-13 13:13:47 -04:00
Jorge
53f3c2ae65
metrics, cmd/geth: informational metrics (prometheus, influxdb, opentsb) (#24877)
This change creates a GaugeInfo metrics type for registering informational (textual) metrics, e.g. geth version number. It also improves the testing for backend-exporters, and uses a shared subpackage in 'internal' to provide sample datasets and ordered registry. 

Implements #21783

---------

Co-authored-by: Martin Holst Swende <martin@swende.se>
2023-08-31 13:37:17 -04:00
Péter Szilágyi
be65b47645
all: update golang/x/exp and fix slice sorting fallout (#27909)
The Go authors updated golang/x/exp to change the function signature of the slices sort method. 
It's an entire shitshow now because x/exp is not tagged, so everyone's codebase just 
picked a new version that some other dep depends on, causing our code to fail building.

This PR updates the dep on our code too and does all the refactorings to follow upstream...
2023-08-12 00:04:12 +02:00
Ömer Faruk Irmak
34d5072159
metrics: NilResettingTimer.Time should execute the timed function (#27724) 2023-07-14 19:19:03 +02:00
Ömer Faruk Irmak
13c0305106
metrics: NilTimer should still run the function to be timed (#27723) 2023-07-14 18:10:16 +02:00
Dan Laine
4367ab499f
metrics: use slices package for sorting (#27493)
Co-authored-by: Felix Lange <fjl@twurst.com>
2023-06-19 08:53:15 +02:00
Exca-DK
a340721aa9
metrics: use sync.map in registry (#27159) 2023-05-11 05:39:13 -04:00
s7v7nislands
ae93e0b484
metrics: use atomic type (#27121) 2023-04-20 03:36:54 -04:00
Exca-DK
b4dcd1a391
metrics: make gauge_float64 and counter_float64 lock free (#27025)
Makes the float-gauges lock-free

name                      old time/op  new time/op  delta
CounterFloat64Parallel-8  1.45µs ±10%  0.85µs ± 6%  -41.65%  (p=0.008 n=5+5)
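
One common way to make a float64 counter lock-free is to store the IEEE-754 bits in an atomic integer and update them with a compare-and-swap loop. The sketch below shows that general technique; the type name `counterFloat64` is hypothetical and this is not necessarily how the PR implements it.

```go
package main

import (
	"fmt"
	"math"
	"sync"
	"sync/atomic"
)

// counterFloat64 keeps a float64 as raw bits inside an atomic.Uint64.
type counterFloat64 struct {
	bits atomic.Uint64
}

// Inc adds delta without taking a lock: read the current bits, compute
// the new value, and retry if another goroutine updated in between.
func (c *counterFloat64) Inc(delta float64) {
	for {
		old := c.bits.Load()
		next := math.Float64bits(math.Float64frombits(old) + delta)
		if c.bits.CompareAndSwap(old, next) {
			return
		}
	}
}

// Count returns the current value.
func (c *counterFloat64) Count() float64 {
	return math.Float64frombits(c.bits.Load())
}

func main() {
	var c counterFloat64
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.Inc(0.5)
			}
		}()
	}
	wg.Wait()
	fmt.Println(c.Count()) // prints "4000"
}
```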

---------

Co-authored-by: Exca-DK <dev@DESKTOP-RI45P4J.localdomain>
Co-authored-by: Martin Holst Swende <martin@swende.se>
2023-04-04 09:53:44 -04:00
Delweng
117530b0e6
metrics/librato: ensure resp.body closed (#26969)
This change ensures that we call Close on a http response body, in various places in the source code (mostly tests)
2023-03-27 07:44:41 -04:00
Martin Holst Swende
f6c3a534a4
metrics/influxdb: use smaller dependency and reuse code between v1 and v2 reporters (#26963)
This change switches to use the smaller influxdata/influxdb1-client package instead of depending on the whole influxdb package. The new smaller client is very similar to the influxdb-v2 client, which made it possible to refactor the two reporters to reuse code a lot more.
2023-03-23 15:12:32 -04:00
turboboost55
7dc100714d
metrics: add cpu counters (#26796)
This PR adds counter metrics for the CPU system and the Geth process.
Currently the only metrics available for these items are gauges. Gauges are
fine when the consumer scrapes metrics data at the same interval as Geth
produces new values (every 3 seconds), but it is likely that most consumers
will not scrape that often. Intervals of 10, 15, or maybe even 30 seconds
are probably more common.

So the problem is: how does the consumer estimate what the CPU was doing in
between scrapes? With a counter, it's easy ... you just subtract two
successive values and divide by the time to get a nice, accurate average.
But with a gauge, you can't do that. A gauge reading is an instantaneous
picture of what was happening at that moment, but it gives you no idea
about what was going on between scrapes. Taking an average of values is
meaningless.
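
The counter arithmetic described above can be sketched as follows. The `reading` type and its field names are hypothetical; the only assumption is that the counter reports cumulative CPU seconds.

```go
package main

import "fmt"

// reading is one scrape of a hypothetical cumulative CPU counter.
type reading struct {
	unixSeconds float64 // when the scrape happened
	cpuSeconds  float64 // cumulative CPU time consumed so far
}

// averageUsage returns the mean CPU usage (in cores) between two
// scrapes: the counter delta divided by the elapsed time.
func averageUsage(prev, cur reading) float64 {
	elapsed := cur.unixSeconds - prev.unixSeconds
	if elapsed <= 0 {
		return 0
	}
	return (cur.cpuSeconds - prev.cpuSeconds) / elapsed
}

func main() {
	prev := reading{unixSeconds: 0, cpuSeconds: 100}
	cur := reading{unixSeconds: 15, cpuSeconds: 107.5}
	fmt.Println(averageUsage(prev, cur)) // prints "0.5"
}
```

The result is accurate for any scrape interval, because the counter carries everything that happened between the two readings.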
2023-03-23 14:13:50 +01:00
turboboost55
544e4a700b
metrics: improve accuracy of CPU gauges (#26793)
This PR changes metrics collection to actually measure the time interval between collections, rather
than assume 3 seconds. I did some ad hoc profiling, and on slower hardware (eg, my Raspberry Pi 4)
I routinely saw intervals between 3.3 - 3.5 seconds, with some being as high as 4.5 seconds. This
will generally cause the CPU gauge readings to be too high, and in some cases can cause impossibly
large values for the CPU load metrics (eg. greater than 400 for a 4 core CPU).
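
A small sketch of why the assumed interval inflates the reading, using a hypothetical helper `cpuLoadPercent` (100 means one core fully busy). If a 4-core machine is fully busy and the collection actually took 4.5 seconds, dividing by the assumed 3 seconds overshoots the 400 maximum.

```go
package main

import (
	"fmt"
	"time"
)

// cpuLoadPercent converts CPU seconds consumed during an interval
// into a load percentage, where 100 means one core fully busy.
func cpuLoadPercent(cpuSeconds, intervalSeconds float64) float64 {
	if intervalSeconds <= 0 {
		return 0
	}
	return cpuSeconds / intervalSeconds * 100
}

func main() {
	// 4 cores busy for 4.5 seconds consume 18 CPU seconds.
	fmt.Println(cpuLoadPercent(18, 3))   // assumed interval: prints "600"
	fmt.Println(cpuLoadPercent(18, 4.5)) // measured interval: prints "400"

	// In a collection loop, the interval would be measured rather
	// than assumed, for example:
	last := time.Now()
	time.Sleep(10 * time.Millisecond)
	interval := time.Since(last).Seconds()
	fmt.Printf("measured interval: %.3fs\n", interval)
}
```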

---------

Co-authored-by: Felix Lange <fjl@twurst.com>
2023-03-07 00:29:48 +01:00
Martin Holst Swende
4d3525610e
all: remove deprecated uses of math/rand (#26710)
This PR is a (superior) alternative to https://github.com/ethereum/go-ethereum/pull/26708, it handles deprecation, primarily two specific cases. 

`rand.Seed` is typically used in two ways
- `rand.Seed(time.Now().UnixNano())` -- we seed it just to be sure to get some randomness and not the same thing on every run. This is not needed with global seeding, so those are just removed. 
- `rand.Seed(1)` -- this is typically done to ensure we have a stable test. If we rely on this, we need to fix up the tests to use a deterministic prng-source. A few occurrences like this have been replaced with a proper custom source. 

`rand.Read` has been replaced by `crypto/rand`.`Read` in this PR.
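
For illustration, the two replacements could look roughly like this. The helper names `deterministicInts` and `randomBytes` are made up for this sketch and do not come from the PR.

```go
package main

import (
	crand "crypto/rand"
	"fmt"
	mrand "math/rand"
)

// deterministicInts uses a private, explicitly seeded source instead of
// seeding the global generator, so a test gets a stable sequence.
func deterministicInts(seed int64, n int) []int {
	rng := mrand.New(mrand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = rng.Intn(100)
	}
	return out
}

// randomBytes fills a buffer from crypto/rand instead of math/rand.Read.
func randomBytes(n int) ([]byte, error) {
	buf := make([]byte, n)
	_, err := crand.Read(buf)
	return buf, err
}

func main() {
	// Two sources with the same seed yield the same sequence.
	fmt.Println(deterministicInts(1, 3))
	fmt.Println(deterministicInts(1, 3))

	buf, err := randomBytes(8)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x\n", buf)
}
```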
2023-02-16 14:36:58 -05:00
Shude Li
163e996d0e
all: use http package to replace http method names (#26535) 2023-01-24 11:12:25 +02:00
ucwong
297ec0669d
metrics/influxdb: fix time ticker leaks (#26507) 2023-01-17 13:45:35 +01:00
Felix Lange
c539bda166
metrics: improve reading Go runtime metrics (#25886)
This changes how we read performance metrics from the Go runtime. Instead
of using runtime.ReadMemStats, we now rely on the API provided by package
runtime/metrics.

runtime/metrics provides more accurate information. For example, the new
interface has better reporting of memory use. In my testing, the reported
value of held memory more accurately reflects the usage reported by the OS.

The semantics of metrics system/memory/allocs and system/memory/frees have
changed to report amounts in bytes. ReadMemStats only reported the count of
allocations in number-of-objects. This is imprecise: 'tiny objects' are not
counted because the runtime allocates them in batches; and certain
improvements in allocation behavior, such as struct size optimizations,
will be less visible when the number of allocs doesn't change.

Changing allocation reports to be in bytes makes it appear in graphs that
lots more is being allocated. I don't think that's a problem because this
metric is primarily interesting for geth developers.

The metric system/memory/pauses has been changed to report statistical
values from the histogram provided by the runtime. Its name in influxdb has
changed from geth.system/memory/pauses.meter to
geth.system/memory/pauses.histogram.

We also have a new histogram metric, system/cpu/schedlatency, reporting the
Go scheduler latency.
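
As a rough sketch of what reading from package runtime/metrics looks like, the example below samples the allocated-bytes counter and the scheduler latency histogram. The helper name `readRuntime` is hypothetical, and the two metric names are the ones exposed by recent Go versions; this is not the code from this change.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// readRuntime samples two runtime metrics: cumulative heap allocations
// in bytes, and the number of buckets in the scheduler latency histogram.
func readRuntime() (allocBytes uint64, latencyBuckets int) {
	samples := []metrics.Sample{
		{Name: "/gc/heap/allocs:bytes"},
		{Name: "/sched/latencies:seconds"},
	}
	metrics.Read(samples)

	// Check the kind before reading: an unsupported metric reports
	// KindBad instead of a value.
	if samples[0].Value.Kind() == metrics.KindUint64 {
		allocBytes = samples[0].Value.Uint64()
	}
	if samples[1].Value.Kind() == metrics.KindFloat64Histogram {
		latencyBuckets = len(samples[1].Value.Float64Histogram().Buckets)
	}
	return allocBytes, latencyBuckets
}

func main() {
	allocs, buckets := readRuntime()
	fmt.Println("allocated bytes:", allocs)
	fmt.Println("sched latency buckets:", buckets)
}
```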
2022-11-11 13:16:13 +01:00
Felix Lange
b628d72766
build: upgrade to go 1.19 (#25726)
This changes the CI / release builds to use the latest Go version. It also
upgrades golangci-lint to a newer version compatible with Go 1.19.

In Go 1.19, godoc has gained official support for links and lists. The
syntax for code blocks in doc comments has changed and now requires a
leading tab character. gofmt adapts comments to the new syntax
automatically, so there are a lot of comment re-formatting changes in this
PR. We need to apply the new format in order to pass the CI lint stage with
Go 1.19.

With the linter upgrade, I have decided to disable 'gosec' - it produces
too many false-positive warnings. The 'deadcode' and 'varcheck' linters
have also been removed because golangci-lint warns about them being
unmaintained. 'unused' provides similar coverage and we already have it
enabled, so we don't lose much with this change.
2022-09-10 13:25:40 +02:00
Justin Traglia
2c5648d891
all: fix some typos (#25551)
* Fix some typos

* Fix some mistakes

* Revert 4byte.json

* Fix an incorrect fix

* Change files to fails
2022-08-19 09:00:21 +03:00
Delweng
b196ad1c16
all: add whitespace linter (#25312)
* golangci: typo

Signed-off-by: Delweng <delweng@gmail.com>

* golangci: add whitespace

Signed-off-by: Delweng <delweng@gmail.com>

* *: rm whitespace using golangci-lint

Signed-off-by: Delweng <delweng@gmail.com>

* cmd/puppeth: revert accidental resurrection

Co-authored-by: Péter Szilágyi <peterke@gmail.com>
2022-07-25 13:14:03 +03:00
Martin Holst Swende
a907d7e81a
all: more linters (#24783)
This enables the following linters

- typecheck
- unused
- staticcheck
- bidichk
- durationcheck
- exportloopref
- gosec

With a few exceptions.

- We use a deprecated protobuf in trezor. I didn't want to mess with that, since I cannot meaningfully test any changes there.
- The deprecated TypeMux is used in a few places still, so the warning for it is silenced for now.
- Using string type in context.WithValue is apparently wrong, one should use a custom type, to prevent collisions between different places in the hierarchy of callers. That should be fixed at some point, but may require some attention.
- The warnings for using weak random generator are squashed, since we use a lot of random without need for cryptographic guarantees.
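
For reference, the custom-type approach to `context.WithValue` keys mentioned above can be sketched like this. The `requestIDKey` type and the helper functions are hypothetical names chosen for the example.

```go
package main

import (
	"context"
	"fmt"
)

// requestIDKey is an unexported key type. No other package can
// construct it, so values stored under it cannot collide with keys
// used elsewhere in the hierarchy of callers.
type requestIDKey struct{}

// withRequestID stores id in the context under the private key.
func withRequestID(ctx context.Context, id string) context.Context {
	return context.WithValue(ctx, requestIDKey{}, id)
}

// requestID retrieves the id, reporting whether it was present.
func requestID(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(requestIDKey{}).(string)
	return id, ok
}

// newRequestContext is a convenience wrapper for the example.
func newRequestContext(id string) context.Context {
	return withRequestID(context.Background(), id)
}

func main() {
	ctx := newRequestContext("abc123")
	id, ok := requestID(ctx)
	fmt.Println(id, ok) // prints "abc123 true"

	// A plain string key, even one spelled like the type name, does
	// not match the private key.
	fmt.Println(ctx.Value("requestIDKey") == nil) // prints "true"
}
```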
2022-06-13 16:24:45 +02:00
rjl493456442
59ac229f87
core/state/snapshot: detect and clean up dangling storage snapshot in generation (#24811)
* core/state/snapshot: check dangling storages when generating snapshot

* core/state/snapshot: polish

* core/state/snapshot: wipe the last part of the dangling storages

* core/state/snapshot: fix and add tests

* core/state/snapshot: fix comment

* README: remove mentions of fast sync (#24656)

Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>

* core, cmd: expose dangling storage detector for wider usage

* core/state/snapshot: rename variable

* core, ethdb: use global iterators for snapshot generation

* core/state/snapshot: polish

* cmd, core/state/snapshot: polish

* core/state/snapshot: polish

* Update core/state/snapshot/generate.go

Co-authored-by: Martin Holst Swende <martin@swende.se>

* ethdb: extend db test suite and fix memorydb iterator

* ethdb/dbtest: rollback changes

* ethdb/memorydb: simplify iteration

* core/state/snapshot: update dangling counter

* core/state/snapshot: release iterators

* core/state/snapshot: update metrics

* core/state/snapshot: update time metrics

* metrics/influxdb: temp solution to present counter meaningfully, remove it

* add debug log, revert later

* core/state/snapshot: fix iterator panic

* all: customized snapshot iterator for backward iteration

* core, ethdb: polish

* core/state/snapshot: remove debug log

* core/state/snapshot: address comments from peter

* core/state/snapshot: reopen the iterator at the next position

* ethdb, core/state/snapshot: address comment from peter

* core/state/snapshot: reopen exhausted iterators

Co-authored-by: Tbnoapi <63448616+nuoomnoy02@users.noreply.github.com>
Co-authored-by: Marius van der Wijden <m.vanderwijden@live.de>
Co-authored-by: Martin Holst Swende <martin@swende.se>
2022-05-23 13:26:22 +03:00
Håvard Anda Estensen
07508ac0e9
all: replace uses of ioutil with io and os (#24869) 2022-05-16 11:59:35 +02:00
s7v7nislands
7caa2d8163
all: replace strings.Replace with strings.ReplaceAll (#24835) 2022-05-09 13:13:23 +03:00
Felix Lange
8a134014b4
all: add go:build lines (#23468)
Generated by go1.17 fmt ./...
2021-08-25 18:46:29 +02:00
Felix Lange
a789dcc978
metrics: fix compilation for GOOS=js (#23449) 2021-08-24 21:54:55 +03:00
jwasinger
6902485767
cmd, metrics: add support for influxdb-v2 (cherry-picking from italoacasas' changes), leave existing support for v1 to maintain backwards-compatibility. (#23194)
This PR adds a flag to enable InfluxDB v2 (--metrics.influxdbv2), flags for v2-specific features (--metrics.influxdb.token, --metrics.influxdb.bucket), also carries over addition of support for specifying organization (--metrics.influxdb.organization), but still retains backwards compatibility with InfluxDB v1.
2021-08-17 18:40:14 +02:00
Mathijs de Bruin
2dee31930c
metrics: use golang.org/x/sys/unix to support Solaris (#22584)
Fixes #11113

Co-authored-by: rene <41963722+renaynay@users.noreply.github.com>
2021-06-01 10:50:54 +02:00
Péter Szilágyi
62379f02c6
metrics/influxdb: don't push empty histograms, no measurement != 0 2021-03-26 21:13:52 +02:00
Péter Szilágyi
2550e46269
eth/protocols, metrics: use resetting histograms for rare packets 2021-03-26 16:14:12 +02:00
Péter Szilágyi
6d7ff6acea
eth/protocols, metrics, p2p: add handler performance metrics 2021-03-26 14:00:06 +02:00
isdyaufh8o7cq
477fd420b3
metrics: fix cast omission in cpu_syscall.go (#22262)
Fixes a regression which caused a build failure on certain platforms.
2021-02-08 11:36:49 +01:00
Alex Prut
ef84da8481
all: remove unneeded parentheses (#21921)
* remove unneeded conversion type

* remove redundant type in composite literal

* omit explicit type where implicit

* remove unused redundant parenthesis

* remove redundant import alias duktape
2021-02-02 11:32:44 +02:00
Marius van der Wijden
10555d4684
cmd/geth: dump config for metrics (#22083)
* cmd/geth: dump config

* cmd/geth: dump config

* cmd/geth: properly read config again

* cmd/geth: override metrics if flags are set

* cmd/geth: write metrics regardless if enabled

* cmd/geth: renamed to metricsfromcliargs

* metrics: add default configuration
2021-01-18 14:36:05 +01:00
gary rong
b9ff57c59e
metrics: fix the panic for reading empty cpu stats (#21864) 2020-11-18 21:50:11 +01:00
Marius van der Wijden
4e54b1a45e
metrics: zero temp variable in updateMeter (#21470)
* metrics: zero temp variable in  updateMeter

Previously the temp variable was not updated properly after summing it to count.
This meant we had astronomically high metrics, now we zero out the temp whenever we
sum it onto the snapshot count

* metrics: move temp variable to be aligned, unit tests

Moves the temp variable in MeterSnapshot to be 64-bit aligned because of the atomic bug.
Adds a unit test, that catches the previous bug.
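
The class of bug described here can be illustrated with a minimal sketch. The `meter` type below is hypothetical and much simpler than the real one: marks accumulate in a temporary counter, and the periodic tick moves them onto the total. Using an atomic swap reads and zeroes the temporary in one step, so the same marks are never summed twice.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// meter is a hypothetical lock-free meter: Mark only touches temp,
// and tick folds temp into count.
type meter struct {
	temp  atomic.Int64 // marks received since the last tick
	count atomic.Int64 // total folded in so far
}

// Mark records n events without taking a lock.
func (m *meter) Mark(n int64) { m.temp.Add(n) }

// tick moves the pending marks onto count. Swap(0) returns the old
// value and zeroes temp atomically; merely loading temp here would
// re-add the same marks on every tick.
func (m *meter) tick() {
	n := m.temp.Swap(0)
	m.count.Add(n)
}

func main() {
	var m meter
	m.Mark(5)
	m.tick()
	m.tick() // nothing pending, count must not grow
	fmt.Println(m.count.Load()) // prints "5"
}
```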
2020-08-21 11:04:36 +03:00
Marius van der Wijden
f3bafecef7
metrics: make meter updates lock-free (#21446) 2020-08-18 11:27:04 +02:00
meowsbits
490b380a04
cmd/geth: allow configuring metrics HTTP server on separate endpoint (#21290)
Exposing /debug/metrics and /debug/metrics/prometheus was dependent
on --pprof, which also exposes other HTTP APIs. This change makes it possible
to run the metrics server on an independent endpoint without enabling pprof.
2020-07-03 19:12:22 +02:00
rene
a35382de94
metrics: replace gosigar with gopsutil (#21041)
* replace gosigar with gopsutil

* removed check for whether GOOS is openbsd

* removed accidental import of runtime

* potential fix for difference in units between gosig and gopsutil

* fixed lint error

* remove multiplication factor

* uses cpu.ClocksPerSec as the multiplication factor

* changed dependency from shirou to renaynay (#20)

* updated dep

* switching back from using renaynay fork to using upstream as PRs were merged on upstream

* removed empty line

* optimized imports

* tidied go mod
2020-06-02 12:08:33 +03:00
Richard Patel
2f66a8d614
metrics/prometheus: define TYPE once, add tests (#21068)
* metrics/prometheus: define type once for histograms

* metrics/prometheus: test collector
2020-05-26 12:00:09 +03:00
ucwong
53e034ce0b
metrics: add missing calls to Ticker.Stop in tests (#20866) 2020-04-02 16:01:18 +02:00
Martin Holst Swende
32d31c31af
metrics: improve TestTimerFunc (#20818)
The test failed due to what appears to be fluctuations in time.Sleep, which is
not the actual method under test. This change modifies it so we compare the
metered Max to the actual time instead of the desired time.
2020-03-31 15:01:16 +02:00
Péter Szilágyi
42e02ac03b
metrics: disable CPU stats (gosigar) on iOS 2020-03-26 11:24:58 +02:00
Guillaume Ballet
58f2ce8671
metrics: fix issues reported by staticcheck (#20365) 2019-11-22 16:04:35 +01:00
Felix Lange
afe0b65405
dashboard: remove the dashboard (#20279)
This removes the dashboard project. The dashboard was an experimental
browser UI for geth which displayed metrics and chain information in
real time. We are removing it because it has marginal utility and nobody
on the team can maintain it.

Removing the dashboard removes a lot of dependency code and shaves
6 MB off the geth binary size.
2019-11-14 10:04:16 +01:00
Guillaume Ballet
de2259d27c
travis: enable test suite on ARM64 (#20219)
* travis: Enable ARM support

* Include fixes from 20039

* Add a trace to debug the invalid lookup issue

* Try increasing the timeout to see if the arm test passes

* Investigate the resolver issue

* Increase arm64 timeout for clique test

* increase timeout in tests for arm64

* Only test the failing tests

* Review feedback: don't export epsilon

* Remove investigation tricks+include fjl's feedback

* Revert the retry ahead of using the mock resolver

* Fix rebase errors
2019-11-08 10:58:57 +02:00
Marius Kjærstad
08953e42c1
metrics: change links in README.md to https (#20182) 2019-10-20 12:25:25 +02:00
Péter Szilágyi
72d5a27a39
core, metrics, p2p: switch some invalid counters to gauges 2019-09-10 14:39:07 +03:00
Péter Szilágyi
5298eb7519
metrics: gather and export threads and goroutines 2019-06-17 10:53:17 +03:00