Commit Graph

4146 Commits

Author SHA1 Message Date
Paul Hauner
2b2a358522 Detailed validator monitoring (#2151)
## Issue Addressed

- Resolves #2064

## Proposed Changes

Adds a `ValidatorMonitor` struct which provides additional logging and Grafana metrics for specific validators.

Use `lighthouse bn --validator-monitor` to automatically enable monitoring for any validator that hits the [subnet subscription](https://ethereum.github.io/eth2.0-APIs/#/Validator/prepareBeaconCommitteeSubnet) HTTP API endpoint.

Also, use `lighthouse bn --validator-monitor-pubkeys` to supply a list of validators which will always be monitored.

See the new docs included in this PR for more info.

## TODO

- [x] Track validator balance, `slashed` status, etc.
- [x] ~~Register slashings in current epoch, not offense epoch~~
- [ ] Publish Grafana dashboard, update TODO link in docs
- [x] ~~#2130 is merged into this branch, resolve that~~
2021-01-20 19:19:38 +00:00
Paul Hauner
1eb0915301 Fix bug from #2163 (#2165)
## Issue Addressed

NA

## Proposed Changes

Fixes a bug that I missed during a review in #2163. I found this bug by observing that nodes were receiving far less attestations (~1/2 of previous).

I'm not certain on *exactly* how this mistake manifested in a reduction in attestations, but the mistake touches so much code that I think it's reasonable to declare that this it the cause of the observed issue (drop in attestations).

## Additional Info

NA
2021-01-20 10:28:12 +00:00
Paul Hauner
b06559ae97 Disallow attestation production earlier than head (#2130)
## Issue Addressed

The non-finality period on Pyrmont between epochs [`9114`](https://pyrmont.beaconcha.in/epoch/9114) and [`9182`](https://pyrmont.beaconcha.in/epoch/9182) was contributed to by all the `lighthouse_team` validators going down. The nodes saw excessive CPU and RAM usage, resulting in the system to kill the `lighthouse bn` process. The `Restart=on-failure` directive for `systemd` caused the process to bounce in ~10-30m intervals.

Diagnosis with `heaptrack` showed that the `BeaconChain::produce_unaggregated_attestation` function was calling `store::beacon_state::get_full_state` and sometimes resulting in a tree hash cache allocation. These allocations were approximately the size of the hosts physical memory and still allocated when `lighthouse bn` was killed by the OS.

There was no CPU analysis (e.g., `perf`), but the `BeaconChain::produce_unaggregated_attestation` is very CPU-heavy so it is reasonable to assume it is the cause of the excessive CPU usage, too.

## Proposed Changes

`BeaconChain::produce_unaggregated_attestation` has two paths:

1. Fast path: attesting to the head slot or later.
2. Slow path: attesting to a slot earlier than the head block.

Path (2) is the only path that calls `store::beacon_state::get_full_state`, therefore it is the path causing this excessive CPU/RAM usage.

This PR removes the current functionality of path (2) and replaces it with a static error (`BeaconChainError::AttestingPriorToHead`).

This change reduces the generality of `BeaconChain::produce_unaggregated_attestation` (and therefore [`/eth/v1/validator/attestation_data`](https://ethereum.github.io/eth2.0-APIs/#/Validator/produceAttestationData)), but I argue that this functionality is an edge-case and arguably a violation of the [Honest Validator spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/validator.md).

It's possible that a validator goes back to a prior slot to "catch up" and submit some missed attestations. This change would prevent such behaviour, returning an error. My concerns with this catch-up behaviour is that it is:

- Not specified as "honest validator" attesting behaviour.
- Is behaviour that is risky for slashing (although, all validator clients *should* have slashing protection and will eventually fail if they do not).
- It disguises clock-sync issues between a BN and VC.

## Additional Info

It's likely feasible to implement path (2) if we implement some sort of caching mechanism. This would be a multi-week task and this PR gets the issue patched in the short term. I haven't created an issue to add path (2), instead I think we should implement it if we get user-demand.
2021-01-20 06:52:37 +00:00
Paul Hauner
d9f940613f Represent slots in secs instead of millisecs (#2163)
## Issue Addressed

NA

## Proposed Changes

Copied from #2083, changes the config milliseconds_per_slot to seconds_per_slot to avoid errors when slot duration is not a multiple of a second. To avoid deserializing old serialized data (with milliseconds instead of seconds) the Serialize and Deserialize derive got removed from the Spec struct (isn't currently used anyway).

This PR replaces #2083 for the purpose of fixing a merge conflict without requiring the input of @blacktemplar.

## Additional Info

NA


Co-authored-by: blacktemplar <blacktemplar@a1.net>
2021-01-19 09:39:51 +00:00
Paul Hauner
46cb6e204c Add lcli command to replace state pubkeys (#1999)
## Issue Addressed

NA

## Proposed Changes

Adds a command to replace all the pubkeys in a state with one generated from a mnemonic.

## Additional Info

This is not production code, it's only for testing.
2021-01-19 08:42:30 +00:00
Paul Hauner
805e152f66 Simplify enum -> str with strum (#2164)
## Issue Addressed

NA

## Proposed Changes

As per #2100, uses derives from the sturm library to implement AsRef<str> and AsStaticRef to easily get str values from enums without creating new Strings. Furthermore unifies all attestation error counter into one IntCounterVec vector.

These works are originally by @blacktemplar, I've just created this PR so I can resolve some merge conflicts.

## Additional Info

NA


Co-authored-by: blacktemplar <blacktemplar@a1.net>
2021-01-19 06:33:58 +00:00
Paul Hauner
8892114f52 Modify proto array loop (#2154)
## Issue Addressed

NA

## Proposed Changes

As discussed with @protolambda, add an additional loop inside proto_array to ensure weights are coherent.

## Additional Info

NA
2021-01-19 03:50:12 +00:00
realbigsean
51f7724c76 Automate docker version tag (#2150)
## Issue Addressed

N/A

## Proposed Changes

On any tag formatted `v*`, a full multi-arch docker build will be kicked off and automatically pushed to docker hub with the version tag.

This is a bit repetitive, because the image built will usually be the same as the image built on pushes to `stable`, but it seems like the simplest way to go about it and this will also work if we incorporate a workflow with `vX.X.X-rc` tags. 

## Additional Info

This may also need to wait for env variable updates: https://github.com/sigp/lighthouse/pull/2135#issuecomment-754977433

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-19 03:50:10 +00:00
Taneli Hukkinen
9cdfa94ba4 Update docs: Change --beacon-node to --beacon-nodes (#2145)
## Issue Addressed

The docs use the deprecated `--beacon-node` flag

## Proposed Changes

Reference the new `--beacon-nodes` flag in docs
2021-01-19 03:50:08 +00:00
Akihito Nakano
3d07934ca0 Fix: end_slot returns incorrect value (#2138)
## Issue Addressed

`Epoch::end_slot()` returns incorrect value when the epoch is the last epoch which can be represented by u64.

```rust
        let slots_per_epoch = 32;

        // The last epoch which can be represented by u64.
        let epoch = Epoch::new(u64::max_value() / slots_per_epoch);

        println!("{}", epoch.end_slot(slots_per_epoch));
       // Slot(18446744073709551614)
       // -> correctly, the result should be `Slot(18446744073709551615)`.
```
2021-01-19 03:50:06 +00:00
Akihito Nakano
a8d040c821 Fix timing issue in obtaining the Fork (#2158)
## Issue Addressed

Related PR: https://github.com/sigp/lighthouse/pull/2137#issuecomment-754712492

The Fork is required for VC to perform signing. Currently, it is not guaranteed that the Fork has been obtained at the point of the signing as the Fork is obtained at after ForkService starts. We will see the [error](851a4dca3c/validator_client/src/validator_store.rs (L127)) if VC could not perform the signing due to the timing issue.

> Unable to get Fork for signing

## Proposed Changes

Obtain the Fork on `init_from_beacon_node` to fix the timing issue.
2021-01-19 02:54:18 +00:00
realbigsean
908c8eadf3 remove protected environment (#2135)
## Issue Addressed

N/A

## Proposed Changes

Remove Github Action environments

## Additional Info

N/A


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-19 01:29:06 +00:00
realbigsean
7a71977987 Clippy 1.49.0 updates and dht persistence test fix (#2156)
## Issue Addressed

`test_dht_persistence` failing

## Proposed Changes

Bind `NetworkService::start` to an underscore prefixed variable rather than `_`.  `_` was causing it to be dropped immediately

This was failing 5/100 times before this update, but I haven't been able to get it to fail after updating it

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-19 00:34:28 +00:00
Akihito Nakano
e5b1a37110 [simulator] Fix race condition when creating LocalBeaconNode (#2137)
## Issue Addressed

We have a race condition when counting the number of beacon nodes. The user could end up seeing a duplicated service name (node_N).

## Proposed Changes

I have updated to acquire write lock before counting the number of beacon nodes.
2021-01-14 00:04:18 +00:00
Pawan Dhananjay
28238d97b1 Disconnect from peers quicker on internet issues (#2147)
## Issue Addressed

Fixes #2146 

## Proposed Changes

Change ping timeout errors to return `LowToleranceErrors` so that we disconnect faster on internet failures/changes.
2021-01-13 08:09:10 +00:00
realbigsean
14df5d5c32 Use cross in linux x86 64 release flow (#2136)
## Issue Addressed

Resolves  #2120

## Proposed Changes

This updates github actions to use `cross` when compiling linux x86_64 binaries.

## Additional Info

I think we could alternatively be explicit with the version of macOS or ubuntu we are running actions on and that could solve #2120. I'm not sure which method is preferred here though. Github actions supports Ubuntu 16.04

Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-12 06:38:22 +00:00
Paul Hauner
1d535659d6 Add docs about redundancy (#2142)
## Issue Addressed

- Resolves #2140

## Proposed Changes

Adds some documentation on the topic of "redundancy".

## Additional Info

NA
2021-01-12 00:26:22 +00:00
realbigsean
423dea169c update smallvec (#2152)
## Issue Addressed

`cargo audit` is failing because of a potential for an overflow in the version of `smallvec` we're using

## Proposed Changes

Update to the latest version of `smallvec`, which has the fix


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-11 23:32:11 +00:00
Arthur Woimbée
851a4dca3c replace tempdir by tempfile (#2143)
## Issue Addressed

Fixes #2141 
Remove [tempdir](https://docs.rs/tempdir/0.3.7/tempdir/) in favor of [tempfile](https://docs.rs/tempfile/3.1.0/tempfile/).

## Proposed Changes

`tempfile` has a slightly different api that makes creating temp folders with a name prefix a chore (`tempdir::TempDir::new("toto")` => `tempfile::Builder::new().prefix("toto").tempdir()`).

So I removed temp folder name prefix where I deemed it not useful.

Otherwise, the functionality is the same.
2021-01-06 06:36:11 +00:00
Age Manning
7e4b190df0 Reduce ping interval (#2132)
## Issue Addressed

#2123

## Description

Reduces the TCP ping interval to increase our responsiveness to peer liveness changes.
2021-01-06 04:35:52 +00:00
Paul Hauner
c2eac8e5bd Remove duplicate log in BN fallback (#2116)
## Issue Addressed

NA

## Proposed Changes

- Removes a duplicated log in the fallback code for the VC.
- Updates the text in the remaining de-duped log.

## Additional Info

Example

```
Dec 23 05:19:54.003 WARN Beacon node is syncing                  endpoint: http://xxxx:5052/, head_slot: 88224, sync_distance: 161774
Dec 23 05:19:54.003 WARN Beacon node is not synced               endpoint: http://xxxxx:5052/
```
2021-01-06 03:01:48 +00:00
realbigsean
588b90157d Ssz state api endpoint (#2111)
## Issue Addressed

Catching up to a recently merged API spec PR: https://github.com/ethereum/eth2.0-APIs/pull/119

## Proposed Changes

- Return an SSZ beacon state on `/eth/v1/debug/beacon/states/{stateId}` when passed this header: `accept: application/octet-stream`.
- requests to this endpoint with no  `accept` header or an `accept` header and a value of `application/json` or `*/*` , or will result in a JSON response

## Additional Info


Co-authored-by: realbigsean <seananderson33@gmail.com>
2021-01-06 03:01:46 +00:00
Samuel E. Moelius
939fa717fd test_decode_malicious_status_message improvements (#2104)
## Issue Addressed

None

## Proposed Changes

* Correct typo in one comment, elaborate some others.
* Add asserts to ensure comments match code.
* Eliminate one unnecessary `clone`.

## Additional Info

None
2021-01-06 01:10:26 +00:00
Samuel E. Moelius
0245ddd37b Fix typo in ssz_snappy.rs comment (#2103)
## Issue Addressed

None

## Proposed Changes

Correct a typo in `ssz_snappy.rs`.

## Additional Info

Pedantry at it finest.
2021-01-06 01:10:24 +00:00
Paul Hauner
f183af20e3 Version v1.0.6 (#2126)
## Issue Addressed

NA

## Proposed Changes

- Bump versions
- Run `cargo update`

## Additional Info

NA
2020-12-28 23:38:02 +00:00
Pawan Dhananjay
32a60578fe Remove default beacon node value from clap (#2121)
## Issue Addressed

Fixes #2118 

## Proposed Changes

Removes the default value in clap for `--beacon-nodes`. 
This was causing issues with cli picking `--beacon-nodes` default even when not specified and overriding `--beacon-node`.
Seems like it was more evident with docker setups because it doesn't use the default `http://localhost:5052` option.

Edit: we already set the default to `http://localhost:5052` here so this shouldn't break any existing setups.
9ed65a64f8/validator_client/src/config.rs (L58) 

## Additional info
Tested this with docker-compose and binaries. Works as expected in both cases.
2020-12-28 08:23:59 +00:00
Michael Sproul
43ac3f7209 Fix slasher database schema migration to v2 (#2125)
## Issue Addressed

Closes #2119

## Proposed Changes

Update the slasher schema version to v2 for the breaking changes to the config introduced in #2079. Implement a migration from v1 to v2 so that users can seamlessly upgrade from any version of Lighthouse <=1.0.5.

Users who deleted their database for v1.0.5 can upgrade to a release including this patch without any manual intervention. Similarly, any users still on v1.0.4 or earlier can now upgrade without having to drop their database.
2020-12-28 05:09:19 +00:00
Akihito Nakano
78d17c3255 Tweak error messages for ease of investigation (#2122)
## Proposed Changes

<!-- Please list or describe the changes introduced by this PR. -->

Tweaked the error message for ease of investigation as `Failed to update eth1 cache` is used in multiple places. 😃
2020-12-28 01:25:33 +00:00
Paul Hauner
9ed65a64f8 Version v1.0.5 (#2117)
## Issue Addressed

NA

## Proposed Changes

- Bump versions to `v1.0.5`
- Run `cargo update`

## Additional Info

NA
2020-12-23 18:52:48 +00:00
Michael Sproul
c5f03f7d56 Tidy slasher logs for known slashings (#2108)
## Proposed Changes

This quiets the slasher logs when ingesting slashings that are already known. Previously we would log an `ERRO` when a slashing was rediscovered locally but had already been submitted on-chain. This is to be expected from time to time, as different users' slashers will run at different times, and it's likely that slashings will make it on-chain before all users have detected them locally.
2020-12-23 07:53:38 +00:00
Age Manning
2931b05582 Update libp2p (#2101)
This is a little bit of a tip-of-the-iceberg PR. It houses a lot of code changes in the libp2p dependency. 

This needs a bit of thorough testing before merging. 

The primary code changes are:
- General libp2p dependency update
- Gossipsub refactor to shift compression into gossipsub providing performance improvements and improved API for handling compression



Co-authored-by: Paul Hauner <paul@paulhauner.com>
2020-12-23 07:53:36 +00:00
realbigsean
b5e81eb6b2 add automated release workflow (#2077)
## Issue Addressed

Resolves #1674 

## Proposed Changes

- Whenever a tag is pushed with the prefix `v` this workflow is triggered
- creates portable and non-portable binaries for linux x86_64, linux aarch64, macOS
  - an attempt at using github actions caching
- signs each binary using GPG
- auto-generates full changelog based on commit messages since the last release
- creates a **draft** release
- hot new formatting (preview [here](https://github.com/realbigsean/lighthouse/releases/tag/v0.9.23))
- has been taking around 35 minutes

## Additional Info

TODOs:
- Figure out how we should automate dockerhub's version tag. 
  - It'd be quickest just to tag `latest`, but we'd need to make sure the docker workflow completes before this starts
- we do the same cross-compile in the `docker` workflow, we could try to use the same binary
- integrate a similar flow for unstable binaries (`-rc` tag?)
- improve caching, potentially use sccache
- if we start using a self-hosted runner this'll require some re-working

Need to add the following secrets to Github: 

- `GPG_PASSPHRASE`
- ~~`GPG_PUBLIC_KEY`~~ hard-coded this, because it was tough manage as a secret
- `GPG_SIGNING_KEY` 


Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-12-23 07:53:34 +00:00
Samuel E. Moelius
3381266998 Eliminate uses of expect in ssz_snappy.rs (#2105)
## Issue Addressed

None

## Proposed Changes

Eliminate three uses of `expect` in `ssz_snappy.rs`.

## Additional Info

None
2020-12-22 02:28:37 +00:00
Pawan Dhananjay
166f617b19 Add docs for /lighthouse/validators/keystore api (#2071)
## Issue Addressed

Resolves #2061 
Resolves #2066 

## Proposed Changes

Document the `/lighthouse/validators/keystore` validator api method. 
The newly generated/imported keystore is always added to the key cache from this function call
65dcdc361b/validator_client/src/validator_store.rs (L105-L109)

which eventually invokes `KeyCache::add` here if enabled
65dcdc361b/validator_client/src/initialized_validators.rs (L192)
2020-12-21 07:43:04 +00:00
Michael Sproul
e5bf2576f1 Optimise tree hash caching for block production (#2106)
## Proposed Changes

`@potuz` on the Eth R&D Discord observed that Lighthouse blocks on Pyrmont were always arriving at other nodes after at least 1 second. Part of this could be due to processing and slow propagation, but metrics also revealed that the Lighthouse nodes were usually taking 400-600ms to even just produce a block before broadcasting it.

I tracked the slowness down to the lack of a pre-built tree hash cache (THC) on the states being used for block production. This was due to using the head state for block production, which lacks a THC in order to keep fork choice fast (cloning a THC takes at least 30ms for 100k validators). This PR modifies block production to clone a state from the snapshot cache rather than the head, which speeds things up by 200-400ms by avoiding the tree hash cache rebuild. In practice this seems to have cut block production time down to 300ms or less. Ideally we could _remove_ the snapshot from the cache (and save the 30ms), but it is required for when we re-process the block after signing it with the validator client.

## Alternatives

I experimented with 2 alternatives to this approach, before deciding on it:

* Alternative 1: ensure the `head` has a tree hash cache. This is too slow, as it imposes a +30ms hit on fork choice, which currently takes ~5ms (with occasional spikes).
* Alternative 2: use `Arc<BeaconSnapshot>` in the snapshot cache and share snapshots between the cache and the `head`. This made fork choice blazing fast (1ms), and block production the same as in this PR, but had a negative impact on block processing which I don't think is worth it. It ended up being necessary to clone the full state from the snapshot cache during block production, imposing the +30ms penalty there _as well_ as in block production.

In contract, the approach in this PR should only impact block production, and it improves it! Yay for pareto improvements 🎉

## Additional Info

This commit (ac59dfa) is currently running on all the Lighthouse Pyrmont nodes, and I've added a dashboard to the Pyrmont grafana instance with the metrics.

In future work we should optimise the attestation packing, which consumes around 30-60ms and is now a substantial contributor to the total.
2020-12-21 06:29:39 +00:00
Paul Hauner
a62dc65ca4 BN Fallback v2 (#2080)
## Issue Addressed

- Resolves #1883

## Proposed Changes

This follows on from @blacktemplar's work in #2018.

- Allows the VC to connect to multiple BN for redundancy.
  - Update the simulator so some nodes always need to rely on their fallback.
- Adds some extra deprecation warnings for `--eth1-endpoint`
- Pass `SignatureBytes` as a reference instead of by value.

## Additional Info

NA

Co-authored-by: blacktemplar <blacktemplar@a1.net>
2020-12-18 09:17:03 +00:00
Pawan Dhananjay
f998eff7ce Subnet discovery fixes (#2095)
## Issue Addressed

N/A

## Proposed Changes

Fixes multiple issues related to discovering of subnet peers.
1. Subnet discovery retries after yielding no results
2. Metadata updates if peer send older metadata
3. peerdb stores the peer subscriptions from gossipsub
2020-12-17 00:39:15 +00:00
realbigsean
ca08fc7831 Revert "add caching to test suite (#2089)" (#2098)
## Issue Addressed

N/A

## Proposed Changes

I didn't realize the `PORTABLE` env variable is only picked up by `install` in the `Makefile` so we are still getting `SIGILL`s:

https://github.com/sigp/lighthouse/runs/1565004525?check_suite_focus=true

## Additional Info



Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-12-16 23:29:07 +00:00
blacktemplar
3fcc517993 Fix Syncing Simulator (#2049)
## Issue Addressed

NA

## Proposed Changes

Fixes problems with slot times below 1 second which got revealed by running the syncing simulator with the default speedup time.
2020-12-16 05:37:38 +00:00
Michael Sproul
da1c5fe69d Delete uncompressed genesis states (#2092)
## Issue Addressed

Replaces #2091

## Proposed Changes

* Delete the uncompressed genesis states from `eth2_network_config` after they were merged accidentally in #2029.
* Tweak the build script to not overwrite `genesis.ssz` on every build, which caused spurious rebuilds.
2020-12-16 03:44:05 +00:00
realbigsean
80f47fcfff add caching to test suite (#2089)
## Issue Addressed

N/A

## Proposed Changes

Add some caching to the test suite and to the aarch64 cross-compile in the docker build. 

## Additional Info

Cache hits only occur if the Cargo.lock file is unchanged, Github Actions runner OS matches, and the cache is "in scope". Some documentation on github actions cache scoping is here:

https://docs.github.com/en/free-pro-team@latest/actions/guides/caching-dependencies-to-speed-up-workflows#matching-a-cache-key

I'm not sure how frequently we'll get cache hits, I imagine only on smaller PR's or updates to the same PR.  And there is a cache size limit that we may end up reaching quickly.  But Github actions handles evictions if we go over that limit. 

Not sure how much of an impact this will end up having but I don't really see a downside to trying it out.

Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-12-16 03:44:03 +00:00
Michael Sproul
0c529b8d52 Add slasher broadcast (#2079)
## Issue Addressed

Closes #2048

## Proposed Changes

* Broadcast slashings when the `--slasher-broadcast` flag is provided.
* In the process of implementing this I refactored the slasher service into its own crate so that it could access the network code without creating a circular dependency. I moved the responsibility for putting slashings into the op pool into the service as well, as it makes sense for it to handle the whole slashing lifecycle.
2020-12-16 03:44:01 +00:00
Pawan Dhananjay
63eeb14a81 Improve eth1 fallback logging (#2096)
## Issue Addressed

N/A

## Proposed Changes

There seemed to be confusion among discord users on the eth1 fallback logging
```
WARN Error connecting to eth1 node. Trying fallback ..., endpoint: http://127.0.0.1:8545/, service: eth1_rpc
```
The assumption users seem to be making here is that it is trying the fallback and fallback=endpoint in the log.

This PR improves the logging to be like
```
WARN Error connecting to eth1 node endpoint, endpoint: http://127.0.0.1:8545/, action: trying fallbacks, service: eth1_rpc
```

I think this is a bit more clear that the endpoint that failed is the one in the log.
2020-12-16 02:39:09 +00:00
divma
11c299cbf6 impl Resource Unavailable RPC error (#2072)
## Issue Addressed

Related to #1891, The error is not in the spec yet (see ethereum/eth2.0-specs#2131)

## Proposed Changes

Implement the proposed error, banning peers that send it

## Additional Info

NA
2020-12-15 00:17:32 +00:00
blacktemplar
701843aaa0 Update dependencies (#2084)
## Issue Addressed

Partially addresses dependencies mentioned in issue #1712.

## Proposed Changes

Updates dependencies (including an update avoiding a vulnerability) + add tokio compatibility to `remote_signer_test`
2020-12-14 02:28:19 +00:00
realbigsean
c1e27f4c89 Improve docker auto builds (#2078)
## Issue Addressed

N/A

## Proposed Changes

- hardcode `ubuntu-18.04` -- I don't think this was causing us issues, but github actions is in the process of migrating `ubuntu-latest` from Ubuntu 18 -> 20.. so just in case
- different source of emulation dependencies -> https://github.com/tonistiigi/binfmt 
  - this one is explicitly referenced in the `buildx` github docs
- install emulation dependencies and run `docker buildx` in the same `run` command
- enable `buildx` with  `DOCKER_CLI_EXPERIMENTAL: enabled` rather than re-building it

## Additional Info

N/A


Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-12-11 00:19:35 +00:00
Michael Sproul
1abc70e815 Version v1.0.4 (#2073)
## Proposed Changes

Run cargo update and bump version in prep for v1.0.4 release

## Additional Info

Planning to merge this commit to `unstable`, test on Pyrmont and canary nodes, then push to `stable`.
2020-12-10 04:01:40 +00:00
Age Manning
dfb588e521 Softer penalties for missing blocks (#2075)
## Issue Addressed

Users are reporting errors for sending attestations to peers. If the clock sync is a little out or we receive attestations before blocks, peers are being too harshly penalized. They can get scored many times per missing block and we typically need these peers on subnets. 


## Proposed Changes

This removes the penalization for missing blocks with attestations. The penalty should be handled when #635 gets built as it will allow us to group attestations per missing block and penalize once.
2020-12-10 00:40:12 +00:00
realbigsean
adbd49ddc6 Multiarch docker GitHub actions (#2065)
## Issue Addressed

Resolves #1512

## Proposed Changes

- Adds a new docker Github Actions workflow  
- Removes the Dockerhub hook
- Adds a new Dockerfile for use with pre-existing cross-compiled binaries 
- on pushes to `unstable` 
  - builds an ARM64 image and tags it `latest-arm64-unstable`
  - builds an AMD64 image and tags it `latest-amd64-unstable`
  - builds an multiarch image by creating a manifest list referencing the prior two images and tags it `latest-unstable`
- on pushes to `stable` 
  - builds an ARM64 image and tags it `latest-arm64`
  - builds an AMD64 image and tags it `latest-amd64`
  - builds an multiarch image by creating a manifest list referencing the prior two images and tags it `latest`

## Additional Info
- for ARM64, first `cross` is used to cross compile the `lighthouse` and  `lcli` binaries, then `docker buildx` is installed to actually build the docker image for the correct target platform. The image build pretty much just copies the binaries from local into the docker image (thanks @michaelsproul :) )
- The AMD64 and ARM64 builds run in parallel, in total it's been taking around 45mins on a local runner
- This PR does **not** cover version tags on docker images at the moment

Co-authored-by: realbigsean <seananderson33@gmail.com>
2020-12-09 06:06:37 +00:00
Michael Sproul
aa45fa3ff7 Revert fork choice if disk write fails (#2068)
## Issue Addressed

Closes #2028
Replaces #2059

## Proposed Changes

If writing to the database fails while importing a block, revert fork choice to the last version stored on disk. This prevents fork choice from being ahead of the blocks on disk. Having fork choice ahead is particularly bad if it is later successfully written to disk, because it renders the database corrupt (see #2028).

## Additional Info

* This mitigation might fail if the head+fork choice haven't been persisted yet, which can only happen at first startup (see #2067)
* This relies on it being OK for the head tracker to be ahead of fork choice. I figure this is tolerable because blocks only get added to the head tracker after successfully being written on disk _and_ to fork choice, so even if fork choice reverts a little bit, when the pruning algorithm runs, those blocks will still be on disk and OK to prune. The pruning algorithm also doesn't rely on heads being unique, technically it's OK for multiple blocks from the same linear chain segment to be present in the head tracker. This begs the question of #1785 (i.e. things would be simpler with the head tracker out of the way). Alternatively, this PR could just revert the head tracker as well (I'll look into this tomorrow).
2020-12-09 05:10:34 +00:00