lighthouse

Author	SHA1	Message	Date
Age Manning	cc44a64d15	Limit parallelism of head chain sync (#1527 ) ## Description Currently Lighthouse load-balances a single finalized chain across peers. The chain is selected as the one with the most peers. Once synced to the latest finalized epoch, Lighthouse creates chains amongst its peers and syncs them all in parallel amongst each peer (grouped by their current head block). This is typically fast and relatively efficient under normal operations. However, if the chain has not finalized in a long time, the head chains can grow quite long. Peers' head chains will update every slot as new blocks are added to the head. Syncing all head chains in parallel is a bottleneck, is highly inefficient because of block duplication, and leads to RPC timeouts when attempting to handle all new head chains at once. This PR limits the parallelism of head syncing chains to 2. We now sync at most two head chains at a time. This allows for the possibility of sync progressing alongside a peer being slow and holding up one chain via RPC timeouts.	2020-08-18 02:49:24 +00:00
divma	46dbf027af	Do not reset batch ids & redownload out of range batches (#1528 ) The changes are somewhat simple but should solve two issues: - When quickly changing between chains once and a second time back again, batchIds would collide and cause havoc. - If we got an out of range response from a peer, sync would remain in syncing but without advancing Changes: - remove the batch id. Identify each batch (inside a chain) by its starting epoch. Target epochs for downloading and processing now advance by EPOCHS_PER_BATCH - for the same reason, move the "to_be_downloaded_id" to be an epoch - remove a sneaky line that dropped an out of range batch without downloading it - bonus: put the chain_id in the log given to the chain. This is why explicitly logging the chain_id is removed	2020-08-18 01:29:51 +00:00
Paul Hauner	9a97a0b14f	Prepare for v0.2.4 (#1533 ) ## Issue Addressed NA ## Proposed Changes NA ## Additional Info NA	2020-08-17 12:13:42 +00:00
Michael Sproul	719a69aee0	Ignore blocks that skip a large distance from their parent (#1530 ) ## Proposed Changes To mitigate the impact of minority forks on RAM and disk usage, this change rejects blocks whose parent lies more than 320 slots (10 epochs, ~1 hour) in the past. The behaviour is configurable via `lighthouse bn --max-skip-slots N`, and can be turned off entirely using `--max-skip-slots none`. Co-authored-by: Paul Hauner <paul@paulhauner.com>	2020-08-17 10:54:58 +00:00
Paul Hauner	a58aa6ee55	Revert back to discv5 alpha 8 to maintain ARM support (#1531 ) ## Issue Addressed NA ## Proposed Changes See title. ## Additional Info NA	2020-08-17 10:06:08 +00:00
Paul Hauner	f85485884f	Process gossip blocks on the GossipProcessor (#1523 ) ## Issue Addressed NA ## Proposed Changes Moves beacon block processing over to the newly-added `GossipProcessor`. This moves the task off the core executor onto the blocking one. ## Additional Info - With this PR, gossip blocks are being ignored during sync.	2020-08-17 09:20:27 +00:00
Paul Hauner	61d5b592cb	Memory usage reduction (#1522 ) ## Issue Addressed NA ## Proposed Changes - Adds a new function to allow getting a state with a bad state root history for attestation verification. This reduces unnecessary tree hashing during attestation processing, which accounted for 23% of memory allocations (by bytes) in a recent `heaptrack` observation. - Don't clone caches on intermediate epoch-boundary states during block processing. - Reject blocks that are known to fork choice earlier during gossip processing, instead of waiting until after state has been loaded (this only happens in edge-case). - Avoid multiple re-allocations by creating a "forced" exact size iterator. ## Additional Info NA	2020-08-17 08:05:13 +00:00
Age Manning	3c689a6837	Remove yamux support (#1526 ) ## Issue Addressed There is currently an issue with yamux when connecting to prysm peers. The source of the issue is currently unknown. This PR removes yamux support to force mplex negotiation. We can add back yamux support once we have isolated and corrected the issue.	2020-08-17 05:05:06 +00:00
Age Manning	afdc4fea1d	Correct logic for peer sync identification (#1525 ) Fix a small sync bug which can mis-classify newly connected peers.	2020-08-17 03:00:10 +00:00
Pawan Dhananjay	850a2d5985	Persist metadata and enr across restarts (#1513 ) ## Issue Addressed Resolves #1489 ## Proposed Changes - Change starting metadata seq num to 0 according to the [spec](https://github.com/ethereum/eth2.0-specs/blob/dev/specs/phase0/p2p-interface.md#metadata). - Remove metadata field from `NetworkGlobals` - Persist metadata to disk on every update - Load metadata seq number from disk on restart - Persist enr to disk on update to ensure enr sequence number increments are persisted as well. ## Additional info Since we modified starting metadata seq num to 0 from 1, we might still see `Invalid Sequence number provided` like in #1489 from prysm nodes if they have our metadata cached.	2020-08-17 02:13:28 +00:00
divma	113b40f321	Add multiaddr support in bootnodes (#1481 ) ## Issue Addressed #1384 Only catch, as currently implemented, when dialing the multiaddr nodes, there is no way to ask the peer manager if they are already connected or dialing	2020-08-17 02:13:26 +00:00
Age Manning	99acfb50f2	Update gossipsub duplicate cache (#1524 ) This potentially handles memory leak issues by preventing adding references to already seen gossipsub messages.	2020-08-17 01:27:33 +00:00
Age Manning	c75c06cf16	Update discv5 to alpha.9 (#1517 ) ## Discovery v5 update In this update we remove the openssl dependency in favour of rust-crypto. The update also removes a series of unnecessary async functions which may improve some of the issues we have been experiencing.	2020-08-15 04:02:14 +00:00
Paul Hauner	f4a7311008	Update to v0.2.3 (#1519 ) ## Issue Addressed NA ## Proposed Changes Bump versions to v0.2.3. ## Additional Info NA	2020-08-14 08:32:31 +00:00
Paul Hauner	619ad106cf	Restrict fork choice getters to finalized blocks (#1475 ) ## Issue Addressed - Resolves #1451 ## Proposed Changes - Restricts the `contains_block` and `contains_block` so they only indicate a block is present if it descends from the finalized root. This helps to ensure that fork choice never points to a block that has been pruned from the database. - Resolves #1451 - Before importing a block, double-check that its parent is known and a descendant of the finalized root. - Split a big, monolithic block verification test into smaller tests. ## Additional Notes I suspect there would be a craftier way to do the `is_descendant_of_finalized` check, but we're a bit tight on time now and we can optimize later if it starts showing in benches. ## TODO - [x] Tests	2020-08-14 06:36:38 +00:00
Paul Hauner	b0a3731fff	Introduce a queue for attestations from the network (#1511 ) ## Issue Addressed N/A ## Proposed Changes Introduces the `GossipProcessor`, a multi-threaded (multi-tasked?), non-blocking processor for some messages from the network which require verification and import into the `BeaconChain`. Initial testing indicates that this massively improves system stability by (a) moving block tasks from the normal executor (b) spreading out attestation load. ## Additional Info TBC	2020-08-14 04:38:45 +00:00
Adam Szkoda	05a8399769	Wind down the SSE thread when the client disconnects (#1514 ) These started to appear when I `^C` `curl -N http://localhost:5052/beacon/fork/stream`: `Aug 12 13:00:01.539 ERRO Couldn't stream piece hyper::Error(ChannelClosed), service: http` Something must have changed in hyper since SSE has been implemented because I'm sure I haven't seen those errors before. This PR properly detects a closed SSE stream and cleans up.	2020-08-13 06:12:18 +00:00
Adam Szkoda	8a1a4051cf	Fix a bug in fork pruning (#1507 ) Extracted from https://github.com/sigp/lighthouse/pull/1380 because merging #1380 proves to be contentious. Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-08-12 07:00:00 +00:00
Paul Hauner	b063df5bf9	Cross-compile to vendored x86_64, aarch64 (Raspberry Pi 4) (#1497 ) ## Issue Addressed NA ## Proposed Changes Adds support for using the [`cross`](https://github.com/rust-embedded/cross) project to produce cross-compiled binaries using Docker images. Provides quite clean and simple cross-compiles cause all the complexity is hidden in Dockerfiles. It does require you to be in the `docker` group though. ## Details - Adds shortcut commands to `Makefile` - Ensures `reqwest` and `discv5` use vendored openssl libs (i.e., static not shared). - Switches to a [commit](`284f705964`) of blst that has a renamed C function to avoid a collision with openssl (upstream issue: https://github.com/supranational/blst/issues/21). - Updates `ring` to the latest satisfiable version, since an earlier version was causing issues with `cross`. - Off-topic, but adds extra message about Windows support as suggested by Discord user. ## Additional Info - ~~Blocked on #1495~~ - There are no tests in CI for this yet for a few reasons: - I'm hesitant to add more long-running tasks. - Short-term bitrot should be avoided since we'll use it each release. - In the long term I think it would be good to automate binary creation on a release. - I observed the binaries increase in size from 50mb to 52mb after these changes.	2020-08-11 05:16:30 +00:00
divma	1a67d15701	Mitigate too many outgoing connections (#1469 ) Limit simultaneous outgoing connection attempts to a reasonable maximum as an extra layer of protection. Also shift the keep-alive logic of the RPC handler to avoid needing to update it by hand. I think in rare cases this could make shutting down a connection a bit faster.	2020-08-11 02:16:31 +00:00
realbigsean	ec84183e05	Add graffiti cli flag to the validator client. (#1425 ) ## Issue Addressed #1419 ## Proposed Changes Creates a `--graffiti` cli flag in the validator client. If the flag is set, it overrides graffiti in the beacon node. ## Additional Info	2020-08-11 02:16:29 +00:00
divma	95b55d7170	Block error display (#1503 ) ## Issue Addressed #1486	2020-08-11 01:30:26 +00:00
Age Manning	134676fd6f	Version bump to v0.2.2 (#1496 ) Version bump to v0.2.2	2020-08-10 06:49:03 +00:00
Age Manning	cbfae87aa6	Upgrade logs (#1495 ) ## Issue Addressed #1483 ## Proposed Changes Upgrades the log to a critical if a listener fails. We are able to listen on many interfaces so a single instance is not critical. We should however gracefully shutdown the client if we have no listeners, although the client can still function solely on outgoing connections. For now a critical is raised and I leave #1494 for more sophisticated handling of this. This also updates discv5 to handle errors of binding to a UDP socket such that lighthouse is now able to handle them.	2020-08-10 05:19:51 +00:00
Age Manning	04e4389efe	Patch gossipsub (#1490 ) ## Issue Addressed Some nodes not following head, high CPU usage and HTTP API delays ## Proposed Changes Patches gossipsub. Gossipsub was using an `lru_time_cache` to check for duplicates. This contained an `O(N)` lookup for every gossipsub message to update the time cache. This was causing high cpu usage and blocking network threads. This PR introduces a custom cache without `O(N)` inserts. This also adds built in safety mechanisms to prevent gossipsub from excessively retrying connections upon failure. A maximum limit is set after which we disconnect from the node from too many failed substream connections.	2020-08-08 08:09:04 +00:00
Age Manning	08a31c5a1a	Disconnect peers (#1484 ) ## Issue Addressed Peers that connected after the peer limit may remain connected in some circumstances. This ensures peers not in the peer manager's list get disconnected. Further logging is also added to track this behaviour.	2020-08-08 06:08:44 +00:00
Age Manning	a1f9769040	Libp2p update (#1482 ) Updates to latest libp2p master. This now has native noise support. This PR - Removes secio support - Prioritises mplex over yamux	2020-08-08 02:17:32 +00:00
Paul Hauner	0b287f6ece	Push naive attestations into op pool (#1466 ) ## Issue Addressed NA ## Proposed Changes - When producing a block, go and ensure every attestation in the naive aggregation pool is included in the operation pool. This should help us increase the number of useful attestations in a block. - Lift the `RwLock`s inside `NaiveAggregationPool` up into a single high-level lock. There were race conditions in the existing setup and it was hard to reason about. ## Additional Info NA	2020-08-06 07:26:46 +00:00
divma	7d87e11e0f	Fix rpc coded response display (#1470 ) Prevent errors to be printed in debug mode	2020-08-06 04:29:23 +00:00
Pawan Dhananjay	983f768034	Remove ssz encoding support from rpc (#1457 ) ## Issue Addressed Partially resolves #1422 ## Proposed Changes Remove ssz encoding from req/resp in rpc.	2020-08-06 04:29:19 +00:00
divma	138c0cf7f0	Remove block clone (#1448 ) ## Issue Addressed #1028 A bit late, but I think if `BlockError` had a kind (the current `BlockError` minus everything on the variants that comes directly from the block) and the original block, more clones could be removed	2020-08-06 04:29:17 +00:00
Age Manning	09a615b2c0	Lighthouse crate v0.2.0 bump (#1450 ) ## Description This PR marks Lighthouse v0.2.0. This release marks the stable version of Lighthouse, ready for the approaching Medalla testnet.	2020-08-06 03:43:05 +00:00
divma	924ba66218	Update v0.12.2 gossip params (#1449 ) ## Issue Addressed #1422	2020-08-06 00:04:33 +00:00
Paul Hauner	5629126f45	Add reason to invalid attestation log (#1460 ) ## Issue Addressed NA ## Proposed Changes Adds an extra field to a debug log so we can see why an attestation was invalid. ## Additional Info NA	2020-08-05 01:49:52 +00:00
Paul Hauner	f26adc0a36	Lighthouse v0.2.0 (Medalla) (#1452 ) ## Issue Addressed NA ## Proposed Changes - Moves the git-based versioning we were doing into the `lighthouse_version` crate in `common`. - Removes the `beacon_node/version` crate, replacing it with `lighthouse_version`. - Bumps the version to `v0.2.0`. ## Additional Info There are now two types of version string: 1. `const VERSION: &str = Lighthouse/v0.2.0-1419501f2+` 1. `version_with_platform() = Lighthouse/v0.2.0-1419501f2+/x86_64-linux` (1) is handy cause it's a `const` and shorter. (2) has platform info so it's more useful. Note that the plus-sign (`+`) indicates that the git commit is dirty (it used to be `(modified)` but I had to shorten it to fit into graffiti). These version strings are now included on: - `lighthouse --version` - `lcli --version` - `curl localhost:5052/node/version` - p2p messages when we communicate our version You can update the version by changing this constant (version is not related to a `Cargo.toml`): `b9ad7102d5/common/lighthouse_version/src/lib.rs (L4-L15)`	2020-08-04 07:44:53 +00:00
divma	1bbecbcf26	Track gossip subscriptions as a metric (#1445 ) ## Issue Addressed #1399 ## Proposed Changes Set an Int gauge per topic and inc/dec when peers subscribe/unsubscribe	2020-08-04 04:18:10 +00:00
Age Manning	31707ccf45	Shift author to sigma prime on some crates (#1440 ) Shifts the author to sigma prime on some crates	2020-08-04 02:31:41 +00:00
Age Manning	1419501f2e	Update peerdb constants (#1444 ) Increases the cache for disconnected and banned peers.	2020-08-03 12:48:22 +00:00
Age Manning	37679b8898	Update score decay behaviour (#1442 )	2020-08-03 20:46:08 +10:00
Age Manning	f634f073a8	Correct issue with network message passing (#1439 ) ## Issue Addressed Sync was breaking occasionally. The root cause appears to be identify crashing as events were being sent to the protocol after nodes were banned. Have not been able to reproduce sync issues since this update. ## Proposed Changes Only send messages to sub-behaviour protocols if the peer manager thinks the peer is connected. All other messages are dropped.	2020-08-03 09:35:53 +00:00
Age Manning	3b5da8f35f	Gossipsub update (#1432 ) ## Issue Addressed The most recent gossipsub update had an issue where some privacy settings led to not sending a sequence number with the message. Although Lighthouse treats these as valid (based on current configuration) other clients may not. This corrects gossipsub to send sequence numbers where expected and based on the configuration settings.	2020-08-02 13:19:56 +00:00
divma	4d77784bb8	Rate limit RPC requests (#1402 ) ## Issue Addressed #1056 ## Proposed Changes - Add a rate limiter to the RPC behaviour. This also means the rate limiting occurs just before the door to the application level, so the number of connections a peer opens does not affect this (this would happen in the future if put on the handler) - The algorithm used is the leaky bucket as a meter / token bucket implemented the GCRA way - Each protocol has its own limit. Due to the way the algorithm works, the "small" protocols have a hard limit, while bbrange and bbroot allow [burstiness](https://www.wikiwand.com/en/Burstiness). This is so that a peer can't request hundreds of individual requests expecting only one block in a short period of time, it also allows a peer to send two half size requests instead of one with max if they want to without getting limited, and.. it also allows a peer to request a batch of the maximum size and then send _appropriately spaced_ requests of really small sizes. From what I've seen in sync this is plausible when reaching the target slot. ## Additional Info Needs to be heavily tested	2020-07-31 05:47:09 +00:00
Age Manning	a37e75f44b	Downgrade sync and rpc warn logs (#1417 ) * Downgrade sync and rpc warn logs * Correct warning	2020-07-30 13:52:44 +10:00
Age Manning	febb300a2d	Limit incoming connection requests (#1413 ) ## Issue Addressed Limits the number of incoming connections and adjusts the buffer sizes in libp2p	2020-07-29 06:39:30 +00:00
Paul Hauner	36d3d37cb4	Add support for multiple testnet flags (#1396 ) ## Issue Addressed NA ## Proposed Changes Allows for multiple "hardcoded" testnets. ## Additional Info This PR is incomplete. ## TODO - [x] Add flag to CLI, integrate with rest of Lighthouse. Co-authored-by: Pawan Dhananjay <pawandhananjay@gmail.com> Co-authored-by: Michael Sproul <michael@sigmaprime.io>	2020-07-29 06:39:29 +00:00
Age Manning	395d99ce03	Sync update (#1412 ) ## Issue Addressed Recurring sync loop and invalid batch downloading ## Proposed Changes Shifts the batches to include the first slot of each epoch. This ensures the finalized is always downloaded once a chain has completed syncing. Also add in logic to prevent re-dialing disconnected peers. Non-performant peers get disconnected during sync, this prevents re-connection to these during sync. ## Additional Info N/A	2020-07-29 05:25:10 +00:00
Age Manning	ba0f3daf9d	Gossipsub update (#1400 ) ## Issue Addressed N/A ## Proposed Changes This provides a number of corrections and improvements to gossipsub. Specifically - Enables options for greater privacy around the message author - Provides greater flexibility on message validation - Prevents unvalidated messages from being gossiped - Shifts the duplicate cache to a time-based cache inside gossipsub - Updates the message-id to handle bytes - Bug fixes related to mesh maintenance and topic subscription. This should improve our attestation inclusion rate.	2020-07-29 03:40:22 +00:00
realbigsean	09b40b7a5e	Discover query grouping (#1364 ) ## Issue Addressed #1281 ## Proposed Changes Groups queries for specific subnets into groups of up to 3. ## Additional Info	2020-07-29 02:43:50 +00:00
divma	9ae9df806c	Fix clippy lints rpc (#1401 ) ## Issue Addressed #1388 partially (eth2_libp2p & network) ## Proposed Changes TLDR at the end - Complex types are 3 on the handlers/Behaviours but the types are `Poll<ComplexType>` where `ComplexType` comes from the traits of libp2p. Those, I don't think are worth an alias. A couple more were from using tokio combinators and were removed by writing things the async way and using [`BoxFuture`](https://docs.rs/futures/0.3.5/futures/future/type.BoxFuture.html) - The cognitive complexity.. I tried to address those before (they come from the poll functions too) and tbh they are cognitively simpler to understand the way they are now. Moving separate parts to functions doesn't add much since that code is not repeated and they all do early returns. If moved those returns would now need to be wrapped in an Option, probably, and checked to be returned again. I would leave them like that but that's just preference. - Too many arguments: They are not easily put together in a wrapping struct since the parameters don't relate semantically (Ex: fn new with a log, a reference to the chain, a peer, etc) but some may differ. - Needless returns were indeed needless ## Additional Info TLDR: removed needless return, used BoxFuture and async, left the rest untouched since those lgtm	2020-07-28 01:39:42 +00:00
Paul Hauner	0b5be9b2c0	Add info about peer scoring to block/attestation errors (#1393 ) * Add comments to `BlockError` * Add `AttnError` comments * Clean up	2020-07-26 13:16:49 +10:00
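
The rate limiter described in commit 4d77784bb8 (#1402) uses a token bucket "implemented the GCRA way". The following is a minimal sketch of that general technique, not the code from the PR: the names `GcraLimiter`, `RateLimited` and `allows`, the string peer key, and passing the current time in as a `Duration` are all assumptions made for illustration. It shows how a single stored "theoretical arrival time" per peer permits bursts up to the quota while rejecting requests that arrive too soon afterwards.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Why a request was refused (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
enum RateLimited {
    /// The request alone costs more than the whole quota.
    TooLarge,
    /// The peer must wait at least this long before retrying.
    TooSoon(Duration),
}

/// GCRA-style limiter: `max_tokens` may be spent per `period`, per peer.
struct GcraLimiter {
    /// Time after which a full bucket is replenished.
    tau: Duration,
    /// Time needed to replenish a single token.
    t: Duration,
    /// Theoretical arrival time (TAT) stored per peer.
    tat: HashMap<String, Duration>,
}

impl GcraLimiter {
    /// Assumes `max_tokens > 0`; a zero quota would divide by zero.
    fn new(max_tokens: u32, period: Duration) -> Self {
        GcraLimiter { tau: period, t: period / max_tokens, tat: HashMap::new() }
    }

    /// `now` is the time elapsed since an arbitrary fixed starting point.
    fn allows(&mut self, peer: &str, tokens: u32, now: Duration) -> Result<(), RateLimited> {
        let cost = self.t * tokens;
        if cost > self.tau {
            return Err(RateLimited::TooLarge);
        }
        // A peer that has been idle starts again from the current time.
        let tat = self.tat.get(peer).copied().unwrap_or(now).max(now);
        // Earliest instant at which this request fits inside the quota.
        let earliest = (tat + cost).saturating_sub(self.tau);
        if now < earliest {
            return Err(RateLimited::TooSoon(earliest - now));
        }
        self.tat.insert(peer.to_string(), tat + cost);
        Ok(())
    }
}
```

With a quota of 10 tokens per 10 seconds, this sketch accepts two back-to-back requests of 5 tokens from the same peer, then refuses a further 1-token request until one second has passed. That matches the behaviour the commit message describes for two half-size requests and for appropriately spaced small requests after a large one.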


1282 Commits