We don't have a UDP which specifies any messages that will be 4KB. Aside from being implemented for months and a necessity for encryption and piggy-backing packets, 1280bytes is ideal, and, means this TODO can be completed!
Why 1280 bytes?
* It's less than the default MTU for most WAN/LAN networks. That means fewer fragmented datagrams (esp on well-connected networks).
* Fragmented datagrams and dropped packets suck and add latency while OS waits for a dropped fragment to never arrive (blocking readLoop())
* Most of our packets are < 1280 bytes.
* 1280 bytes is minimum datagram size and MTU for IPv6 -- on IPv6, a datagram < 1280bytes will *never* be fragmented.
UDP datagrams are dropped. A lot! And fragmented datagrams are worse. If a datagram has a 30% chance of being dropped, then a fragmented datagram has a 60% chance of being dropped. More importantly, we have signed packets and can't do anything with a packet unless we receive the entire datagram because the signature can't be verified. The same is true when we have encrypted packets.
So the solution here to picking an ideal buffer size for receiving datagrams is a number under 1400bytes. And the lower-bound value for IPv6 of 1280 bytes make's it a non-decision. On IPv4 most ISPs and 3g/4g/let networks have an MTU just over 1400 -- and *never* over 1500. Never -- that means packets over 1500 (in reality: ~1450) bytes are fragmented. And probably dropped a lot.
Just to prove the point, here are pings sending non-fragmented packets over wifi/ISP, and a second set of pings via cell-phone tethering. It's important to note that, if *any* router between my system and the EC2 node has a lower MTU, the message would not go through:
On wifi w/normal ISP:
localhost:Debug $ ping -D -s 1450 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1450 data bytes
1458 bytes from 52.6.250.242: icmp_seq=0 ttl=42 time=104.831 ms
1458 bytes from 52.6.250.242: icmp_seq=1 ttl=42 time=119.004 ms
^C
--- 52.6.250.242 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 104.831/111.918/119.004/7.087 ms
localhost:Debug $ ping -D -s 1480 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1480 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
ping: sendto: Message too long
Request timeout for icmp_seq 1
Tethering to O2:
localhost:Debug $ ping -D -s 1480 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1480 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
^C
--- 52.6.250.242 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss
localhost:Debug $ ping -D -s 1450 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1450 data bytes
1458 bytes from 52.6.250.242: icmp_seq=0 ttl=42 time=107.844 ms
1458 bytes from 52.6.250.242: icmp_seq=1 ttl=42 time=105.127 ms
1458 bytes from 52.6.250.242: icmp_seq=2 ttl=42 time=120.483 ms
1458 bytes from 52.6.250.242: icmp_seq=3 ttl=42 time=102.136 ms
With the introduction of static/trusted nodes, the peer count
can go above MaxPeers. Update the capacity check to handle this.
While here, decouple the trusted nodes check from the handshake
by passing a function instead.
The previous metric was pubkey1^pubkey2, as specified in the Kademlia
paper. We missed that EC public keys are not uniformly distributed.
Using the hash of the public keys addresses that. It also makes it
a bit harder to generate node IDs that are close to a particular node.
This commit changes the discovery protocol to use the new "v4" endpoint
format, which allows for separate UDP and TCP ports and makes it
possible to discover the UDP address after NAT.
Peer.readLoop will only terminate if the connection is closed. Fix the
hang by closing the connection before waiting for readLoop to terminate.
This also removes the british disconnect procedure where we're waiting
for the remote end to close the connection. I have confirmed with
@subtly that cpp-ethereum doesn't adhere to it either.
This is supposed to apply some back pressure so Server is not accepting
more connections than it can actually handle. The current limit is 50.
This doesn't really need to be configurable, but we'll see how it
behaves in our test nodes and adjust accordingly.
As of this commit, p2p will disconnect nodes directly after the
encryption handshake if too many peer connections are active.
Errors in the protocol handshake packet are now handled more politely
by sending a disconnect packet before closing the connection.
There were multiple synchronization issues in the disconnect handling,
all caused by the odd special-casing of Peer.readLoop errors. Remove the
special handling of read errors and make readLoop part of the Peer
WaitGroup.
Thanks to @Gustav-Simonsson for pointing at arrows in a diagram
and playing rubber-duck.
This a fix for an attack vector where the discovery protocol could be
used to amplify traffic in a DDOS attack. A malicious actor would send a
findnode request with the IP address and UDP port of the target as the
source address. The recipient of the findnode packet would then send a
neighbors packet (which is 16x the size of findnode) to the victim.
Our solution is to require a 'bond' with the sender of findnode. If no
bond exists, the findnode packet is not processed. A bond between nodes
α and β is created when α replies to a ping from β.
This (initial) version of the bonding implementation might still be
vulnerable against replay attacks during the expiration time window.
We will add stricter source address validation later.
This is better because protocols might not actually read the payload for
some errors (msg too big, etc.) which can be a pain to test with the old
behaviour.
Message encoding functions have been renamed to catch any uses.
The switch to the new encoder can cause subtle incompatibilities.
If there are any users outside of our tree, they will at least be
alerted that there was a change.
NewMsg no longer exists. The replacements for EncodeMsg are called
Send and SendItems.
With RLPx frames, the message code is contained in the
frame and is no longer part of the encoded data.
EncodeMsg, Msg.Decode have been updated to match.
Code that decodes RLP directly from Msg.Payload will need
to change.
This mostly changes how information is passed around.
Instead of using many function parameters and return values,
put the entire state in a struct and pass that.
This also adds back derivation of ecdhe-shared-secret. I deleted
it by accident in a previous refactoring.
The diff is a bit bigger than expected because the protocol handshake
logic has moved out of Peer. This is necessary because the protocol
handshake will have custom framing in the final protocol.