We don't have a UDP which specifies any messages that will be 4KB. Aside from being implemented for months and a necessity for encryption and piggy-backing packets, 1280bytes is ideal, and, means this TODO can be completed!
Why 1280 bytes?
* It's less than the default MTU for most WAN/LAN networks. That means fewer fragmented datagrams (esp on well-connected networks).
* Fragmented datagrams and dropped packets suck and add latency while OS waits for a dropped fragment to never arrive (blocking readLoop())
* Most of our packets are < 1280 bytes.
* 1280 bytes is minimum datagram size and MTU for IPv6 -- on IPv6, a datagram < 1280bytes will *never* be fragmented.
UDP datagrams are dropped. A lot! And fragmented datagrams are worse. If a datagram has a 30% chance of being dropped, then a fragmented datagram has a 60% chance of being dropped. More importantly, we have signed packets and can't do anything with a packet unless we receive the entire datagram because the signature can't be verified. The same is true when we have encrypted packets.
So the solution here to picking an ideal buffer size for receiving datagrams is a number under 1400bytes. And the lower-bound value for IPv6 of 1280 bytes make's it a non-decision. On IPv4 most ISPs and 3g/4g/let networks have an MTU just over 1400 -- and *never* over 1500. Never -- that means packets over 1500 (in reality: ~1450) bytes are fragmented. And probably dropped a lot.
Just to prove the point, here are pings sending non-fragmented packets over wifi/ISP, and a second set of pings via cell-phone tethering. It's important to note that, if *any* router between my system and the EC2 node has a lower MTU, the message would not go through:
On wifi w/normal ISP:
localhost:Debug $ ping -D -s 1450 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1450 data bytes
1458 bytes from 52.6.250.242: icmp_seq=0 ttl=42 time=104.831 ms
1458 bytes from 52.6.250.242: icmp_seq=1 ttl=42 time=119.004 ms
^C
--- 52.6.250.242 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 104.831/111.918/119.004/7.087 ms
localhost:Debug $ ping -D -s 1480 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1480 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
ping: sendto: Message too long
Request timeout for icmp_seq 1
Tethering to O2:
localhost:Debug $ ping -D -s 1480 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1480 data bytes
ping: sendto: Message too long
ping: sendto: Message too long
Request timeout for icmp_seq 0
^C
--- 52.6.250.242 ping statistics ---
2 packets transmitted, 0 packets received, 100.0% packet loss
localhost:Debug $ ping -D -s 1450 52.6.250.242
PING 52.6.250.242 (52.6.250.242): 1450 data bytes
1458 bytes from 52.6.250.242: icmp_seq=0 ttl=42 time=107.844 ms
1458 bytes from 52.6.250.242: icmp_seq=1 ttl=42 time=105.127 ms
1458 bytes from 52.6.250.242: icmp_seq=2 ttl=42 time=120.483 ms
1458 bytes from 52.6.250.242: icmp_seq=3 ttl=42 time=102.136 ms
With the introduction of static/trusted nodes, the peer count
can go above MaxPeers. Update the capacity check to handle this.
While here, decouple the trusted nodes check from the handshake
by passing a function instead.
The previous metric was pubkey1^pubkey2, as specified in the Kademlia
paper. We missed that EC public keys are not uniformly distributed.
Using the hash of the public keys addresses that. It also makes it
a bit harder to generate node IDs that are close to a particular node.
This commit changes the discovery protocol to use the new "v4" endpoint
format, which allows for separate UDP and TCP ports and makes it
possible to discover the UDP address after NAT.
Peer.readLoop will only terminate if the connection is closed. Fix the
hang by closing the connection before waiting for readLoop to terminate.
This also removes the british disconnect procedure where we're waiting
for the remote end to close the connection. I have confirmed with
@subtly that cpp-ethereum doesn't adhere to it either.
This is supposed to apply some back pressure so Server is not accepting
more connections than it can actually handle. The current limit is 50.
This doesn't really need to be configurable, but we'll see how it
behaves in our test nodes and adjust accordingly.
As of this commit, p2p will disconnect nodes directly after the
encryption handshake if too many peer connections are active.
Errors in the protocol handshake packet are now handled more politely
by sending a disconnect packet before closing the connection.
There were multiple synchronization issues in the disconnect handling,
all caused by the odd special-casing of Peer.readLoop errors. Remove the
special handling of read errors and make readLoop part of the Peer
WaitGroup.
Thanks to @Gustav-Simonsson for pointing at arrows in a diagram
and playing rubber-duck.
This a fix for an attack vector where the discovery protocol could be
used to amplify traffic in a DDOS attack. A malicious actor would send a
findnode request with the IP address and UDP port of the target as the
source address. The recipient of the findnode packet would then send a
neighbors packet (which is 16x the size of findnode) to the victim.
Our solution is to require a 'bond' with the sender of findnode. If no
bond exists, the findnode packet is not processed. A bond between nodes
α and β is created when α replies to a ping from β.
This (initial) version of the bonding implementation might still be
vulnerable against replay attacks during the expiration time window.
We will add stricter source address validation later.
This is better because protocols might not actually read the payload for
some errors (msg too big, etc.) which can be a pain to test with the old
behaviour.
Message encoding functions have been renamed to catch any uses.
The switch to the new encoder can cause subtle incompatibilities.
If there are any users outside of our tree, they will at least be
alerted that there was a change.
NewMsg no longer exists. The replacements for EncodeMsg are called
Send and SendItems.
With RLPx frames, the message code is contained in the
frame and is no longer part of the encoded data.
EncodeMsg, Msg.Decode have been updated to match.
Code that decodes RLP directly from Msg.Payload will need
to change.
This mostly changes how information is passed around.
Instead of using many function parameters and return values,
put the entire state in a struct and pass that.
This also adds back derivation of ecdhe-shared-secret. I deleted
it by accident in a previous refactoring.
The diff is a bit bigger than expected because the protocol handshake
logic has moved out of Peer. This is necessary because the protocol
handshake will have custom framing in the final protocol.
Range expressions capture the length of the slice once before the first
iteration. A range expression cannot be used here since the loop
modifies the slice variable (including length changes).
There are now two deadlines, frameReadTimeout and payloadReadTimeout.
The frame timeout is longer and allows for connections that are idle.
The message timeout is still short and ensures that we don't get stuck
in the middle of a message.
I have verified that UPnP and NAT-PMP work against an older version of
the MiniUPnP daemon running on pfSense. This code is kind of hard to
test automatically.
Overview of changes:
- ClientIdentity has been removed, use discover.NodeID
- Server now requires a private key to be set (instead of public key)
- Server performs the encryption handshake before launching Peer
- Dial logic takes peers from discover table
- Encryption handshake code has been cleaned up a bit
- baseProtocol is gone because we don't exchange peers anymore
- Some parts of baseProtocol have moved into Peer instead
- add const length params for handshake messages
- add length check to fail early
- add debug logs to help interop testing (!ABSOLUTELY SHOULD BE DELETED LATER)
- wrap connection read/writes in error check
- add cryptoReady channel in peer to signal when secure session setup is finished
- wait for cryptoReady or timeout in TestPeersHandshake
- set proper public key serialisation length in pubLen = 64
- reset all sizes and offsets
- rename from DER to S (we are not using DER encoding)
- add remoteInitRandomPubKey as return value to respondToHandshake
- add ImportPublicKey with error return to read both EC golang.elliptic style 65 byte encoding and 64 byte one
- add ExportPublicKey falling back to go-ethereum/crypto.FromECDSAPub() chopping off the first byte
- add Import - Export tests
- all tests pass
- abstract the entire handshake logic in cryptoId.Run() taking session-relevant parameters
- changes in peer to accomodate how the encryption layer would be switched on
- modify arguments of handshake components
- fixed test getting the wrong pubkey but it till crashes on DH in newSession()
- add session token check and fallback to shared secret in responder call too
- use explicit length for the types of new messages
- fix typo resp[resLen-1] = tokenFlag
- correct sizes for the blocks : sec signature 65, ecies sklen 16, keylength 32
- added allocation to Xor (should be optimized later)
- no pubkey reader needed, just do with copy
- restructuring now into INITIATE, RESPOND, COMPLETE -> newSession initialises the encryption/authentication layer
- crypto identity can be part of client identity, some initialisation when server created