This commit moderately refactors the ranged export code. It addresses several
problems:
* Code does not finish cleanly and things hang on ctrl-c
* Same block is read multiple times in a row (artificially increasing cached
blockstore metrics to 50%)
* It is unclear whether there are additional races (a single worker quits
when reaching height 0)
* CARs produced have duplicated blocks (~400k for an 80M-blocks CAR or
so). Some blocks appear up to 5 times.
* Using pointers for tasks where it is not necessary.
The changes:
* Use a FIFO instead of stack: simpler implementation as its own type. This
has not proven to be much more memory-friendly, but it has not made things
worse either.
* We avoid a probably not small amount of allocations by not using
unnecessary pointers.
* Fix duplicated blocks by atomically checking+adding to CID set.
* Context-termination now works correctly. Worker lifetime is correctly tracked and all channels
are closed, avoiding any memory leaks and deadlocks.
* We ensure all work is finished before finishing, something that might have
been broken in some edge cases previously. In practice, we would not have
seen this except perhaps in very early snapshots close to genesis.
Initial testing shows the code is currently about 5% faster. Resulting
snapshots do not have duplicates so they are a bit smaller. We have manually
verified that no CID is lost versus previous results, with both old and recent
snapshots.