vyzo
1b77361301
add option for hotstore message retention
2021-07-17 08:35:35 +03:00
vyzo
e003203bea
implement exposed splitstore
2021-07-15 13:12:10 +03:00
vyzo
5a23f64b3b
code reorg: break splitstore.go into smaller logical units
2021-07-14 13:11:15 -07:00
vyzo
3f3a12b75c
remove BlockstoreMover interface
...
we decided it's premature
2021-07-14 22:59:53 +03:00
vyzo
023146803d
use Broadcast for view barrier
2021-07-14 22:59:53 +03:00
vyzo
3d77ae1f4d
make trackTxnRefMany consistent with trackTxnRef
2021-07-14 22:59:53 +03:00
vyzo
6f126c80bf
remove redundant log, more descriptive error message for closing condition
2021-07-14 22:59:53 +03:00
vyzo
ff093fae00
use a missing compactionIndex as an indicator for warmup
...
so that splitstore v0 nodes upgrading will get a fresh warmup.
2021-07-14 22:59:53 +03:00
vyzo
669b47cfc9
do moving gc for hotstore every 20 compactions
...
that's about once a week
2021-07-14 22:59:53 +03:00
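The "about once a week" estimate above can be checked with a small sketch. It assumes one compaction per finality (as a later entry in this log suggests) and uses Filecoin mainnet parameters that are not stated in the commit itself: 900 epochs per finality and 30-second epochs. The helper names are hypothetical, not taken from the splitstore code.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed values, not taken from the commit: Filecoin mainnet finality is
// 900 epochs and one epoch lasts 30 seconds.
const (
	finalityEpochs = 900
	epochDuration  = 30 * time.Second
	movingGCPeriod = 20 // compactions between moving GCs, per the commit subject
)

// movingGCInterval estimates the wall-clock time between two moving GCs,
// assuming exactly one compaction per finality.
func movingGCInterval() time.Duration {
	return movingGCPeriod * finalityEpochs * epochDuration
}

// shouldMovingGC is a hypothetical helper: it reports whether the compaction
// with the given serial index should also run the slow moving GC.
func shouldMovingGC(compactionIndex int64) bool {
	return compactionIndex > 0 && compactionIndex%movingGCPeriod == 0
}

func main() {
	fmt.Println(movingGCInterval().Hours()/24, "days between moving GCs")
	fmt.Println(shouldMovingGC(19), shouldMovingGC(20))
}
```

Under these assumptions one finality is 7.5 hours, so 20 compactions span 150 hours, a little over six days.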
vyzo
818b8de182
keep track of the compaction serial (index)
...
it is useful so that:
- we only do slow (but very effective) moving gc every 10 compactions
- we can detect a splitstore v0 upgrade and re-warm up
2021-07-14 22:59:53 +03:00
vyzo
c93328b036
use the new traits for hotstore gc
2021-07-14 22:59:52 +03:00
vyzo
35180b4761
merge Compact and CollectGarbage in badger
2021-07-14 22:59:52 +03:00
vyzo
dc81c0e6a2
add blockstore traits related to gc
2021-07-14 22:59:52 +03:00
vyzo
af399529ec
finetune view waiting
2021-07-13 09:06:40 +03:00
vyzo
257423e917
fix view waiting issues with the WaitGroup
...
We can add after Wait is called, which is problematic with WaitGroups.
This instead uses a mx/cond combo and waits while the count is > 0.
The only downside is that we might needlessly wait for (a bunch of) views
that started while the txn is active, but we can live with that.
2021-07-13 09:01:50 +03:00
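A minimal sketch of the mutex/condition-variable approach described above, as opposed to a sync.WaitGroup (whose Add must not race with a Wait once the counter has reached zero). The type and method names are hypothetical and are not the actual splitstore identifiers; the sketch wakes waiters with Broadcast, which is one possible choice.

```go
package main

import (
	"fmt"
	"sync"
)

// viewBarrier (hypothetical name) counts active views and lets a
// transaction block until no views remain.
type viewBarrier struct {
	mx    sync.Mutex
	cond  *sync.Cond
	count int
}

func newViewBarrier() *viewBarrier {
	b := &viewBarrier{}
	b.cond = sync.NewCond(&b.mx)
	return b
}

// begin registers a new view; unlike WaitGroup.Add, it is safe to call
// while another goroutine is already blocked in wait.
func (b *viewBarrier) begin() {
	b.mx.Lock()
	b.count++
	b.mx.Unlock()
}

// end unregisters a view and wakes all waiters when the count drops to zero.
func (b *viewBarrier) end() {
	b.mx.Lock()
	b.count--
	if b.count == 0 {
		b.cond.Broadcast()
	}
	b.mx.Unlock()
}

// wait blocks while the count is > 0. It may also end up waiting for views
// that started after wait was called, which is the downside noted above.
func (b *viewBarrier) wait() {
	b.mx.Lock()
	for b.count > 0 {
		b.cond.Wait()
	}
	b.mx.Unlock()
}

func main() {
	b := newViewBarrier()
	b.begin()
	done := make(chan struct{})
	go func() {
		b.wait()
		close(done)
	}()
	b.begin() // a view starting while the waiter may already be blocked
	b.end()
	b.end()
	<-done
	fmt.Println("all views finished")
}
```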
Steven Allen
04abd190ab
nit: remove useless goto
...
Because stebalien has allergies.
2021-07-12 21:46:50 -07:00
vyzo
60212c86cb
put a mutex around HeadChange
2021-07-13 03:14:13 +03:00
vyzo
759594d01c
always return the waitgroup in protectView
...
so that we preclude the following scenario:
Start compaction.
Start view.
Finish compaction.
Start compaction.
which would not wait for the view to complete.
2021-07-13 03:11:40 +03:00
vyzo
df9670c58d
fix lint
2021-07-10 16:38:40 +03:00
vyzo
0c5e336ff1
address review comments
2021-07-10 16:30:27 +03:00
vyzo
870a47f55d
handle id cids in internal versions of view/get
2021-07-09 20:07:17 +03:00
vyzo
f5ae10e3d1
refactor debug log code to eliminate duplication
2021-07-09 19:53:51 +03:00
vyzo
41290383e2
fix test
2021-07-09 19:24:44 +03:00
vyzo
b9a5ea8f7b
update wording around discard store
2021-07-09 19:23:55 +03:00
vyzo
c0a1cfffa1
rename noopstore to discardstore
2021-07-09 19:19:37 +03:00
vyzo
18161fee38
remove unused lookback constructs
2021-07-09 19:12:58 +03:00
vyzo
095d7427ba
make view protection optimistic again, as there is a race window
2021-07-09 15:41:10 +03:00
vyzo
da0feb3fa4
don't mark references inline; instead rely on the main compaction thread to do concurrent marking
...
The problem is that it is possible that an inline marking might take minutes for some objects
(infrequent, but still possible for state roots and prohibitive if that's a block validation).
So we simply track references continuously and rely on the main compaction thread to trigger
concurrent marking for all references at opportune moments.
Assumption: we can mark references faster than they are created during purge or else we'll
never purge anything.
2021-07-09 15:10:02 +03:00
vyzo
acc4c374ef
properly handle protecting long-running views
2021-07-09 13:20:18 +03:00
vyzo
565faff754
fix test
2021-07-09 11:38:09 +03:00
vyzo
4f89d260b0
kill isOldBlockHeader; it's dangerous.
2021-07-09 11:35:10 +03:00
vyzo
de5e21bf1a
correctly handle identity cids
2021-07-09 11:31:04 +03:00
vyzo
909f7039d4
make badger Close-safe
2021-07-09 09:54:12 +03:00
vyzo
abdf4a161a
explicitly switch marksets for concurrent marking
...
this has very noticeable impact in initial marking time; it also allows us
to get rid of the confusing ts monikers.
2021-07-09 04:26:36 +03:00
vyzo
b6611125b6
add environment variables to turn on the debug log without recompiling
2021-07-08 21:30:39 +03:00
vyzo
60dd97c7fc
fix potential deadlock in View
...
As pointed out by magik, it is possible to deadlock if the view callback performs
a blockstore operation while a Lock is pending.
This fixes the issue by optimistically tracking the reference before actually calling
the underlying View and limiting the scope of the lock.
2021-07-08 21:18:59 +03:00
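The fix described above can be illustrated with a small sketch. With Go's sync.RWMutex, a pending Lock blocks new RLock calls, so a view callback that re-enters the store while holding the read lock can deadlock. The sketch holds the read lock only long enough to track the reference and releases it before running the callback. All names here are hypothetical stand-ins, and the in-memory map replaces the real underlying blockstore.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNotFound = errors.New("block not found")

// store is a hypothetical stand-in for a blockstore with transactional
// reference tracking.
type store struct {
	txnLk     sync.RWMutex // guards txnActive
	txnActive bool

	refsMx  sync.Mutex // guards txnRefs
	txnRefs map[string]struct{}

	blocks map[string][]byte // read-only after construction in this sketch
}

func (s *store) trackRef(key string) {
	s.refsMx.Lock()
	s.txnRefs[key] = struct{}{}
	s.refsMx.Unlock()
}

// View optimistically tracks the reference, then calls the callback with
// no lock held, so the callback may safely perform further store operations.
func (s *store) View(key string, cb func([]byte) error) error {
	s.txnLk.RLock()
	if s.txnActive {
		s.trackRef(key)
	}
	s.txnLk.RUnlock() // lock scope ends before the callback runs

	data, ok := s.blocks[key]
	if !ok {
		return errNotFound
	}
	return cb(data)
}

func main() {
	s := &store{
		txnActive: true,
		txnRefs:   map[string]struct{}{},
		blocks:    map[string][]byte{"a": []byte("A"), "b": []byte("B")},
	}
	// The callback re-enters the store; this is the pattern that could
	// deadlock if the read lock were held across the callback.
	err := s.View("a", func([]byte) error {
		return s.View("b", func([]byte) error { return nil })
	})
	fmt.Println(err, len(s.txnRefs))
}
```

The trade-off is that a reference may be tracked for a key that turns out to be missing, which is harmless in this sketch.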
vyzo
c0537848b3
fix typo
...
Co-authored-by: Łukasz Magiera <magik6k@users.noreply.github.com>
2021-07-08 17:54:16 +03:00
vyzo
fa30ac8c5d
fix typo
...
Co-authored-by: Łukasz Magiera <magik6k@users.noreply.github.com>
2021-07-08 17:53:59 +03:00
vyzo
00d7772f57
move check for closure in walkChain
...
so that we don't do it too often and also cover warmup.
2021-07-08 13:12:19 +03:00
vyzo
5cf1e09e81
README: add instructions for how to enable
2021-07-08 13:00:31 +03:00
vyzo
9aa4f3b3b2
add README for documentation
2021-07-08 12:32:41 +03:00
vyzo
e6eacbdd56
use RW mutexes in marksets
2021-07-08 10:20:29 +03:00
vyzo
48f13a43b7
intelligently close marksets and signal errors in concurrent operations
2021-07-08 10:18:43 +03:00
vyzo
f5c45bd517
check the closing state variable often
...
so that we have a reasonably quick graceful shutdown
2021-07-08 10:13:44 +03:00
vyzo
4f808367f8
fix lint
2021-07-07 21:32:58 +03:00
vyzo
fee50b13a2
check the closing state on each batch during the purge.
2021-07-07 21:32:05 +03:00
vyzo
c6421f8a75
don't nil the mark sets on close, it's dangerous.
...
a concurrent marking can panic.
2021-07-07 21:27:36 +03:00
vyzo
aec2ba2c82
nil map/bf on markset close
2021-07-07 16:46:14 +03:00
vyzo
451ddf50ab
RIP bbolt-backed markset
2021-07-07 16:39:37 +03:00
vyzo
9dbb2e0abd
don't leak tracking errors through the API
2021-07-07 16:34:02 +03:00
vyzo
83c30dc4c0
protect assignment of warmup epoch with the mutex
2021-07-07 11:31:27 +03:00
vyzo
6cc2112749
remove the curTs state variable; we don't need it
2021-07-07 09:55:25 +03:00
vyzo
05dbbe9681
rename some Txn methods for better readability
2021-07-07 09:52:31 +03:00
vyzo
90da6227b3
transactional protect incoming tipsets
2021-07-07 02:11:37 +03:00
vyzo
0e2af11f6a
prepare the transaction before launching the compaction goroutine
2021-07-07 01:39:58 +03:00
vyzo
f2f4af669d
clean up: simplify debug log, get rid of ugly debug log
2021-07-06 17:13:38 +03:00
vyzo
c1c25868cc
improve comments
2021-07-06 15:09:04 +03:00
vyzo
fdff1bebc9
move map markset implementation to its own file
2021-07-06 14:44:40 +03:00
vyzo
5c514504f7
remove unused GetGenesis method from ChainAccessor interface
2021-07-06 14:41:41 +03:00
vyzo
dc8139a1d2
add some comments for debug only code
2021-07-06 13:23:12 +03:00
vyzo
c4ae3e0c3d
minor tweak
2021-07-06 09:17:35 +03:00
vyzo
169ab262f5
really optimize computing object weights
...
sort is still taking a long time, this should be as fast as it gets.
2021-07-06 09:02:44 +03:00
vyzo
55a9e0ccd1
short-circuit block headers on sort weight computation
2021-07-06 08:22:43 +03:00
vyzo
bf7aeb3167
optimize sort a tad
...
it's taking a long time to compute weights...
2021-07-06 08:10:57 +03:00
vyzo
0659235e21
cache cid strings in sort
...
so as to avoid making a gazillion of strings
2021-07-06 07:26:13 +03:00
vyzo
525a2c71dd
use hashes as keys in weight map to avoid duplicate work
...
otherwise the root object will be raw, but internal references will be dag; duplicate work.
2021-07-06 01:27:56 +03:00
vyzo
c6ad8fdaed
use walkObjectRaw for computing object weights
...
cids that come out of the hotstore with ForEach are raw.
2021-07-06 01:08:44 +03:00
vyzo
2cbd3faf5a
make sure to nil everything in txnEndProtect
2021-07-05 23:56:31 +03:00
vyzo
51ab891d5c
quiet linter
...
it's a false positive, function doesn't escape.
2021-07-05 23:53:45 +03:00
vyzo
bd436ab9de
make endTxnProtect idempotent
2021-07-05 23:51:10 +03:00
vyzo
e859942fa4
code cleanup: refactor txn state code into their own functions
2021-07-05 23:31:37 +03:00
vyzo
3477d265c6
unify the two marksets
...
really, it's concurrent marking and there is no reason to have two different marksets
2021-07-05 20:10:47 +03:00
vyzo
73d07999bf
don't needlessly wait 1 min in first retry for missing refs
2021-07-05 18:24:48 +03:00
vyzo
af8cf712be
handle all missing refs together
...
so that we wait 6min at most, not 12.
2021-07-05 18:16:54 +03:00
vyzo
5a099b7d05
more commentary on the missing refs situation
2021-07-05 16:12:17 +03:00
vyzo
59639a0788
reinstate some better code for handling missing references.
2021-07-05 16:08:08 +03:00
vyzo
fa195bede2
get rid of ugly missing reference handling code
...
those missing objects don't seem to ever get there, are they from an abandoned fork?
2021-07-05 14:29:55 +03:00
vyzo
59936ef468
fix log
2021-07-05 13:30:31 +03:00
vyzo
0b7153be86
use internal version of has for occurs checks
2021-07-05 12:41:11 +03:00
vyzo
d8b8d75e0f
re-add minute delay before trying for missing objects
2021-07-05 12:38:09 +03:00
vyzo
d7709deb2b
reduce memory pressure from marksets when the size is decreased
2021-07-05 11:51:22 +03:00
vyzo
3ec834b2e3
improve logs and error messages
2021-07-05 11:41:09 +03:00
vyzo
918a7ec749
a bit more fil commitment short-circuiting
2021-07-05 11:38:53 +03:00
vyzo
2ea2abc07d
short-circuit fil commitments
...
they don't make it to the blockstore anyway
2021-07-05 11:32:52 +03:00
vyzo
839f7bd2b5
only occur check for DAGs
2021-07-05 11:11:08 +03:00
vyzo
c81ae5fc20
add some comments about the missing business and another log
2021-07-05 10:42:14 +03:00
vyzo
4c41f52828
add warning for missing objects for marking for debug purposes
2021-07-05 10:35:04 +03:00
vyzo
3597192d58
remove the sleeps and busy loop more times when waiting for missing objects
2021-07-05 10:31:47 +03:00
vyzo
1726eb993c
deal with incomplete objects that need to be marked and protected
...
seems that something is writing DAGs before their constituents, which causes problems.
2021-07-05 10:22:52 +03:00
vyzo
db53859e7a
reduce CompactionThreshold to 5 finalities
...
so that we run compaction every finality, once we've first compacted
2021-07-04 22:12:51 +03:00
vyzo
b08e0b7102
fix lint
2021-07-04 21:24:15 +03:00
vyzo
94efae419e
reduce length of critical section
...
Just the purge; the rest is not critical -- e.g. it's ok if we do some duplicate copies
to the coldstore, we'll have gc soon.
2021-07-04 21:21:53 +03:00
vyzo
f33d4e79aa
simplify transactional protection logic
...
Now that we delete objects heaviest first, we don't have to do deep walk and rescan gymnastics.
2021-07-04 20:49:39 +03:00
vyzo
40c271cda1
sort cold objects before deleting
...
so that we can't shoot ourselves in the foot by deleting the constituents of a DAG while it is
still in the hotstore.
2021-07-04 20:17:07 +03:00
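A rough sketch of the ordering idea above: if each object's weight is the size of the DAG rooted at it, then a parent is always strictly heavier than any of its constituents, so deleting in descending weight order removes a DAG root before its children. The weight definition, the object type, and the function names are assumptions made for illustration and are not the actual splitstore code.

```go
package main

import (
	"fmt"
	"sort"
)

// object is a hypothetical stand-in for a stored block and its links.
type object struct {
	links []string
}

// weight returns the number of objects reachable from id (counting shared
// children once per path), memoized in memo. Assumes the graph is acyclic.
func weight(id string, objs map[string]object, memo map[string]int) int {
	if w, ok := memo[id]; ok {
		return w
	}
	w := 1
	for _, l := range objs[id].links {
		w += weight(l, objs, memo)
	}
	memo[id] = w
	return w
}

// sortHeaviestFirst returns a copy of ids ordered by descending weight, so
// that every parent precedes all of its constituents.
func sortHeaviestFirst(ids []string, objs map[string]object) []string {
	memo := make(map[string]int)
	out := append([]string(nil), ids...)
	sort.SliceStable(out, func(i, j int) bool {
		return weight(out[i], objs, memo) > weight(out[j], objs, memo)
	})
	return out
}

func main() {
	objs := map[string]object{
		"root": {links: []string{"a", "b"}},
		"a":    {links: []string{"c"}},
		"b":    {},
		"c":    {},
	}
	fmt.Println(sortHeaviestFirst([]string{"c", "b", "a", "root"}, objs))
}
```

With this order, an interrupted purge can leave behind orphaned constituents but never a root whose constituents are already gone.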
vyzo
13d612f72f
smarter trackTxnRefMany
2021-07-04 19:33:49 +03:00
vyzo
f124389b66
recursively protect all references
2021-07-04 19:21:00 +03:00
vyzo
4d286da593
fix error message
2021-07-04 18:58:39 +03:00
vyzo
680af8eb09
use deep object walking for more robust handling of transactional references
2021-07-04 18:38:28 +03:00
vyzo
1f02428225
fix lint
2021-07-04 18:38:28 +03:00
vyzo
2c7a89a1db
short-circuit rescanning on block headers
2021-07-04 18:38:28 +03:00
vyzo
028a5c4942
make test do something useful again
2021-07-04 18:38:28 +03:00
vyzo
8e56fffb33
walkChain should visit the genesis state root
2021-07-04 18:38:28 +03:00
vyzo
95c3aaec9a
fix test
2021-07-04 18:38:28 +03:00
vyzo
190cb18ab0
housekeeping
...
- remove defunct tracking store implementations
- update splitstore node config
- use mark set type config option (defaulting to mapts); a memory constrained node
may want to use an on-disk one
2021-07-04 18:38:28 +03:00
vyzo
19d1b1f532
deal with partially written objects
2021-07-04 18:38:28 +03:00
vyzo
0a1d7b3732
fix log
2021-07-04 18:38:28 +03:00
vyzo
08cad30be2
reuse key buffer in badger ForEachKey
...
cid copies the bytes so it's safe
2021-07-04 18:38:28 +03:00
vyzo
eafffc1634
more efficient trackTxnRefMany
2021-07-04 18:38:28 +03:00
vyzo
36f93649ef
fix panic from concurrent map writes in txnRefs
2021-07-04 18:38:28 +03:00
vyzo
6fa2cd232d
simplify compaction model
2021-07-04 18:38:28 +03:00
vyzo
1f2b604c07
RIP tracking store
2021-07-04 18:38:28 +03:00
vyzo
d476a3db2c
BlockstoreIterator trait with implementation for badger
2021-07-04 18:38:28 +03:00
vyzo
68a83500bc
fix bug that turned candidate filtering to dead code
2021-07-04 18:38:28 +03:00
vyzo
00fcf6dd72
add staging cache to bolt tracking store
2021-07-04 18:38:28 +03:00
vyzo
642f0e4740
deal with memory pressure, don't walk under the boundary
2021-07-04 18:38:28 +03:00
vyzo
c5cf8e226b
remove unnecessary code
2021-07-04 18:38:28 +03:00
vyzo
d79e4da7aa
more accurate stats about mark set updates
2021-07-04 18:38:28 +03:00
vyzo
6f58fdcb22
remove vm copy context detection hack
...
stack tracing is slow.
2021-07-04 18:38:28 +03:00
vyzo
2b03316cd9
fix log message
2021-07-04 18:38:28 +03:00
vyzo
184d3802b6
remove dead code
2021-07-04 18:38:28 +03:00
vyzo
228a435ba7
rework tracking logic; do it lazily and far more efficiently
2021-07-04 18:38:28 +03:00
vyzo
9d6cabd18a
if it's not a dag, it's not a block
2021-07-04 18:38:28 +03:00
vyzo
8157f889ce
short-circuit marking walks when encountering a block and more efficient walking
2021-07-04 18:38:28 +03:00
vyzo
736d6a3c19
only treat Has as an implicit write within vm.Copy context
2021-07-04 18:38:28 +03:00
vyzo
39723bbe60
use a single map for tracking pending writes, properly track implicits
2021-07-04 18:38:28 +03:00
vyzo
5834231e58
create the transactional protect filter before walking
2021-07-04 18:38:28 +03:00
vyzo
e4bb4be855
fix some residual purge races
2021-07-04 18:38:28 +03:00
vyzo
68bc5d2291
skip moving cold blocks when running with a noop coldstore
...
it is a noop but it still takes (a lot of) time because it has to read all the cold blocks.
2021-07-04 18:38:28 +03:00
vyzo
b87295db93
bubble up dependent txn ref errors
...
This causes Has to return false if it fails to traverse/protect all links, which would cause
the vm to recompute.
2021-07-04 18:38:28 +03:00
vyzo
637fbf6c5b
fix faulty if/else logic for implicit txn protection
2021-07-04 18:38:28 +03:00
vyzo
9d6bcd7705
avoid clown shoes: only walk links for tracking in implicit writes/refs
2021-07-04 18:38:28 +03:00
vyzo
484dfaebce
reuse cidset across all walks when flushing pending writes
2021-07-04 18:38:28 +03:00
vyzo
1d41e1544a
optimize transitive write tracking a bit
2021-07-04 18:38:28 +03:00
vyzo
da00fc66ee
downgrade a couple of logs to warnings
2021-07-04 18:38:28 +03:00
vyzo
4071488ef2
first write, then track
2021-07-04 18:38:28 +03:00
vyzo
bd92c230da
refactor txn reference tracking, do deep marking of DAGs
2021-07-04 18:38:28 +03:00
vyzo
a98a062347
do the dag walk for deep write tracking during flush
...
avoid crawling everything to a halt
2021-07-04 18:38:28 +03:00
vyzo
13a674330f
add pending write check before tracking the object in Has
2021-07-04 18:38:28 +03:00
vyzo
982867317e
transitively track dags from implicit writes in Has
2021-07-04 18:38:28 +03:00
vyzo
4de0cd9fcb
move write log back to flush so that we don't crawl to a halt
2021-07-04 18:38:28 +03:00
vyzo
b3ddaa5f02
fix panic at startup
...
genesis is written (!) before starting the splitstore, so curTs is nil
2021-07-04 18:38:28 +03:00
vyzo
2faa4aa993
debug log writes at track so that we get correct stack traces
2021-07-04 18:38:28 +03:00
vyzo
aeaa59d4b5
move comments about tracking perf issues into a more pertinent place
2021-07-04 18:38:28 +03:00
vyzo
3e8e9273ca
track all writes using async batching, not just implicit ones
2021-07-04 18:38:28 +03:00
vyzo
d0bfe421b5
flush implicit writes at the right time before starting compaction to avoid races
2021-07-04 18:38:28 +03:00
vyzo
7f473f56eb
flush implicit writes before starting compaction
2021-07-04 18:38:28 +03:00
vyzo
a29947d47c
flush implicit writes in all paths in updateWriteEpoch
2021-07-04 18:38:28 +03:00
vyzo
be6cc2c3e6
batch implicit write tracking
...
bolt performance leaves something to be desired; doing a single Put takes 10ms, about the same time
as batching thousands of them.
2021-07-04 18:38:28 +03:00
vyzo
e472cacb3e
add missing return
2021-07-04 18:38:28 +03:00
vyzo
6a3cbea790
treat Has as an implicit Write
...
Rationale: the VM uses the Has check to avoid issuing a duplicate Write in the blockstore.
This means that live objects that would be otherwise written are not actually written, resulting
in the first write epoch being considered the write epoch.
2021-07-04 18:38:28 +03:00