compression: additional limit for effective dict #12087
m.RLock()
if _, ok := usedPatterns[p.code]; !ok && len(usedPatterns) >= effectiveDictLimit {
    continue
This `continue` jumps out of the mutex section without releasing the read lock. That is a deadlock.
Oh, sorry, I just noticed you already fixed it.
compression ratio seems impacted:
I didn't debug deeply yet. Not sure: do we drop only the part of the dictionary that has a low score, or can we also drop high-scored patterns? I used the following command:
go run ./cmd/erigon snapshots uncompress /erigon-data/snapshots/v1-060700-060800-transactions.seg | DictReducerSoftLimit=1000000 MinPatternLen=20 MaxPatternLen=128 SamplingFactor=1 MaxDictPatterns=262144 EffectiveDictLimit=65536 go run ./cmd/erigon snapshots compress --datadir=/erigon-data/erigon3/ /erigon-backup/v1-060700-060800-transactions.seg.compressed_s1_dict_8k_min20_max128_sof1_dictInt256k_new --pprof --pprof.port=6061 > ~/log3.txt 2>&1 &
MaxDictPatterns = MaxDictPatterns * 2
With the above config, the initial dictionary will include more patterns than before, and some high-score patterns may be excluded during dict reduction: because patterns are hit in random order, low-score patterns may occupy the space of the effective dict in advance. I will reconsider the solution.
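To make the effect described above concrete, here is a small illustrative sketch. It is not the PR's code and not a proposed fix: `pattern`, `firstCome`, and `topByScore` are hypothetical names, and the scores are made up. `firstCome` mirrors the admission check from the diff (first distinct patterns hit win), while `topByScore` is shown only as a contrast to make the difference visible.

```go
package main

import (
	"fmt"
	"sort"
)

// pattern is a hypothetical stand-in for a dictionary entry.
type pattern struct {
	code  uint64
	score uint64
}

// firstCome admits patterns in hit order until limit distinct codes are in.
func firstCome(hits []pattern, limit int) map[uint64]struct{} {
	used := make(map[uint64]struct{})
	for _, p := range hits {
		if _, ok := used[p.code]; !ok && len(used) >= limit {
			continue // dictionary full: later patterns are dropped, whatever their score
		}
		used[p.code] = struct{}{}
	}
	return used
}

// topByScore keeps the limit highest-scored distinct patterns, ignoring hit order.
func topByScore(hits []pattern, limit int) map[uint64]struct{} {
	seen := make(map[uint64]struct{})
	uniq := make([]pattern, 0, len(hits))
	for _, p := range hits {
		if _, ok := seen[p.code]; ok {
			continue
		}
		seen[p.code] = struct{}{}
		uniq = append(uniq, p)
	}
	sort.Slice(uniq, func(i, j int) bool { return uniq[i].score > uniq[j].score })
	if len(uniq) > limit {
		uniq = uniq[:limit]
	}
	used := make(map[uint64]struct{}, len(uniq))
	for _, p := range uniq {
		used[p.code] = struct{}{}
	}
	return used
}

func main() {
	// Pattern 3 has by far the highest score but is hit last.
	hits := []pattern{{1, 5}, {2, 3}, {3, 900}, {1, 5}}
	_, inFirst := firstCome(hits, 2)[3]
	_, inTop := topByScore(hits, 2)[3]
	fmt.Println("high-score pattern kept by firstCome:", inFirst)
	fmt.Println("high-score pattern kept by topByScore:", inTop)
}
```

With a limit of 2, the two low-score patterns that happen to be hit first fill the effective dict, and the high-score pattern is rejected by the first-come check.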
The previous version incorrectly included some unused patterns. Here's the reason: I believe effectiveDictLimit serves as a hard limit to prevent the resulting dictionary from becoming too large, rather than a means to improve the compression ratio. I did some investigation but would appreciate your insights or suggestions for a better solution. @AskAlexSharov
Thanks. I will test another day.
#11921