compression: additional limit for effective dict #12087

stevemilk · 2024-09-25T10:33:44Z

#11921

AskAlexSharov · 2024-09-27T09:19:22Z

erigon-lib/seg/parallel_compress.go

+
+		m.RLock()
+		if _, ok := usedPatterns[p.code]; !ok && len(usedPatterns) >= effectiveDictLimit {
+			continue


This jumps outside of the mutex section - it's a deadlock.

Oh sorry, just noticed you fixed it.
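For readers following the review comment above: an early `continue` after `m.RLock()` skips the matching unlock, so a later writer would block. A minimal sketch of one way to avoid that is shown here. It is not the code from this PR; `usedSet`, `newUsedSet`, `admits` and `filter` are hypothetical names, and the limit check only mirrors the condition in the diff.

```go
package main

import (
	"fmt"
	"sync"
)

// usedSet is a hypothetical wrapper around the usedPatterns map and its lock.
type usedSet struct {
	mu    sync.RWMutex
	codes map[uint64]struct{}
	limit int // plays the role of effectiveDictLimit
}

// newUsedSet builds a set that already contains the given codes.
func newUsedSet(limit int, codes ...uint64) *usedSet {
	s := &usedSet{codes: make(map[uint64]struct{}), limit: limit}
	for _, c := range codes {
		s.codes[c] = struct{}{}
	}
	return s
}

// admits reports whether code is already used or there is still room.
// The deferred RUnlock runs on every return path, so no read lock leaks.
func (s *usedSet) admits(code uint64) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.codes[code]
	return ok || len(s.codes) < s.limit
}

// filter keeps the candidates that pass the limit check.
func filter(s *usedSet, candidates []uint64) []uint64 {
	var out []uint64
	for _, c := range candidates {
		if !s.admits(c) {
			continue // safe: the lock was already released inside admits
		}
		out = append(out, c)
	}
	return out
}

func main() {
	s := newUsedSet(2, 1, 2)
	fmt.Println(filter(s, []uint64{1, 3, 2})) // prints [1 2]
}
```

Moving the locked region into a small helper with `defer` keeps the loop body free to `continue` without tracking the lock state by hand.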

AskAlexSharov · 2024-09-27T11:32:56Z

compression ratio seems impacted:

2.4G	v1-060700-060800-transactions.seg.compressed_s1_dict_8k_min20_max128_sof1_dictInt256k_old
3.1G	v1-060700-060800-transactions.seg.compressed_dict_8k_min20_max128_sof1_dictInt128k_new

I didn't debug deeply yet. Not sure: do we drop only the "part of dictionary" which has a low score, or can we also drop high-scored patterns?

I used the following command:

go run ./cmd/erigon snapshots uncompress /erigon-data/snapshots/v1-060700-060800-transactions.seg | DictReducerSoftLimit=1000000 MinPatternLen=20 MaxPatternLen=128 SamplingFactor=1 MaxDictPatterns=262144 EffectiveDictLimit=65536 go run ./cmd/erigon snapshots compress --datadir=/erigon-data/erigon3/  /erigon-backup/v1-060700-060800-transactions.seg.compressed_s1_dict_8k_min20_max128_sof1_dictInt256k_new --pprof --pprof.port=6061 > ~/log3.txt 2>&1 &

stevemilk · 2024-09-27T13:29:32Z

MaxDictPatterns = MaxDictPatterns * 2
EffectiveDictLimit = MaxDictPatterns / 2

With the above config, init-dict will include more patterns than before, and dict reduction may exclude some high-score patterns: patterns are hit in random order, so low-score patterns may occupy the space of the effective dict in advance.

Will reconsider the solution.
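The effect described above can be reproduced with a toy model. The sketch below assumes a simplified first-come admission rule that mirrors the condition in the diff; `pat`, `firstCome` and the scores are made up for illustration and are not taken from erigon-lib.

```go
package main

import "fmt"

// pat is a hypothetical pattern with a dictionary score.
type pat struct {
	code  int
	score uint64
}

// firstCome admits patterns in hit order until limit distinct patterns
// are in use; later new patterns are skipped regardless of their score.
func firstCome(hits []pat, limit int) map[int]uint64 {
	used := make(map[int]uint64)
	for _, p := range hits {
		if _, ok := used[p.code]; !ok && len(used) >= limit {
			continue // effective dict is full: new pattern is dropped
		}
		used[p.code] = p.score
	}
	return used
}

func main() {
	// Low-score patterns 1 and 2 happen to be hit first.
	hits := []pat{{1, 10}, {2, 15}, {3, 900}, {1, 10}, {4, 800}}
	fmt.Println(firstCome(hits, 2)) // prints map[1:10 2:15]
}
```

With a limit of 2, the two low-score patterns fill the effective dict and the patterns scored 900 and 800 never get in.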

stevemilk · 2024-09-30T08:34:51Z

The previous version incorrectly included some unused patterns.
I've updated it; however, the compression ratio will still be impacted to some extent.

Here's the reason:
In the current design, init-dict is reduced to reduced-dict after compression, retaining only the used patterns. For example, if len(init-dict) = 1024 and len(reduced-dict) = 512, with a compression ratio of 0.3, setting effectiveDictLimit to 512 keeps the ratio the same. However, if effectiveDictLimit is less than len(reduced-dict), some patterns will be dropped, potentially increasing the compression ratio (i.e. producing a larger compressed file).

I believe effectiveDictLimit serves as a hard limit to prevent the resulting dictionary from becoming too large, rather than a means to improve the compression ratio. I conducted some investigation but would appreciate your insights or suggestions for a better solution. @AskAlexSharov
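To make the 1024/512 example concrete, here is a minimal sketch of a hard cap applied to the used patterns. It assumes the patterns to drop are chosen by lowest score, which is only one possible policy and not necessarily what this PR implements; `scored` and `reduce` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// scored is a hypothetical used pattern with its dictionary score.
type scored struct {
	code  int
	score uint64
}

// reduce returns at most limit of the used patterns, highest score first.
// If limit >= len(used), nothing is dropped.
func reduce(used []scored, limit int) []scored {
	out := append([]scored(nil), used...) // copy so the input is not reordered
	sort.SliceStable(out, func(i, j int) bool { return out[i].score > out[j].score })
	if len(out) > limit {
		out = out[:limit]
	}
	return out
}

func main() {
	used := []scored{{1, 10}, {2, 900}, {3, 15}, {4, 800}}
	fmt.Println(len(reduce(used, 4))) // limit equals the used count: all 4 kept
	fmt.Println(reduce(used, 2))      // limit below the used count: 2 dropped
}
```

When the limit equals the number of used patterns the result is unchanged, which matches the "ratio stays the same" case; a smaller limit drops patterns and can only make the output larger.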

AskAlexSharov · 2024-09-30T09:48:52Z

Thanks. I will test another day.

stevemilk marked this pull request as draft September 25, 2024 13:56

additional limit for effective dict

4a4d89f

stevemilk force-pushed the effectiveDict branch from c02a1ab to 4a4d89f Compare September 25, 2024 16:09

stevemilk marked this pull request as ready for review September 25, 2024 16:41

AskAlexSharov assigned AskAlexSharov and unassigned AskAlexSharov Sep 25, 2024

save

f4289de

AskAlexSharov requested changes Sep 27, 2024

fix deadLock

a9f22ee

stevemilk force-pushed the effectiveDict branch from de415c2 to a9f22ee Compare September 27, 2024 09:31

stevemilk requested a review from AskAlexSharov September 27, 2024 09:34

stevemilk marked this pull request as draft September 30, 2024 02:04

save

b9a31f2
