Performance improvement by using FastUtil Char2ObjectMap (concept PR) #90

omarshibli · 2020-11-21T07:55:52Z

Description

Suggestion for performance improvement by using FastUtil Char2ObjectMap, specialized char map implementation.

Changes

switch to Char2ObjectOpenHashMap
use primitive char
fixed warnings

Observed results

memory consumption is about 10x better in case there is no match
throughput improved between 10-22% better in case there is no match
throughput improvement varies in case there is a match, could be up to 22% too.

Benchmark result raw files
bench-post.2.txt
bench-post.csv.txt
bench-post.txt
bench-pre.2.txt
bench-pre.csv.txt
bench-pre.txt

Down sides

fastutil dependecy is pretty big, benchmark fat jar is 21M after adding it, prior to that it was about 2.4M

Reference

#31

- use primitive char - fixed warnings

codecov-io · 2020-11-21T08:25:10Z

Codecov Report

Merging #90 (45524fb) into master (012ad80) will decrease coverage by 25.34%.
The diff coverage is 3.55%.

@@              Coverage Diff              @@
##             master      #90       +/-   ##
=============================================
- Coverage     89.56%   64.22%   -25.35%     
  Complexity      209      209               
=============================================
  Files            26       28        +2     
  Lines           479      668      +189     
  Branches         66       89       +23     
=============================================
  Hits            429      429               
- Misses           34      223      +189     
  Partials         16       16

Impacted Files	Coverage Δ	Complexity Δ
src/main/java/com/intendia/jmh/GcProfiler.java	`0.00% <0.00%> (ø)`	`0.00 <0.00> (?)`
...rg/ahocorasick/benchmark/BenchmarkPayloadTrie.java	`0.00% <0.00%> (ø)`	`0.00 <0.00> (?)`
src/main/java/org/ahocorasick/trie/State.java	`86.48% <66.66%> (ø)`	`20.00 <3.00> (ø)`
...c/main/java/org/ahocorasick/trie/PayloadState.java	`96.96% <100.00%> (ø)`	`21.00 <4.00> (ø)`
...rc/main/java/org/ahocorasick/trie/PayloadTrie.java	`95.83% <100.00%> (ø)`	`60.00 <0.00> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 012ad80...45524fb. Read the comment docs.

ghost · 2020-11-21T18:40:30Z

The increase is size may have unforeseen consequences with applications depending on this library, such as: embedded systems, increased download times, and so forth. Fortunately, we can employ a similar strategy to how JDBC works, with the downside that it'll be a fair amount more effort.

For example:

import static CollectionFactory.createHashMap;

public class C {
  final Map<String, String> map = createHashMap( ... );
}

The default implementation would use Java's standard libraries. If an implementation exists for a drop-in replacement---fastutil, HPPC, Koloboke, or HFTC---then that library should be used through dynamic detection.

In this way, if developers decide to add the dependency, then it will be used. If they don't import the dependency, then the standard JDK is used and the resulting JAR file won't explode by an extra 20MB. (ProGuard can reduce the number of bundled classes to a bare minimum, but its usage not something we can impose upon developers.)

For cases where there isn't a direct 1:1 correlation (e.g., char primitive instead of Character class), then perhaps the strategy pattern or a functional interface may be appropriate to abstract away the implementation detail.

omarshibli · 2020-11-22T07:40:18Z

Thanks for the input.
I'm not sure if JDBC-style is the optimal approach here, the usage pattern of this library is very well defined.

put: during build phase
get: during all run time
No need for updates/resizing/...
Giving that the direction that I've in mind is to find the optimal implementation and add it the repo.

ghost · 2020-11-22T18:23:16Z

Giving that the direction that I've in mind is to find the optimal implementation and add it the repo.

What other ways can you see to get these performance gains that don't require developers who import this software to get a surprise 20MB increase to their own applications?

Note: A PR that incurs a 10-fold increase to the final JAR size will not be accepted.

omarshibli · 2020-11-23T06:20:20Z

I'm totally with you.
As I mentioned giving that the usage pattern is well defined, I want to find optimal implementation and copy it into the repo, so we will just have an increase of few classes.

omarshibli · 2020-11-23T06:24:22Z

I'm also checking other libraries:
https://github.com/thomasmueller/minperf
https://github.com/nicholassm/perfect-hashmaps

omarshibli added 2 commits November 21, 2020 07:49

- switch to Char2ObjectOpenHashMap

577fade

- use primitive char - fixed warnings

benchmark

45524fb

omarshibli force-pushed the unboxing branch from 7886eda to 45524fb Compare November 21, 2020 08:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance improvement by using FastUtil Char2ObjectMap (concept PR) #90

Performance improvement by using FastUtil Char2ObjectMap (concept PR) #90

omarshibli commented Nov 21, 2020

codecov-io commented Nov 21, 2020

ghost commented Nov 21, 2020 •

edited by ghost

Loading

omarshibli commented Nov 22, 2020

ghost commented Nov 22, 2020 •

edited by ghost

Loading

omarshibli commented Nov 23, 2020 •

edited

Loading

omarshibli commented Nov 23, 2020

Performance improvement by using FastUtil Char2ObjectMap (concept PR) #90

Are you sure you want to change the base?

Performance improvement by using FastUtil Char2ObjectMap (concept PR) #90

Conversation

omarshibli commented Nov 21, 2020

Description

Changes

Observed results

Down sides

Reference

codecov-io commented Nov 21, 2020

Codecov Report

ghost commented Nov 21, 2020 • edited by ghost Loading

omarshibli commented Nov 22, 2020

ghost commented Nov 22, 2020 • edited by ghost Loading

omarshibli commented Nov 23, 2020 • edited Loading

omarshibli commented Nov 23, 2020

ghost commented Nov 21, 2020 •

edited by ghost

Loading

ghost commented Nov 22, 2020 •

edited by ghost

Loading

omarshibli commented Nov 23, 2020 •

edited

Loading