Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of parsing simple dates #8386

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

tobias-hotz
Copy link
Contributor

This PR aims to improve the performance of IsoDate when handling "simple" Dates. This method is not very slow, but it is called very often (e.g. by UUIDMapper during harvesting), which makes it show up in some performance measurements. See #8007 where parsing simple Dates would take as much as 10-20% of the entire harvesting CPU time.

To validate the performance improvement, JMH benchmarks were added.
This is the baseline performance of the patch:

Benchmark                               (arg)  Mode  Cnt    Score     Error  Units
ISODateBenchmark.measureIsoSimple  1976-06-03  avgt    5  604.618 ± 69.233  ns/op
ISODateBenchmark.measureIsoSimple  1976/06/03  avgt    5  608.454 ± 44.684  ns/op
ISODateBenchmark.measureIsoSimple    24-06-06  avgt    5  874.100 ± 55.160  ns/op

Notice that the overall execution time is very small, but due to the extremly high number of calls, this is still a relevant metric
The first idea was to precompile the pattern used for the regex (see the first number), which resulted in the following numbers:

Benchmark                               (arg)  Mode  Cnt    Score     Error  Units
ISODateBenchmark.measureIsoSimple  1976-06-03  avgt    5  466.098 ±  32.134  ns/op
ISODateBenchmark.measureIsoSimple  1976/06/03  avgt    5  456.157 ±  52.184  ns/op
ISODateBenchmark.measureIsoSimple    24-06-06  avgt    5  716.912 ± 112.579  ns/op

This means a minor improvement, but there is still room to improve.
The second commit changes the algorithm to use as little string operations as possible. Whith this. I get the following numbers:

Benchmark                               (arg)  Mode  Cnt    Score    Error  Units
ISODateBenchmark.measureIsoSimple  1976-06-03  avgt    5  105.912 ± 11.120  ns/op
ISODateBenchmark.measureIsoSimple  1976/06/03  avgt    5  115.320 ±  9.542  ns/op
ISODateBenchmark.measureIsoSimple    24-06-06  avgt    5  193.689 ± 14.505  ns/op

The speedup is about 5x compared to the baseline performance.

Checklist

  • I have read the contribution guidelines
  • Pull request provided for main branch, backports managed with label
  • Good housekeeping of code, cleaning up comments, tests, and documentation
  • Clean commit history broken into understandable chucks, avoiding big commits with hundreds of files, cautious of reformatting and whitespace changes
  • Clean commit messages, longer verbose messages are encouraged
  • API Changes are identified in commit messages
  • Testing provided for features or enhancements using automatic tests
  • User documentation provided for new features or enhancements in manual
  • Build documentation provided for development instructions in README.md files
  • Library management using pom.xml dependency management. Update build documentation with intended library use and library tutorials or documentation

Funded by LGL BW

JMH allows per-method benchmarks. See jmh/README for more details
Simples dates are common in many cases. Performance analysis showed that this can be a performance hotspot during harvesting.
This is due to the design of UUIDMapper, which loads all metadata for a harvester for every new batch. This can not be easily changed, but we can improve the performance here, which also helps other code paths using this method.
This change improves the performance of the method by a factor of about 1.3x
This change improves the performance of parseDate by removing as many string operations as possible. Doing this and some other minor optimisations, we can improve the performance so it is about 5x faster than the original.
@tobias-hotz tobias-hotz changed the title Simple datetime perf Improve performance of parsing simple Dates Sep 24, 2024
@tobias-hotz tobias-hotz changed the title Improve performance of parsing simple Dates Improve performance of parsing simple dates Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant