Optimize resize to avoid full GC #1889

Open: wants to merge 2 commits into base 1.x

qiangwang

Description

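Grow by a larger amount on each resize so that resize is called less often. This avoids allocating large numbers of big arrays, which is what triggers full GC.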

Type of Change

Please check any relevant options and delete the rest.

  • Bug fix (non-breaking change which fixes an issue)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

How Has This Been Tested?

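Compared the number of resize calls before and after the change: the count drops noticeably after the optimization. Because build calls shrink at the end, the change causes no extra memory consumption once build has completed.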

Checklist

Check all items that apply.

  • ⚠️Changes must be made on dev branch instead of master
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have checked my code and corrected any misspellings

qiangwang (Author) commented Apr 1, 2024

@hankcs Could you please review this?

w6et commented Apr 1, 2024

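> Optimize resize to avoid full GC

I'm just an onlooker. I see that resize and shrink are both called in the build method, one after the other, and not inside a loop. Within a single method, why would allocating extra up front and then shrinking avoid repeated resizes and full GC? That makes me curious about your usage scenario.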

qiangwang (Author) commented Apr 3, 2024

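> Optimize resize to avoid full GC
>
> I'm just an onlooker. I see that resize and shrink are both called in the build method, one after the other, and not inside a loop. Within a single method, why would allocating extra up front and then shrinking avoid repeated resizes and full GC? That makes me curious about your usage scenario.

What we observed in use: resize was called 160 times within 10 s while the double array was 40M in size, which amounts to 160 × 40M = 6400M = 6.4G of arrays created within 10 s.

Cause: during construction, the arrays run out of room as data is added and have to be expanded. If each expansion adds too little, resize must be called frequently, and every resize leaves behind a discarded old array. When these arrays are not collected in time they pile up and trigger a full GC. The resize call in the build method is only the initialization; the subsequent logic calls resize many more times.

Fix: grow by more on each resize. That avoids frequent expansion and reduces the garbage produced, which in turn avoids full GC.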
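
A rough illustration of the arithmetic above, not code from this PR: the sketch below counts how many resize calls, and how many array slots in total, a growth policy needs to reach a target capacity. The class name `ResizeSimulation`, the capacities, and both growth policies are assumptions chosen for illustration; HanLP's actual resize logic may differ.

```java
import java.util.function.IntUnaryOperator;

/** Counts the resize calls and total slots allocated while growing to a target capacity. */
class ResizeSimulation {

    /**
     * Grows a capacity from initial until it reaches target.
     * Returns {number of resize calls, total slots allocated by those calls}.
     */
    static long[] simulate(int initial, int target, IntUnaryOperator grow) {
        long calls = 0;
        long allocated = 0;
        int capacity = initial;
        while (capacity < target) {
            int next = grow.applyAsInt(capacity);
            if (next <= capacity) {
                throw new IllegalArgumentException("growth policy must increase the capacity");
            }
            capacity = next;
            calls++;
            allocated += capacity; // every resize allocates a complete new array
        }
        return new long[] {calls, allocated};
    }

    public static void main(String[] args) {
        int initial = 1 << 16; // 65,536 slots (illustrative)
        int target = 1 << 20;  // 1,048,576 slots (illustrative)

        // Hypothetical policies: add a fixed 65,536 slots, or double the capacity.
        long[] fixed = simulate(initial, target, c -> c + (1 << 16));
        long[] doubled = simulate(initial, target, c -> c * 2);

        System.out.println("fixed step: calls=" + fixed[0] + ", slots allocated=" + fixed[1]);
        System.out.println("doubling:   calls=" + doubled[0] + ", slots allocated=" + doubled[1]);
    }
}
```

With these illustrative numbers the fixed step needs 15 resize calls and doubling needs 4, and the doubling run allocates far fewer slots in total, so less garbage is left for the collector.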

hankcs (Owner) commented Apr 4, 2024

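Thanks for the contribution. I have similar concerns.

  1. Doubling on every resize reduces fragmentation but raises peak memory.
  2. The number of resizes should depend on the size of the specific dictionary. Could large dictionaries be given a larger initial size up front? The exact value would of course have to be determined by experiment.
  3. Doubling the size directly inside resize violates the "be explicit" principle.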
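
Point 1 can be made concrete with a small sketch, under the assumption that a resize allocates the new array before the old one becomes unreachable, so both are live for a moment. `PeakEstimate` and the numbers below are hypothetical and only illustrate the trade-off for a single array.

```java
import java.util.function.IntUnaryOperator;

/** Estimates the transient peak of live array slots while growing to a needed size. */
class PeakEstimate {

    /** Returns the largest old + new slot count seen during any single resize. */
    static long peakLiveSlots(int initial, int needed, IntUnaryOperator grow) {
        long peak = initial;
        int capacity = initial;
        while (capacity < needed) {
            int next = grow.applyAsInt(capacity);
            if (next <= capacity) {
                throw new IllegalArgumentException("growth policy must increase the capacity");
            }
            // Assumption: old and new arrays are both reachable until the copy finishes.
            peak = Math.max(peak, (long) capacity + next);
            capacity = next;
        }
        return peak;
    }

    public static void main(String[] args) {
        int initial = 1024; // illustrative starting capacity
        int needed = 1100;  // illustrative number of slots actually required

        System.out.println("fixed step of 64: peak=" + peakLiveSlots(initial, needed, c -> c + 64));
        System.out.println("doubling:         peak=" + peakLiveSlots(initial, needed, c -> c * 2));
    }
}
```

Here the fixed 64-slot step peaks at 2240 live slots and doubling peaks at 3072, for the same 1100 slots actually needed.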

qiangwang (Author) commented Apr 12, 2024

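@hankcs All three points make sense. Could we expose a parameter that controls the growth ratio (for example resizeRatio)? Users could decide for themselves whether to use it, and they would know exactly what effect it has.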

qiangwang (Author) commented May 20, 2024

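@hankcs Please take a look at the latest changes.

Regarding point 2: a larger initial value would be harder for users to work with. The right value is a moving target, because it has to be adjusted again and again as the dictionary keeps growing. Setting a suitable growth ratio is simpler: users just configure it according to the peak memory they can accept.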
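
One way such a parameter could look, as a minimal sketch only: a growable int array whose growth factor is passed in by the caller and which can be trimmed with shrink once building is done. `GrowableIntArray` and its methods are hypothetical and are not the API changed in this PR; only the name resizeRatio is taken from the discussion above.

```java
import java.util.Arrays;

/** Hypothetical growable int array with a caller-chosen growth ratio (not HanLP code). */
class GrowableIntArray {
    private int[] data;
    private int size;                 // highest written index + 1
    private final double resizeRatio; // 1.0 grows only as far as needed, 2.0 doubles
    private int resizeCount;          // number of resize calls so far

    GrowableIntArray(int initialCapacity, double resizeRatio) {
        if (initialCapacity < 0 || resizeRatio < 1.0) {
            throw new IllegalArgumentException("need initialCapacity >= 0 and resizeRatio >= 1.0");
        }
        this.data = new int[initialCapacity];
        this.resizeRatio = resizeRatio;
    }

    void set(int index, int value) {
        if (index >= data.length) {
            resize(index + 1);
        }
        data[index] = value;
        size = Math.max(size, index + 1);
    }

    int get(int index) {
        return data[index];
    }

    /** Grows to at least minCapacity, or to capacity * resizeRatio if that is larger. */
    private void resize(int minCapacity) {
        long scaled = (long) Math.ceil(data.length * resizeRatio);
        long wanted = Math.max(minCapacity, scaled);
        // Stay a little below Integer.MAX_VALUE, since VMs may reject arrays that large.
        int newCapacity = (int) Math.min(wanted, Integer.MAX_VALUE - 8);
        data = Arrays.copyOf(data, newCapacity); // the old array becomes garbage here
        resizeCount++;
    }

    /** Trims unused capacity once building is finished. */
    void shrink() {
        data = Arrays.copyOf(data, size);
    }

    int capacity() {
        return data.length;
    }

    int resizeCount() {
        return resizeCount;
    }

    public static void main(String[] args) {
        for (double ratio : new double[] {1.0, 2.0}) {
            GrowableIntArray array = new GrowableIntArray(16, ratio);
            for (int i = 0; i < 1000; i++) {
                array.set(i, i);
            }
            int before = array.capacity();
            array.shrink();
            System.out.println("ratio=" + ratio + " resizes=" + array.resizeCount()
                    + " capacity before shrink=" + before + " after shrink=" + array.capacity());
        }
    }
}
```

Writing 1000 sequential values from an initial capacity of 16 takes 6 resizes with a ratio of 2.0 and 984 with a ratio of 1.0 (grow only as far as needed); after shrink the capacity is 1000 in both cases. The ratio is therefore an explicit knob for trading resize count against transient memory.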

3 participants