
Implement parallel dbuf eviction #16487

Open
wants to merge 1 commit into master
Conversation

allanjude
Contributor

Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.

Motivation and Context

Replace dbuf_evict_one() with dbuf_evict_many() to more efficiently evict dbuf objects without looping over the same locked objects over and over.

Description

In the previous code, dbuf_evict_thread() would call dbuf_evict_one() in a loop while dbuf_cache_above_lowater().

dbuf_evict_one() would select a random sublist from the dbuf cache, then walk it from the tail forward, attempting to acquire the lock on each object until it succeeded, then evict that object and return.

As the name suggests, it would evict only a single object from the cache. However, evicting one object is not likely to bring us below the desired low water mark, so dbuf_evict_one() would be called again, looping over all of the same busy objects until it found one it could evict.

This has been replaced with dbuf_evict_many() which takes a specific sublist as a parameter, as well as a desired amount of data to evict. It then walks the sublist from the tail forward, evicting what it can until the number of bytes evicted satisfies the input parameter or the head of the sublist is reached.
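
To illustrate the shape of that walk, here is a minimal sketch. The names sublist_t, dbuf_t, sublist_tail(), sublist_prev(), try_lock_dbuf(), and evict_dbuf() are placeholders, not the actual OpenZFS multilist/dbuf APIs.

/*
 * Simplified sketch of the tail-to-head walk; helpers are hypothetical
 * stand-ins for the real multilist and dbuf machinery.
 */
static uint64_t
evict_many_sketch(sublist_t *sl, uint64_t bytes)
{
	uint64_t evicted = 0;
	dbuf_t *db = sublist_tail(sl);		/* least recently used end */

	while (db != NULL && evicted < bytes) {
		dbuf_t *prev = sublist_prev(sl, db);

		/* Skip buffers whose lock is held by another thread. */
		if (try_lock_dbuf(db)) {
			evicted += db->db_size;
			evict_dbuf(db);		/* evicts and drops the lock */
		}
		db = prev;
	}
	return (evicted);
}

The key difference from dbuf_evict_one() is that the walk continues past busy buffers instead of restarting from the tail for every single eviction.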

The dbuf_evict_thread now runs in parallel as well, allowing it to keep up with demand more easily. For the dbuf cache, if the single thread was not able to keep up, ZFS would shift the work of evicting some items to each incoming I/O thread. While that is still the case, it should be seen much less often now that dbuf_evict is more efficient and no longer bottlenecked to a single thread.

How Has This Been Tested?

Performance benchmarks while the dbuf cache is full

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

Comment on lines +806 to +807
if (skip == 0)
	skip = 1;
Member

Why check before assigning?

*/
static void
-dbuf_evict_one(void)
+dbuf_evict_many(uint64_t bytes, unsigned int idx)
Member

I would change the argument order.

Comment on lines +925 to +933
uint64_t nchunks = ((bytes - 1) >> DBUF_MIN_EVICT_PERTASK_SHIFT) + 1;
unsigned n = nchunks < num_sublists ? nchunks : num_sublists;
uint64_t fullrows = nchunks / n;
unsigned lastrowcols = nchunks % n;
unsigned k = (lastrowcols ? lastrowcols : n);

uint64_t bytes_pertask_low = fullrows << DBUF_MIN_EVICT_PERTASK_SHIFT;
uint64_t bytes_pertask = bytes_pertask_low + (lastrowcols ?
(1 << DBUF_MIN_EVICT_PERTASK_SHIFT) : 0);
Member

Same as for the ARC, this seems over-engineered. There is no need for tasks to be a multiple of 1 << DBUF_MIN_EVICT_PERTASK_SHIFT.
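
For readers less familiar with this kind of split, a worked example of what the quoted arithmetic computes, with hypothetical numbers and the chunk size implied by DBUF_MIN_EVICT_PERTASK_SHIFT assumed to be 64 KiB purely for illustration:

/*
 * bytes        = 10 MiB  ->  nchunks = 160 chunks of 64 KiB
 * num_sublists = 64      ->  n = 64 tasks
 * fullrows     = 160 / 64 = 2,  lastrowcols = 160 % 64 = 32,  k = 32
 * bytes_pertask_low = 2 * 64 KiB = 128 KiB
 * bytes_pertask     = 128 KiB + 64 KiB = 192 KiB
 *
 * Presumably the first k = 32 tasks are asked to evict 192 KiB each and
 * the remaining 32 tasks 128 KiB each, for exactly 10 MiB in total.
 */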

Comment on lines +938 to +942
evarg[i].idx = idx;
evarg[i].bytes = evict;

taskq_dispatch_ent(dbuf_evict_taskq, dbuf_evict_task,
&evarg[i], 0, &evarg[i].tqe);
Member

There should probably be a taskq_init_ent() somewhere around here.
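
A rough sketch of the suggestion (loop abbreviated; the exact placement is an assumption):

	evarg[i].idx = idx;
	evarg[i].bytes = evict;

	/* Initialize the embedded taskq entry before dispatching it. */
	taskq_init_ent(&evarg[i].tqe);
	taskq_dispatch_ent(dbuf_evict_taskq, dbuf_evict_task,
	    &evarg[i], 0, &evarg[i].tqe);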

Comment on lines 1009 to +1011
if (size > dbuf_cache_target_bytes()) {
if (size > dbuf_cache_hiwater_bytes())
-dbuf_evict_one();
+dbuf_evict();
Member

Let's assume we have 10 user threads calling this. I suppose each of them will try to create its own task set to evict the same full amount of excess dbuf cache using all the same CPUs. In the best case it may end up with an empty dbuf cache. I am not sure I greatly like the design of one main eviction thread calling a bunch of other taskqs, but each client thread doing that definitely looks weird. I think if user threads have to do evictions, they should do them directly, just handling more than one buffer at a time to be more efficient, as you have said.

@0mp 0mp force-pushed the parallel_dbuf_evict branch 2 times, most recently from d4a8574 to 54888d8 on September 12, 2024 at 11:57
In the previous code, dbuf_evict_thread() would call dbuf_evict_one()
in a loop while dbuf_cache_above_lowater().

dbuf_evict_one() would select a random sublist from the dbuf cache,
then walk it from the tail forward, attempting to acquire the lock on
each object until it succeeded, then evict that object and return.

As the name suggests, it would evict only a single object from the
cache. However, evicting one object is not likely to bring us below the
desired low water mark, so dbuf_evict_one() would be called again,
looping over all of the same busy objects until it found one it could
evict.

This has been replaced with dbuf_evict_many() which takes a specific
sublist as a parameter, as well as a desired amount of data to evict.
It then walks the sublist from the tail forward, evicting what it can
until the number of bytes evicted satisfies the input parameter or
the head of the sublist is reached.

The dbuf_evict_thread now runs in parallel as well, allowing it to
keep up with demand more easily. For the dbuf cache, if the single
thread was not able to keep up, ZFS would shift the work of evicting
some items to each incoming I/O thread. While that is still the case,
it should be seen much less often now that dbuf_evict is more efficient
and no longer bottlenecked to a single thread.

Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Co-authored-by: Allan Jude <[email protected]>
Co-authored-by: Mateusz Piotrowski <[email protected]>
Signed-off-by: Alexander Stetsenko <[email protected]>
Signed-off-by: Allan Jude <[email protected]>
Signed-off-by: Mateusz Piotrowski <[email protected]>
@0mp
Contributor

0mp commented Sep 12, 2024

I have updated the patch with different logic for picking the default maximum number of dbuf eviction threads. The new logic picks one-eighth of the available CPUs, with a minimum of 2 and a maximum of 16.
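
A minimal sketch of that default, assuming max_ncpus holds the CPU count and dbuf_evict_threads is the variable being initialized (both names illustrative here):

/* One eighth of the CPUs, clamped to the range [2, 16]. */
dbuf_evict_threads = MAX(2, MIN(16, max_ncpus / 8));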

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Sep 13, 2024