Is it possible to set capacity by bytes? #52

Closed
rueian opened this issue Feb 11, 2024 · 42 comments
Labels: enhancement (New feature or request)

Comments

@rueian

rueian commented Feb 11, 2024

Hi @maypok86,

First of all, thank you for making this amazing library. I didn't know about S3-FIFO until now, and I learned a lot from your work.

That said, this issue is not exactly a feature request, because I guess setting the capacity by bytes is almost impossible: my understanding of S3-FIFO is that it requires the maximum entry count to be set beforehand.

However, setting the maximum entry count up front is hard when caching data of different sizes, which I believe is the most common case. Therefore I think it would be better for users to be able to set the capacity in bytes.

I just want to know if you have any idea about this topic or alternatives. Thanks!

@rueian rueian added the enhancement New feature or request label Feb 11, 2024
@maypok86
Owner

maypok86 commented Feb 11, 2024

Hi! In general, I have a slightly different opinion; let me try to explain, though you may know more about this than I do.

  1. Caches usually store fairly small key-value pairs. This is discussed in the following papers (first and second), and it was also pointed out by Dgraph Labs when they wrote ristretto (for example, it is mentioned in their benchmarks). This makes it much easier to just limit the number of entries in the cache, and doing so often even increases the hit ratio a bit (the eviction policy doesn't need to track an additional parameter).

  2. S3-FIFO can also be used when specifying capacity in bytes. This is partly why the original paper mentioned several times that the ghost queue stores the same number of items as the main queue, instead of being the same size in bytes (those are slightly different things). The only real problem with specifying the capacity in bytes is implementation complexity: instead of a simple ring-buffer-based queue you have to use a growable version of it (a technique very similar to slices in Go; a minimal sketch appears after the code example below) or a queue based on doubly linked lists.

  3. Otter already supports specifying capacity in bytes (though it doesn't take into account the overhead of its internal structures, but okay). Just specify the Cost function in the builder and enjoy the magic. For example, here is how to create a cache whose key-value pairs have a maximum total cost of 100 MB:

cache, err := otter.MustBuilder[string, string](100 * 1024 * 1024).
    Cost(func(key string, value string) uint32 {
        return uint32(len(key) + len(value))
    }).
    WithTTL(time.Hour).
    Build()
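
For illustration only, here is a minimal sketch of such a growable ring-buffer queue (assumed names, not otter's actual implementation); when full it doubles its backing slice, much like append does for Go slices:

type queue[T any] struct {
	buf        []T
	head, tail int // head: next element to pop; tail: next free slot
	count      int
}

func newQueue[T any]() *queue[T] {
	return &queue[T]{buf: make([]T, 8)}
}

func (q *queue[T]) Push(v T) {
	if q.count == len(q.buf) {
		q.grow()
	}
	q.buf[q.tail] = v
	q.tail = (q.tail + 1) % len(q.buf)
	q.count++
}

func (q *queue[T]) Pop() (T, bool) {
	var zero T
	if q.count == 0 {
		return zero, false
	}
	v := q.buf[q.head]
	q.buf[q.head] = zero // don't retain a reference to the popped element
	q.head = (q.head + 1) % len(q.buf)
	q.count--
	return v, true
}

// grow doubles the backing slice and unwraps the ring so head starts at 0.
func (q *queue[T]) grow() {
	bigger := make([]T, len(q.buf)*2)
	n := copy(bigger, q.buf[q.head:])
	copy(bigger[n:], q.buf[:q.head])
	q.buf = bigger
	q.head = 0
	q.tail = q.count
}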

P.S. It looks like otter is in urgent need of more detailed guides... Hopefully next week I'll finish a couple more sections in https://maypok86.github.io/otter/, but for now there's only a more detailed description of the Performance section from the README with an additional section on memory consumption.

@rueian
Author

rueian commented Feb 12, 2024

cache, err := otter.MustBuilder[string, string](100 * 1024 * 1024).
    Cost(func(key string, value string) uint32 {
        return uint32(len(key) + len(value))
    }).
    WithTTL(time.Hour).
    Build()

Wow, this looks great. Thanks! I previously thought the cost function was used for specifying the cost of populating a given cache entry so that entries with higher costs would be less likely to be evicted.

S3-FIFO can also be used when specifying capacity in bytes. This is partly why the original paper mentioned several times that the ghost queue stores the same number of items as the main queue, instead of being the same size.

I missed that. I scanned the paper several times, but I didn't find the reason for setting the ghost queue length to be the same as the main one. I did find that the author said VMware found 50% of the main queue was enough for the ghost.

Furthermore, the paper seemed to treat each entry as the same size intentionally because scalability brought by ring buffers was one key motivation of the paper and the author didn't know BP-wrapper back then.

Caches usually store fairly small key-value pairs.

Thank you for providing the references. There is no doubt that entries are usually small. But turning this fact into an estimated capacity as a number of entries is still hard. The second paper you provided also mentioned that the object size distribution is not static over time.

@maypok86
Owner

I previously thought the cost function was used for specifying the cost of populating a given cache entry so that entries with higher costs would be less likely to be evicted.

No, it's specifically about specifying capacity in bytes; usually cost, or weight in the case of caffeine, only makes sense for specifying capacity in bytes or something like it. This is largely why otter doesn't allow you to specify a cost for entries directly.

I didn't find the reason for setting the ghost queue length to be the same as the main one.

It was just calculated experimentally, and since they tested on large traces it's even explainable. You just don't need fingerprints for one-hit wonders in a small queue, because they won't give you much of an advantage, and will waste memory.

I did find that the author said VMware found 50% of the main queue was enough for the ghost.

It very much depends on the workload. For example, I found that on the S3 (search) and DS1 (database page cache) traces, limiting the ghost queue to the number of items in the main queue does not work well. Because of this, Otter limits the size of the ghost queue to the number of items in the small and main queues combined, which helps smooth out problems on unfriendly traces.

Furthermore, the paper seemed to treat each entry as the same size intentionally because scalability brought by ring buffers was one key motivation of the paper and the author didn't know BP-wrapper back then.

Yes, unfortunately the original paper glosses over a large number of problems, especially around ring buffers. Most of these problems can be solved, but doing so leads to a very significant complication (it is very unpleasant to add features to such an architecture in principle), and after all that I am not at all sure the result would end up simpler and faster than W-TinyLFU + BP-Wrapper (an architecture in which it is already easy to add features and optimize). It will be very interesting to see if someone manages to implement it with the same number of features.

But turning this fact into an estimated capacity as a number of entries is still hard.

That's why otter allows you to specify the capacity in bytes. Usually, though, a rough understanding of the size of the entries is more than enough, and from that you can calculate the required number of entries in the cache; in the end it's each user's choice.
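
As a rough back-of-the-envelope illustration of that estimate (the average entry size here is an assumption, not a measured number):

const (
	budgetBytes      = 100 * 1024 * 1024 // 100 MB memory budget
	avgEntryBytes    = 1024              // assumed average key+value size in bytes
	estimatedEntries = budgetBytes / avgEntryBytes // 102400 entries to pass to the builder
)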

@maypok86
Owner

I've written some notes here about cost-based eviction.

@rueian
Author

rueian commented Feb 13, 2024

I've written some notes here about cost-based eviction.

The notes are great! Should we also change the examples in the README? I would like to help.

Also, would it be better to specify the cost as uint32(len(key) * 2 + len(value)), since the key is stored both in the hash table and in its linked-list node?

@maypok86
Owner

Should we also change the examples in the README?

I actually started writing the documentation because Otter's features were starting to outgrow the README format. In the documentation I was able to describe the Performance section in much more detail, add memory consumption benchmark results, and add hit ratio simulations on traces collected relatively recently from the production systems of big tech companies. So I was planning to just start pointing users to the documentation for more details.

Also, would it be better to specify the cost as uint32(len(key) * 2 + len(value)), since the key is stored both in the hash table and in its linked-list node?

  1. Nodes store only the string header (a pointer to the data and a length), and the same is true in the hash table.
  2. Since Otter uses a fork of the hash table, it does not store keys twice the way ristretto (which stores hashes instead, which looks rather controversial), theine, and many other libraries do. This saves unsafe.Sizeof(key) + 8 bytes per entry (in the case of strings that is 16 + 8 = 24 bytes; the snippet below shows where those numbers come from). Also because of this, otter avoids some of the synchronization problems possible in ristretto, theine, and ccache.
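
A quick way to check those numbers on a 64-bit platform (assuming the extra 8 bytes is a stored 64-bit hash):

package main

import (
	"fmt"
	"unsafe"
)

func main() {
	var key string
	var hash uint64
	fmt.Println(unsafe.Sizeof(key))                       // 16: a string header is a pointer plus a length
	fmt.Println(unsafe.Sizeof(key) + unsafe.Sizeof(hash)) // 24: storing the key a second time plus an 8-byte hash
}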

@rueian
Author

rueian commented Feb 14, 2024

Brilliant! What synchronization problem do they have? I thought in-place string mutations were usually disallowed.

@rueian
Author

rueian commented Feb 14, 2024

No, it's specifically about specifying capacity in bytes; usually cost, or weight in the case of caffeine, only makes sense for specifying capacity in bytes or something like it. This is largely why otter doesn't allow you to specify a cost for entries directly.

Just a random thought: would it be a good idea to replace the term Capacity with Budget? I think the latter is much more connected to the term Cost.

@maypok86
Owner

What synchronization problem do they have?

It's not about strings; it's about the bp-wrapper technique used to achieve excellent scalability, the main idea of which is eventual consistency when interacting with the eviction policy.

Because of this, anomalies like the following can occur during concurrent execution:

insert(k, v) // node1
...
update(k, v) // node2 (node1 is no longer in the hash table)
// the eviction policy decides to evict node1, but that decision should not apply to node2!
// In the end nothing should need to be removed, yet the following happens:
delete(k)

Eventually the cache will have no entry for key k at all. This problem exists in both ristretto and ccache; theine doesn't have it, but it has another, even worse problem.

When updating a node with a cost change, theine updates it directly with an atomic (link). Because of this, something like the following can happen, with terrible consequences for the whole cache.

insert(k, v, cost=2)
...
update(k, v, cost=100000)
// suppose the update event for k has not yet reached the eviction policy, while the insert event reached it long ago.

// at this point the eviction policy decides to move this node from one LRU queue to another,
// which may reduce the size of one of the queues to a very small or even negative value.
// Now the update event reaches the eviction policy and further increases the capacity by 100000-2 (but the other queue remains broken).
// Looks very sad :)

And all of these caches (including otter) still have problems with insert and update events being reordered when they are placed into the write buffer, but it's very hard to trigger this even on purpose, and the consequences, though unpleasant, are correctable at runtime.

@maypok86
Owner

maypok86 commented Feb 14, 2024

Just a random thought: would it be a good idea to replace the term Capacity with Budget?

It looks that way. But with a cache that doesn't use cost, it could throw users off. Most likely the ideal solution would be to split it into two separate builder functions (Capacity and Cost), but unfortunately I didn't think of that initially. For now, let it stay as it is, especially since it looks like there will be some more important backward-compatibility breaking changes to come.

@ben-manes

This was a really frustrating problem when I first tried to implement variably sized entries and an advanced eviction policy. I wrote briefly about it in the old CLHM googlecode project's design doc and changelog. I tried to replace LRU with LIRS but ran into races and didn't realize it at first (so I attempted rewrites a few times, trying to comb through the code to figure out my mistakes). LIRS being so complicated, and the resulting race, made me uncomfortable maintaining such a beast, as I couldn't trust code that I didn't understand. That prompted writing a simulator to have a simpler reference to debug from, a brute-force test suite that generates millions of scenarios with deep integrity checks, and avoiding algorithms whose workings I would forget and then struggle to maintain.

Anyway, my solution to that race may not have been ideal but was straightforward. I maintain two variable size fields, one from the writer's perspective and the other from the policy's. I use the replay logic to apply the updates as a delta, so they eventually become consistent and the eviction lists are correctly sized. The races resolve themselves with very little extra care needed. I forget the alternatives, which likely required additional synchronization instead of the metadata overhead.
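
A rough Go sketch of that idea (the field and function names here are made up for illustration, not Caffeine's or otter's actual code): each node carries the weight last set on the write path and the weight the policy has already accounted for, and the replay step applies the difference.

import "sync/atomic"

type node struct {
	weight       atomic.Uint32 // set on the user-facing write path
	policyWeight uint32        // only touched by the single policy goroutine
}

type policy struct {
	totalWeight int64
}

// replayUpdate runs on the policy goroutine when it drains an update event.
// Applying the delta keeps totalWeight eventually consistent even if events
// were reordered or coalesced.
func (p *policy) replayUpdate(n *node) {
	w := n.weight.Load()
	p.totalWeight += int64(w) - int64(n.policyWeight)
	n.policyWeight = w
}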

fwiw, in papers "cost" often referred to latency or miss penalty (though not consistently), and variable sizing was just "size", qualified as "byte size" when appropriate. Since Java uses size for the Map entry count and capacity for the underlying table's length, I wanted a clearer term. I also wanted to abstract the concept out so that I didn't have to manage the calculation of what an object's "byte size" means, since that is gnarly in Java. The "weight" of an entry seemed a less ambiguous term, and I liked the resulting api of a Weigher type and a weigh(key, value) method. I wish Ristretto hadn't NIH'd it to "cost", as that's too vague; I'd probably use "penalty" if I ever incorporate retrieval latency into the policy choice. I always run through the bumper-sticker api design notes when writing a public api.

it looks like there will be some more important backward-compatibility breaking changes to come

In that case, you might consider moving the TTL methods outside of the core api so they can be optional. There are various ad hoc user needs that only make sense for particular configurations but would burden the core api. Instead, I made cache.policy() a dumping ground for odds and ends that most users won't encounter, and used an Optional type to safely traverse into those features if they were enabled at construction. This way users have their escape hatches, I don't bloat the runtime costs, I don't pollute the apis or make them inconsistent, and I keep the conceptual weight low. I don't know whether that is idiomatic in Go, so it's just a random bit of experience you can ignore if not applicable.
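
For what it's worth, a rough Go analogue of that escape-hatch idea might be a comma-ok accessor that only exposes the extension when the feature was enabled at construction (the names below are hypothetical, not an existing otter API):

import "time"

// Expiration groups the odds and ends that only matter when a TTL was configured.
type Expiration[K comparable, V any] interface {
	TimeToLive(key K) (time.Duration, bool)
}

type Cache[K comparable, V any] struct {
	expiration Expiration[K, V] // nil when the cache was built without a TTL
}

// ExpirationPolicy returns the optional view and false when the feature is
// disabled, keeping the core Cache API small.
func (c *Cache[K, V]) ExpirationPolicy() (Expiration[K, V], bool) {
	if c.expiration == nil {
		return nil, false
	}
	return c.expiration, true
}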

@maypok86
Owner

This was a really frustrating problem when I first tried to implement variably sized entries and an advanced eviction policy.

Yeah, when otter did the same thing, I didn't realize what was going on for days and was tearing my hair out. 😅 Thankfully RCU fixes this problem out of the box.

I maintain two variable size fields, one from the writer's perspective and the other from the policy's.

Wow, I've been using something very similar for a while. (link)


I'll probably write about the rest later, especially since I've already accumulated a lot of questions (some of them quite tricky) and answers. I apologize for not replying to the message in the other issue, but I was not ready to participate in the discussion then. Now I've collected some wants, polled a lot of people, and even implemented some things I had never written before:

  1. A relatively extensible hit ratio simulator that can handle multi-gigabyte traces much faster than theine's simulator (ristretto's simulator just crashes with an OOM 😐), and on which many of the traces from the libCacheSim authors have been run.
  2. A code generator (not nicely written, but it works!) that not only reduces otter's memory footprint a bit, but also allows more efficient algorithms for expiration policies. This was honestly borrowed from caffeine. 🙂 Let me ask something right away: the worst-case O(n) time complexity on wheel ticks (daily ones, for example) is accepted, and bp-wrapper should just amortize the loss, right?

P.S. Hopefully I'll still be able to formulate my thoughts at least some way tomorrow, because so far I'm getting something incoherent.

@ben-manes

haha, looks very similar. Your node's setCost and setPolicyCost are my node's setWeight and setPolicyWeight.

Your simulator looks nicely structured and similar to mine. I parse the traces as a buffered stream that is collected into batches and dispatched to the policy's queue, and the policy implementor consumes them one by one. This let me parallelize it like the actor model's mailbox and thread (or a goroutine + buffered channel) and let that blocking queue throttle the producer for large traces. Each eviction policy then became simple single-threaded logic, and I avoided too much code reuse so that over-abstraction wouldn't make the algorithms harder to follow or less self-contained. A nice structure like yours makes it easy to extend in whatever direction your interests pull you.

I tried libCacheSim on large traces, but it memory-maps the entire decompressed trace, which exceeded the 36 GB I could allocate in Docker (iirc, the author told others who asked to use larger machines). It might work by recompiling the kernel to enable overcommit, but I think it was decompressing the zstd file, so I'd have needed a massive swap to preallocate it. I never got it running on the large traces and used smaller ones to spot-check instead. I don't know why it doesn't stream the trace, but I think it's to allow each worker to run as an independent thread through the file at its own pace, rather than have a controller divvy it out piecemeal.

I started with fixed expiration and added variable expiration much later, since I originally did not know how to make it amortized O(1). I hadn't wanted to force size eviction to discard dead entries, and users could easily do that themselves as a workaround. If starting anew, I'd emulate fixed expiration using the timer wheel instead of having two algorithmic approaches; the time/memory cost is very low, so it doesn't seem worth the effort and could be added later as an optimization if it indeed proved useful. Just know it wasn't an intentional concern about timing wheels; they just were not as well known until recently. I believe Caffeine was the first to use them for cache expiration, but it wasn't novel since they are used for other timer events like network timeouts. I chose a hierarchical version as a better fit for a local cache because users often want prompt notification for their eviction listener, e.g. to close a websocket or run other business logic. The other variations could also make sense, depending on the requirements.

The tricky and fun part was making it cheap bitwise. I had no good references, so I hope mine is sane, but I'm curious if others would do it differently. Per your question, that linear time complexity has not yet been a problem because (a) the reordering is extremely cheap, (b) on-heap local caches are not often extremely large, (c) activity means it's unlikely to do a full scan, and (d) bp-wrapper allows the maintenance to be async so it doesn't impact user latencies. It could have a threshold limit, e.g. in the hill climber I amortized the region resizing over multiple maintenance runs. I have not yet had a user report or benchmark indicate that it is needed, but it is a very simple change once deemed appropriate. The fix would be to not advance the timer wheel's current time if it exited early and to hint that a new maintenance run should be scheduled. Then over the next few runs it would fully catch up and fully advance. Caffeine has rescheduleCleanUpIfIncomplete to more aggressively deliver those prompt removal notifications.

The codegen looks nice! Mine feels ugly as well, but since it is not user-facing and runs at build time, that seemed okay. It has worked pretty well, and I like the memory savings, so users get the features at low cost.

@rueian
Author

rueian commented Feb 15, 2024

And all of these caches (including otter) still have problems with insert and update events being reordered when they are placed into the write buffer, but it's very hard to trigger this even on purpose, and the consequences, though unpleasant, are correctable at runtime.

Hi @maypok86, thank you for the detailed explanation! If the problem is out-of-order insert/update/delete tasks for the same key, how about waiting a little until the previous task has been queued? For example, using an empty channel on every node:

evicted := c.hashmap.Set(n)
if evicted != nil {
	// update
	<-evicted.queued
	c.writeBuffer.Insert(node.NewUpdateTask(n, evicted))
} else {
	// insert
	c.writeBuffer.Insert(node.NewAddTask(n))
}
close(n.queued)

@maypok86
Owner

Unfortunately, there are a few problems here

  1. Each node would then store an additional pointer, and creating a channel would mean an additional allocation, resulting in increased pressure on the GC, which is what all on-heap caches are always accused of. I would like to minimize that as much as possible.
  2. Sequential insertion will still corrupt the eviction policy.
// for the eviction policy it will look like this
insert(k, v1, cost=100000) // dead node
update(k, v2, cost=2) // already good node

As a result, many good nodes can be evicted to insert a dead node.

I haven't looked at how it's done in caffeine yet; so far it looks like a set of states a node can be in plus a set of transitions between them, i.e. something like a state machine. It would be very cool to implement this with a single uint32/int32, but I haven't really thought through how it would work yet. I'm not sure it's easy to make it linearizable, but for now I hope it can be :).

@rueian
Author

rueian commented Feb 15, 2024

Each node would then store an additional pointer, and creating a channel would mean an additional allocation, resulting in increased pressure on the GC, which is what all on-heap caches are always accused of. I would like to minimize that as much as possible.

Maybe a busy loop waiting for an atomic uint32 to be toggled is acceptable here.

Sequential insertion will still corrupt the eviction policy.

I feel like this is somehow unavoidable, but could it perhaps be relieved by pre-aggregating each batch from the queue?

@maypok86
Owner

maypok86 commented Feb 15, 2024

I feel like this is somehow unavoidable

Yes and no. When I was bored in traffic, I came up with something like this.

states = {alive, dead}
transitions = {alive -> dead}

And with set we do something like this

evicted := c.hashmap.Set(n)
if evicted != nil {
	// update
	evicted.Die() // evicted.state = dead
	c.writeBuffer.Insert(node.NewUpdateTask(n, evicted))
} else {
	// insert
	c.writeBuffer.Insert(node.NewAddTask(n))
}

and in the worker just do it like this:

if n.IsDead() {
    continue
}

That is, let's not try to fix the insert and update problems at the source, but simply filter events on the side of the worker that updates the eviction policy.

This seems to fix the problems when update and insert are swapped, and also doesn't corrupt the eviction policy on consecutive insert and update events.
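
A minimal sketch of how Die/IsDead could be backed by a single atomic field (illustrative only; the real node layout differs):

import "sync/atomic"

const (
	alive uint32 = iota
	dead
)

type Node struct {
	state atomic.Uint32 // alive -> dead is the only allowed transition
	// key, value, cost, queueType, ...
}

// Die marks the node as replaced or removed from the hash table.
func (n *Node) Die() {
	n.state.Store(dead)
}

// IsDead lets the policy worker simply skip events that reference a node
// which no longer backs its key.
func (n *Node) IsDead() bool {
	return n.state.Load() == dead
}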

@rueian
Author

rueian commented Feb 15, 2024

This seems to fix the problems when update and insert are swapped, and also doesn't corrupt the eviction policy on consecutive insert and update events.

Yes, that looks like a better idea than what I just came up with:

n.mu.Lock()
...
evicted := c.hashmap.Set(n)
if evicted != nil {
	// update
	evicted.mu.Lock()
	c.writeBuffer.Insert(node.NewUpdateTask(n, evicted))
	evicted.mu.Unlock()
} else {
	// insert
	c.writeBuffer.Insert(node.NewAddTask(n))
}
n.mu.Unlock()

@maypok86
Owner

In fact, if we increase the number of transitions, we could even try using sync.Pool and reduce the number of allocations to zero, but when I started thinking about concurrent transitions, I got scared and decided to shelve that.

@rueian
Author

rueian commented Feb 15, 2024

In fact, if we increase the number of transitions, we could even try using sync.Pool and reduce the number of allocations to zero, but when I started thinking about concurrent transitions, I got scared and decided to shelve that.

Using sync.Pool to recycle channels might be inconvenient because those channels could then never be closed. However, I found that a mutex per node (as in the comment above) is enough to preserve the order and will not incur GC overhead.

Using sync.Pool to recycle nodes would probably be a good idea though.

@maypok86
Owner

Ah, damn, it seems we can't, because the scheduler can interrupt the goroutine in the middle of a get operation, and after the GC cycle we would be accessing freed memory. Debugging this would be very unpleasant.

Though we should find out what happens to pool elements that still have pointers to them; they're unlikely to survive.

@maypok86
Owner

Using sync.Pool to recycle nodes would probably be a good idea though.

Yeah, that's what I mean.

@maypok86
Owner

Though we should find out what happens to pool elements that still have pointers to them; they're unlikely to survive.

Yes, gc will still clear the memory, too bad. :(

@rueian
Author

rueian commented Feb 16, 2024

we would be accessing freed memory

Why? I mean, shouldn't we clear all references to a node before putting it into the pool?

Nevertheless, I found another downside of sync.Pool that might make it useless for otter: one of the main benefits of the pool is its thread-local cache. If we get nodes from many different goroutines but always put them back from the goroutine that handles the task queue, then that benefit is probably gone.

@maypok86
Owner

@rueian Because nothing can stop the scheduler from interrupting execution, for example, on this line. And then you may get a panic from accessing freed memory.

@rueian
Author

rueian commented Feb 20, 2024

Hi @maypok86, I think all questions in this thread are answered. Thank you for the very detailed explanation!

Before I close this issue, I would like to ask one more question: is the n.IsDead() idea on your roadmap?

@maypok86
Owner

Hi @rueian, I've been sick, but something like this is what I had planned:

  1. Add node code generation
  2. Fix eviction policy corruption in case of wrong sequence of events (it's a bug, but it's pretty hard for users to get seriously affected by it).
  3. Release v1.1.0
  4. Remove prefetching
  5. Add dynamic buffers
  6. Revise API
  7. Release v2.0.0
  8. Add callbacks to notify when nodes are deleted
  9. Add Loading Cache.

The order of items 6-9 may change, but the rest should not. I'm most interested in node code generation and the implementation of dynamic buffers, but I'm not sure they will be very useful to users, so I often add features to the roadmap with high priority when someone opens an issue about them. So far the only exception is the loading cache, but there is a pretty big problem with implementing it correctly and quickly.

@rueian
Author

rueian commented Mar 2, 2024

Hi @maypok86, hope you are doing well!

In recent days I have been trying to figure out how to correctly process the out-of-order policy events. It feels to me that n.IsDead() alone is not enough.

The following procedures are how I think out-of-order events should be processed without corrupting a policy:

When inserting a node into a policy:

  1. Did we receive the deletion task of the node before? If yes, then we skip this insertion task.
  2. Is the node still in the hashmap? If not, then we skip this insertion task.
  3. Finally, we can insert the node into the policy.

When deleting a node from a policy:

  1. Did we insert the node into the policy before? If not, then we skip this deletion task.
  2. Finally, we can delete the node from the policy.

Therefore, besides an isDead field, I think we need at least one more field in every node to answer the first question in both of the above procedures. Please let me know if this makes sense to you.
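
A hedged sketch of a worker applying that checklist (the task types and the inPolicy field are made up for illustration; marking a node dead whenever it is replaced or deleted covers the hashmap check):

type taskKind int

const (
	taskAdd taskKind = iota
	taskDelete
)

// Node is a stand-in for the real node type in this sketch.
type Node struct {
	dead     bool // set when the node is replaced or deleted from the hashmap
	inPolicy bool // only touched by the policy worker goroutine
}

func (n *Node) IsDead() bool { return n.dead }

type task struct {
	kind taskKind
	node *Node
}

type policy interface {
	Add(n *Node)
	Delete(n *Node)
}

// process runs on the single worker goroutine that drains the write buffer.
func process(p policy, t task) {
	switch t.kind {
	case taskAdd:
		// Skip inserts for nodes that were already replaced or deleted
		// (this also covers "is the node still in the hashmap?").
		if t.node.IsDead() {
			return
		}
		p.Add(t.node)
		t.node.inPolicy = true
	case taskDelete:
		// Skip deletes whose matching insert never reached the policy.
		if !t.node.inPolicy {
			return
		}
		p.Delete(t.node)
		t.node.inPolicy = false
	}
}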

@ben-manes

I use alive, retired, dead node states. Does that work for you?

@maypok86
Owner

maypok86 commented Mar 2, 2024

@rueian
Maybe I'm wrong, but I can't think of a counterexample in which the two-state option would break.
Most likely, you are confused by the possibility of incorrect order of Delete and Insert operations. But in that case we know in advance that the node is not in the eviction policy (queueType = unknown): the Delete event will be ignored because we don't know which queue the node is in, and the Insert will be ignored because of the IsDead check.

@ben-manes
I don't know why caffeine uses three states, but I think it is more than enough for otter to know that the node has been removed from the hash table.

@ben-manes

I think it was a simpler solution when the node is evicted prior to processing an explicit delete. There wasn't a difference in metadata cost between 2 and 3 states, so making it easier to reason about was the main benefit.

In the evict-then-delete case, the queued removal task should no-op instead of decrementing the total size twice. We could instead check to see if the entry was in the policy to infer that it had been evicted and no-op. Unfortunately, that doesn't work because the delete can be reordered before its insertion, so now insertion needs to not increment the total if not alive. That gets very confusing when an update also changes the weight and is reordered. Now the weight is changing and the update task must also avoid incorrectly changing the total size. Given all the potential scenarios it probably seemed less confusing to just introduce an extra state for coercing the total size.

@maypok86
Owner

maypok86 commented Mar 2, 2024

Oh, interesting, I think I get it. It seems that, without noticing it myself, I actually introduced a third state by creating nodes with queueType = unknown (i.e. not in the eviction policy), as well as setting queueType back to unknown during eviction. I do this specifically to avoid damage when deleting an already-evicted node. But it turns out that this is really not very obvious.

I may have to leave it as it is, since I don't really understand how to correctly make atomic transitions between more than two states without introducing locks (in nodes or arrays). And it may be even more difficult to understand.

@ben-manes

When I first implemented this it was on LRU so there was no queue type. I don't have a preference as I think both are relatively clear.

@rueian
Author

rueian commented Mar 3, 2024

That gets very confusing when an update also changes the weight and is reordered.

Yes, it is confusing, so I made a checklist for how a policy should insert and delete a node. I hope that helps clarify all the possibilities.

Most likely, you are confused by the possibility of incorrect order of Delete and Insert operations.

Yes, that is what I am concerned about. As @ben-manes pointed out, we should no-op if the deletion comes before its insertion.

Given the previously suggested implementation:

evicted := c.hashmap.Set(n)
if evicted != nil {
	// update
	evicted.Die() // evicted.state = dead
	c.writeBuffer.Insert(node.NewUpdateTask(n, evicted))
	...

n.IsDead() can only answer the second question in the checklist (is the node still in the hashmap?) but can't answer the first one (did we receive the deletion task of the node before?), unless we also mark the node dead after processing its deletion task. My original thought was to change the queueType to something like deleted, which is why I felt that n.IsDead() alone is not enough.

@ben-manes

Yes, it is confusing, so I made a checklist for how a policy should insert and delete a node. I hope that helps clarify all the possibilities.

I think we all agree that some kind of 3rd state is simplest. I was replying under the assumption that only two states were used and the third was somehow inferred from the state of the policy data structures. I think using the queueType as you both suggested is reasonable. I encoded it using the entry's key field with sentinel retired/dead keys, which works nicely because keys are a reference type. When Java adds value types I'd probably switch to something like your ideas, but I haven't thought about that much.

@rueian
Author

rueian commented Mar 4, 2024

Hi @maypok86 and @ben-manes,

Thank you both for your detailed discussion on the above topics. It looks like the order issue has been fixed in #59. So I think this issue can be closed now.

@rueian rueian closed this as completed Mar 4, 2024
@rueian
Author

rueian commented Mar 14, 2024

Hi @maypok86,

I am not sure if you are still considering implementing the Loading Cache, but while working on my own fork of otter to integrate into rueidis, I found a potential problem that could cause a pinned entry to never be updated and to be left pinned forever.

What I was trying to do is basically the same as I previously mentioned: #33 (comment)

The idea is first to insert a pinned entry, as a placeholder, into the hashmap and later update it with a second cache.Set() call after its value and cost are prepared.

However, there is a check that can cause the update to be skipped, so it is possible that a pinned entry will be left forever:

if cost > c.policy.MaxAvailableCost() {
	c.stats.IncRejectedSets()
	return false
}

I guess my only choice is to delete the check, and I think you may run into a similar situation if you implement the Loading Cache.

@maypok86
Owner

maypok86 commented Mar 14, 2024

Actually, I wanted to release v2 with a fixed API first; adding a Loader would be easier after that. Also, I didn't plan to use Set for this at all.

But there were also some problems. I am very confused about what CockroachDB's Swiss Map is going to do. If it becomes the new standard map, it will break a lot of libraries and systems (many people import the map structure from the Go runtime), and otter as well (maphash is used). And the Go authors refuse to add a hash function for comparable types to the standard library (even though they have added a million functions that are not particularly useful 😵‍💫). As a result, all that remains is to ask users to write the hash function on their own... but I doubt most people would write a good hash function, and it's unpleasant to have to do it every time.

The only reassuring thing is that if this happens, a lot of other things will break too. There is also the potential option of importing typehash from the runtime, but I have not studied its limitations.

It is also possible that having users write a hash function is not so bad. Java seems to use a similar approach, but no one does this in Go, and most likely it would scare off users.
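
For illustration, the hash function a user would have to supply can be quite small for string keys using the standard hash/maphash package (the harder part is arbitrary comparable key types):

import "hash/maphash"

var seed = maphash.MakeSeed()

// hashString is the kind of function users might be asked to supply for
// string keys if importing the runtime's hasher stops working.
func hashString(key string) uint64 {
	return maphash.String(seed, key)
}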

@maypok86
Owner

@rueian I think it really will be easier for you to delete this part, but I sincerely hope to write a Loading version.

@maypok86
Owner

I had something like this API in mind:

type basicCache[K comparable, V any] interface {
	Set(key K, value V)
	Range(f func(key K, value V) bool)
	..
}

type Cache[K comparable, V any] interface {
	basicCache[K, V]
	Has(key K) bool
	Get(key K) (V, bool)
	BulkGet(keys []K) map[K]V
}

type LoadingCache[K comparable, V any] interface {
	basicCache[K, V]
	Get(ctx context.Context, key K) (V, error)
	// An incredibly convenient feature, but it will be very difficult to implement it well.
	// Perhaps the option of creating goroutines is suitable or goroutine pool.
	BulkGet(ctx context.Context, keys []K) (map[K]V, error)
}

type Builder[K comparable, V any] struct {
	...
}

func (b *Builder[K, V]) ConstTTL(ttl time.Duration) *Builder[K, V] {
	...
}

// How to provide a function for a variable ttl is not very clear...
func (b *Builder[K, V]) VarTTL(calcTTL func (key K, value V) time.Duration) *Builder[K, V] {

}

// It seems that several different build methods are much easier to implement and quite understandable to the user.

func (b *Builder[K, V]) Build() (Cache[K, V], error) {
	if b.err != nil {
		return nil, b.err
	}
	return newCache(b), nil
}

// It is probably better to take the interface as a parameter, but in Go you cannot create an anonymous structure with methods.
func (b *Builder[K, V]) BuildLoading(loader func(ctx context.Context, key K) (V, error)) (LoadingCache[K, V], error) {
	if b.err != nil {
		return nil, b.err
	}
	return newLoadingCache(b), nil
}
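
A hypothetical usage sketch of the proposed API (nothing here exists yet; the NewBuilder constructor, ctx, and fetchFromBackend are assumptions for illustration):

cache, err := otter.NewBuilder[string, []byte](10_000).
	ConstTTL(time.Minute).
	BuildLoading(func(ctx context.Context, key string) ([]byte, error) {
		// Called on a miss; the returned value is cached under key.
		return fetchFromBackend(ctx, key)
	})
if err != nil {
	return err
}

value, err := cache.Get(ctx, "user:42")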

@rueian
Author

rueian commented Mar 15, 2024

I am very confused about what CockroachDB's Swiss Map is going to do. If it becomes the new standard map, it will break a lot of libraries and systems

According to their meeting notes, it is likely to happen.

type LoadingCache[K comparable, V any] interface {
	basicCache[K, V]
	Get(ctx context.Context, key K) (V, error)
	// An incredibly convenient feature, but it will be very difficult to implement it well.
	// Perhaps the option of creating goroutines is suitable or goroutine pool.
	BulkGet(ctx context.Context, keys []K) (map[K]V, error)
}

Automatic BulkGet by using the same loader function is indeed incredibly convenient, but I guess some users would like to implement a bulk loader by themselves.

@maypok86
Owner

According to their meeting golang/go#43930 (comment), it is likely to happen.

Yes, I hope they will come up with something so as not to break so many applications.

Automatic BulkGet by using the same loader function is indeed incredibly convenient, but I guess some users would like to implement a bulk loader by themselves.

Yeah, that's why I want to have some kind of universal solution for this.

@rueian I will try to collect ideas on the new API in this issue. It would be very cool to hear your opinion. There we will try to take LoadingCache and other features into account.

@rueian rueian mentioned this issue Mar 25, 2024