A76: Improvements to the Ring Hash LB Policy
----
* Author(s): atollena
* Approver: markdroth
* Status: Draft
* Implemented in: Go
* Last updated: 2024-04-26
* Discussion at: https://groups.google.com/g/grpc-io/c/ZKI1RIF0e_s/m/oBXqOFb0AQAJ
## Abstract

This proposal describes two improvements to the `ring_hash` load balancing
policy:

1. The ability to use ring hash without xDS, by extending the policy
   configuration to define the [request header][header] to use as the request
   hash key.
2. The ability to specify endpoint hash keys explicitly, instead of hashing
   the endpoint IP address.
## Background

### Terminology

* The *request hash key*, after being hashed, defines where a given request is
  to be placed on the ring in order to find the closest endpoints.
* The *endpoint hash key*, after being hashed, determines the locations of an
  endpoint on the ring.
### Using ring hash without xDS by explicitly setting the request hash key

gRPC supports the `ring_hash` load balancing policy for consistent hashing.
This policy currently requires xDS for configuration, because the only way to
configure the hash for a request is the `hash_policy` field in the
`RouteAction` route configuration. This makes the `ring_hash` policy unusable
without an xDS infrastructure in place.

This proposal extends the configuration of the `ring_hash` policy to specify a
header to hash. This will make it possible to use `ring_hash` by configuring
it entirely in the [service config][service-config]. If this configuration is
omitted, we will preserve the current behavior of using the xDS hash policy.
### Using an explicit endpoint hash key

Another limitation of the current `ring_hash` load balancing policy is that it
always hashes the endpoint IP address to place endpoints on the ring. In some
scenarios, this choice is not ideal: for example, [Kubernetes
StatefulSets](https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/)
offer a way to configure workloads with a sticky identity, such that endpoints
keep their storage and network identifiers across restarts. However, their IP
addresses may change across restarts. After a deployment in which all IP
addresses have changed, a completely new ring has to be constructed, even
though it may have been desirable to keep the ring unchanged based on the
StatefulSet identities, so that each instance stays at the same location on
the ring.

Envoy offers a solution to control endpoint hashes independently of IP
addresses: it uses the `hash_key` value in the `"envoy.lb"` namespace of the
[LbEndpoint.Metadata][LbEndpoint.Metadata] field available in EDS instead of
the endpoint IP address, as described in [the Envoy documentation for ring
hash][envoy-ringhash]. This proposal adds support for setting the endpoint
hash key explicitly via EDS by reusing the configuration mechanism implemented
in Envoy. To retain the advantage of being able to use `ring_hash` without
xDS, custom gRPC name resolvers will be able to set this endpoint attribute
through the language-specific resolver attribute interface.
### Related Proposals:

This proposal extends the following existing gRFCs:

* [gRFC A42: xDS Ring Hash LB Policy][A42]
* [gRFC A61: IPv4 and IPv6 Dualstack Backend Support][A61]
* [gRFC A74: xDS Config Tears][A74]
## Proposal

### Explicitly setting the request hash key

A new field `request_hash_header` will be added to the `ring_hash` policy
config:

```proto
message RingHashLoadBalancingConfig {
  // (existing fields omitted)
  string request_hash_header = 3;
}
```

Upon loading the load balancing config, if the `request_hash_header` field
contains a value that is not a valid header name, then the configuration is
rejected. If the `request_hash_header` refers to a binary header (suffixed
with `-bin`), the configuration is also rejected.
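For illustration, a service config enabling this behavior without xDS might look like the following sketch. It assumes the `ring_hash_experimental` policy name used by existing gRPC implementations and the camelCase JSON form of the new proto field; the `session-id` header name is a hypothetical example:

```json
{
  "loadBalancingConfig": [
    {
      "ring_hash_experimental": {
        "requestHashHeader": "session-id"
      }
    }
  ]
}
```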
At pick time:

- If `request_hash_header` is empty, then the request hash key will be based
  on the xDS hash policy in RDS, which keeps the existing LB configuration for
  ring hash working as before with xDS.
- If `request_hash_header` is not empty, and the header has a non-empty value,
  then the request hash key will be set to this value. If the header contains
  multiple values, then the values are joined with a comma (`,`) character
  before hashing.
- If `request_hash_header` is not empty, and the request has no value
  associated with the header or its value is empty, then the picker will
  generate a random hash for the request. If the use of this random hash
  triggers a connection attempt (according to the rules described in [A42:
  Picker Behavior][A42-picker-behavior] and updated in [A61: Ring
  Hash][A61-ring-hash]), then before queuing the pick, the picker will scan
  forward searching for a subchannel in `READY` state. If it finds a
  subchannel in `READY` state, the picker returns it.
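As a sketch of the second rule above, multiple header values are joined with commas before hashing. Note that gRPC's ring hash uses the XXH64 hash function (per A42); the sketch below uses the standard library's FNV-64a purely as a stand-in, and the header values shown are hypothetical:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// requestHashKey joins all values of the configured request hash header
// with a comma, per the rule described above.
func requestHashKey(values []string) string {
	return strings.Join(values, ",")
}

// computeHash hashes the request hash key. Real gRPC implementations use
// XXH64 (see gRFC A42); FNV-64a stands in here for illustration only.
func computeHash(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}

func main() {
	// Two values for the same header are joined before hashing.
	fmt.Println(requestHashKey([]string{"us-east", "shard-7"})) // us-east,shard-7
}
```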
The following pseudo code describes the updated picker logic:

```
// Determine request hash.
using_random_hash = false;
if (config.request_hash_header.empty()) {
  request_hash = call_attributes.hash;
} else {
  header = headers.find(config.request_hash_header);
  if (header == null) {
    using_random_hash = true;
    request_hash = ComputeRandomHash();
  } else {
    request_hash = ComputeHash(header);
  }
}

// Do pick based on hash.
first_index = ring.FindIndexForHash(request_hash);
requested_connection = false;
for (i = 0; i < ring.size(); ++i) {
  index = (first_index + i) % ring.size();
  if (ring[index].state == READY) {
    return ring[index].picker->Pick(...);
  }
  if (requested_connection) continue;
  if (ring[index].state == IDLE) {
    ring[index].endpoint.TriggerConnectionAttemptInControlPlane();
    if (using_random_hash) {
      requested_connection = true;
      continue;
    }
    return PICK_QUEUE;
  }
  if (ring[index].state == CONNECTING) {
    return PICK_QUEUE;
  }
}
if (requested_connection) return PICK_QUEUE;
return PICK_FAIL;
```
This behavior ensures that a single request creates at most one new
connection, and that a request missing the header does not add extra latency
in the common case where there is already at least one subchannel in `READY`
state. It converges to picking a random host, since each request may create a
new connection to a random endpoint.
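The random-hash branch of the pseudo code above can be sketched in Go as follows. This is a simplified model, not gRPC's actual picker API: ring entries, states, and the `connect` callback are stand-ins for subchannels, connectivity states, and control-plane connection attempts:

```go
package main

import "fmt"

type connState int

const (
	stateIdle connState = iota
	stateConnecting
	stateReady
	stateTransientFailure
)

type ringEntry struct {
	endpoint string
	state    connState
}

type pickResult struct {
	endpoint string // non-empty when a READY subchannel was picked
	queued   bool   // PICK_QUEUE
	failed   bool   // PICK_FAIL
}

// pick scans forward from firstIndex, returns the first READY entry,
// triggers at most one connection attempt (via the connect callback)
// when using a random hash, and otherwise queues or fails the pick.
func pick(ring []ringEntry, firstIndex int, usingRandomHash bool, connect func(string)) pickResult {
	requestedConnection := false
	for i := 0; i < len(ring); i++ {
		e := ring[(firstIndex+i)%len(ring)]
		if e.state == stateReady {
			return pickResult{endpoint: e.endpoint}
		}
		if requestedConnection {
			continue
		}
		switch e.state {
		case stateIdle:
			connect(e.endpoint)
			if usingRandomHash {
				requestedConnection = true
				continue
			}
			return pickResult{queued: true}
		case stateConnecting:
			return pickResult{queued: true}
		}
	}
	if requestedConnection {
		return pickResult{queued: true}
	}
	return pickResult{failed: true}
}

func main() {
	var attempts []string
	ring := []ringEntry{{"a", stateIdle}, {"b", stateIdle}, {"c", stateReady}}
	// Random hash landing on index 0: one connection attempt to "a",
	// then the scan finds the READY entry "c" and returns it.
	res := pick(ring, 0, true, func(ep string) { attempts = append(attempts, ep) })
	fmt.Println(res.endpoint, attempts) // c [a]
}
```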
### Explicitly setting the endpoint hash key

The `ring_hash` policy will be changed such that the hash key used for
determining the locations of each endpoint on the ring will be extracted from
a pre-defined endpoint attribute called `hash_key`. If this attribute is set,
then the endpoint is placed on the ring by hashing its value. If this
attribute is not set or is empty, then the endpoint's first address is hashed,
matching the current behavior. The locations of an existing endpoint on the
ring are updated if its `hash_key` endpoint attribute changes.
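The selection rule above amounts to a simple fallback, sketched here. The function shape is illustrative, not an actual gRPC API, and the `pod-0` identity is a hypothetical StatefulSet-style hash key:

```go
package main

import "fmt"

// endpointHashKey returns the string to hash when placing an endpoint on
// the ring: the hash_key endpoint attribute when set and non-empty,
// otherwise the endpoint's first address (the pre-existing behavior).
func endpointHashKey(hashKey string, addresses []string) string {
	if hashKey != "" {
		return hashKey
	}
	return addresses[0]
}

func main() {
	// With a stable hash_key, the ring position survives an IP change.
	fmt.Println(endpointHashKey("pod-0", []string{"10.0.0.5:443"})) // pod-0
	// Without one, the first address is hashed, as before.
	fmt.Println(endpointHashKey("", []string{"10.0.0.5:443"})) // 10.0.0.5:443
}
```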
The xDS resolver, described in [A74][A74], will be changed to set the
`hash_key` endpoint attribute to the value of the `hash_key` field in the
`envoy.lb` namespace of [LbEndpoint.Metadata][LbEndpoint.Metadata], as
described in [Envoy's documentation for the ring hash load
balancer][envoy-ringhash].
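The resolver-side extraction can be sketched as below. The nested map is only a stand-in for the `LbEndpoint.Metadata` proto's `filter_metadata` structure, not the real generated type:

```go
package main

import "fmt"

// hashKeyFromMetadata looks up the "hash_key" string value under the
// "envoy.lb" namespace of the endpoint metadata, returning "" when the
// namespace or key is absent or not a string.
func hashKeyFromMetadata(filterMetadata map[string]map[string]any) string {
	lb, ok := filterMetadata["envoy.lb"]
	if !ok {
		return ""
	}
	v, _ := lb["hash_key"].(string)
	return v
}

func main() {
	md := map[string]map[string]any{
		"envoy.lb": {"hash_key": "pod-0"},
	}
	fmt.Println(hashKeyFromMetadata(md)) // pod-0
	// Missing metadata yields the empty string, triggering the
	// fall-back to hashing the endpoint's first address.
	fmt.Println(hashKeyFromMetadata(nil) == "") // true
}
```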
### Temporary environment variable protection

Explicitly setting the request hash key cannot lead to problems with existing
deployments, because the new behavior requires setting a load balancing policy
configuration field that did not exist before. Therefore, it is not gated
behind an environment variable.

Adding support for the `hash_key` in xDS endpoint metadata could potentially
break existing clients whose control plane is setting this key, because
upgrading the client to a new version of gRPC would automatically cause the
key to start being used. We expect that this change will not cause problems
for most users, but just in case there is a problem, we will provide a
migration story by supporting a temporary mechanism to tell gRPC to ignore the
`hash_key` endpoint metadata. This will be enabled by setting the
`GRPC_XDS_ENDPOINT_HASH_KEY_BACKWARD_COMPAT` environment variable to true.
This mechanism will be supported for a couple of gRPC releases but will be
removed in the long run.
## Rationale

We originally proposed using language-specific interfaces to set the request
hash key. The advantage would have been that the request hash key would not
have to be exposed through gRPC outgoing headers. However, this would have
required defining language-specific APIs, which would increase the complexity
of this change.

We also discussed the option of exposing all `LbEndpoint.metadata` from EDS
through name resolver attributes, instead of only extracting the specific
`hash_key` attribute, so as to make it available to custom LB policies. We
decided to extract only `hash_key` to limit the scope of this gRFC.

We discussed various options to handle requests that are missing a hash key in
the non-xDS case. When using ring hash with xDS, the hash is assigned a random
value in the xDS config selector, which ensures that all picks for a given
request can trigger a connection to at most one endpoint. However, without
xDS, there is no place in the code to assign the hash such that it retains
this property. We considered the following alternative solutions:

1. Adding a config selector or filter to pick the hash. There is currently no
   public API to do so from the service config, so we would have had to define
   one.
2. Using an internal request attribute to set the hash. Again, there is no
   cross-language public API for this.
3. Failing the pick. We generally prefer that the lack of a header affect load
   balancing but not correctness, so this option was not ideal.
4. Treating a missing header as present but having the empty string as its
   value. All instances of the channel would end up picking the same endpoint
   for requests with a missing header, which could overload this endpoint if a
   lot of requests do not have a request hash key.
## Implementation

Implemented in Go:

- Allow setting the request hash key: https://github.com/grpc/grpc-go/pull/7170
- Make the endpoint hash key configurable, and set it from EDS: https://github.com/grpc/grpc-go/pull/7161

[A42]: A42-xds-ring-hash-lb-policy.md
[A61]: A61-IPv4-IPv6-dualstack-backends.md
[A74]: A74-xds-config-tears.md
[envoy-ringhash]: https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/load_balancers#ring-hash
[header]: https://grpc.io/docs/guides/metadata/#headers
[service-config]: https://github.com/grpc/grpc/blob/master/doc/service_config.md
[LbEndpoint.Metadata]: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/endpoint/v3/endpoint_components.proto#envoy-v3-api-field-config-endpoint-v3-lbendpoint-metadata
[A42-picker-behavior]: A42-xds-ring-hash-lb-policy.md#picker-behavior
[A61-ring-hash]: A61-IPv4-IPv6-dualstack-backends.md#ring-hash