Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BlazeClient's idleTimeout works differently than BlazeServer's idleTimeout #635

Open
RafalSumislawski opened this issue Apr 30, 2021 · 10 comments

Comments

@RafalSumislawski
Copy link
Member

Both BlazeClient and BlazeServer offer an idleTimeout configuration option. The option has the same name in both, and shares implementation (IdleTimeoutStage) but from user's perspective it behaves differently.

In BlazeClient the idleTimeout is active only when a request is in-flight. It may kick in if the server takes too long to respond (unless other timeout does it first) or when the server starts sending a response and then pauses for too long. An idle (not borrowed from the pool) connection will remain open indefinitly as the idleTimeout is deactivated when a connection is returned to the pool. This may contradict users assumptions about the meaning of the word "idle" in the context of an HTTP client.

In BlazeServer the idleTimeout is always active, and it may terminate connections that have an in-flight request but nothing happened for too long, as well as connections which are kept open but unused.

I see this inconsistency as a bad developer experience. In addition to that I see the lack of mechanism for evicting unused connections from the client's connection pool, as a missing feature and potential resource leak.

@rossabaker
Copy link
Member

By way of comparison, there is a readTimeout and a pooledConnectionIdleTimeout in AHC. Jetty client has just an idleTimeout, and it's not clear which definition it means.

@RafalSumislawski
Copy link
Member Author

http4s/http4s#3700 seems to be related. Maybe the can be solved together.

@RafalSumislawski
Copy link
Member Author

#675 also seems related

@RafalSumislawski
Copy link
Member Author

For reference akka-http has a similar problem, and there's a PR adding a second kind of timeout to http client: akka/akka-http#3816

@rossabaker
Copy link
Member

So if we added a keepAliveTimeout and made idleTimeout behave like blaze-client, it seems we'd be in a good place?

@RafalSumislawski
Copy link
Member Author

I'm not convinced what we should do with the blaze-server idleTimeout (or if we should do anything with it).

What I am currently convinced to is that the changes in akka-http-client go in the right direction and we should do the same in blaze-client. It should have two timeouts:

  • an idleTimeout that is active while the connection is being used. It's main purpose being to prevent getting stuck in this state. The current implementation does exactly that.
  • and a new keepAliveTimeout which would be active while the connection sits in the pool.

While this doesn't solve the user experience problem I stated in the title of this issue "BlazeClient's idleTimeout works differently than BlazeServer's idleTimeout". I think it solves the the more serious technical problems of:

  • being unable to evict connections from the pool before server closes them, and therefore risking a race between client using the connection and server closing it due to timeout.
  • being able to evict unused connection from the pool in order to free up resources

P.S. I feel a need to on more client-side timeout, sort of a maximumLifespan, that would be useful in forcing a rebalancing of connections though a load-balancer. But that's a story for another ticket.

@rossabaker
Copy link
Member

We had an issue at $WORK with a server that used round-robin DNS interacting badly with the JVM's DNS caching. We filled the pool against one cached instance. It was proposed a max lifespan would help that, but I think it would expire those connections at once and then overload another instance as it replenished the connections within a new cache window. A maximum lifespan with jitter could have helped us here after the first wave. But that's probably solving the wrong problem: we needed a different load balancing strategy.

I like your ideas here.

@mr-git
Copy link

mr-git commented Oct 8, 2021

is there a plan to make this timeout available and exposed? we were getting org.http4s.blaze.pipeline.Command$EOF$: EOF regularly on 0.21.28 and 0.22.4, we updated to 0.22.6 now, but I haven't seen any change, which would suggest that internal pool gets refreshed or cleaned up before server closes connections.

@rossabaker
Copy link
Member

No, there's not yet anything like the keepAliveTimeout implemented, but I'm sure that would be exposed and configurable once implemented.

@rossabaker rossabaker transferred this issue from http4s/http4s May 24, 2022
@mr-git
Copy link

mr-git commented Aug 9, 2022

 /** Time a connection can be idle and still be borrowed.  Helps deal
    * with connections that are closed while idling in the pool for an
    * extended period.  `Duration.Inf` means no timeout.
    */
  def withMaxIdleDuration(maxIdleDuration: Duration = maxIdleDuration): BlazeClientBuilder[F] =
    copy(maxIdleDuration = maxIdleDuration)

Is this the new and correct timeout, which should remove connections from pool?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants