The Spring Kafka library provides error handling via a well-established exponential back-off mechanism. However, if the error is HTTP 429 “Too Many Requests” (that is, rate limiting), ordinary exponential increases in the delay between retries are a bit like a brute-force attack: we either waste resources by retrying too early, or waste time by waiting too long. It would be better to wait exactly as long as the server tells us to.
In the following sections, I will show how to approach the implementation. In short, it boils down to writing a single, uncomplicated class.
HTTP 429
The HTTP 429 status, known as “Too Many Requests”, is defined in RFC 6585 and is intended for typical rate-limiting implementations. When we send too many requests, they start getting rejected with this status, along with a Retry-After header telling us how long we should wait before the next attempt. That header is defined in RFC 7231.
Dependencies
In short: there are two of them.
No external libraries are required for the implementation. Since the article is about the spring-kafka library, I take that dependency as a given.
Because we are dealing with rate limiting when sending requests to an external service, we also need a library to perform those requests. Staying within the Spring ecosystem, I assume that this library is spring-web.
spring-web can be integrated with an external library such as Apache Http Client, or it can use the HTTP client built into the JVM. I am skipping the question of which concrete implementation is used, as it does not affect the implementation approach presented in this article.
Spring also provides a reactive client. In that case, instead of spring-web, the second dependency is spring-webflux.
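For a Gradle build, the two dependencies might be declared along these lines (a sketch in the Gradle Kotlin DSL; artifact coordinates are standard, but versions would normally come from the Spring Boot BOM):

```kotlin
dependencies {
    // Kafka listener support, including DefaultErrorHandler and BackOffHandler
    implementation("org.springframework.kafka:spring-kafka")
    // RestTemplate / RestClient and the HttpClientErrorException hierarchy
    implementation("org.springframework:spring-web")
    // alternatively, for the reactive WebClient:
    // implementation("org.springframework:spring-webflux")
}
```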
Integrating with the Spring Kafka library
To summarize the Handling Exceptions section of the documentation: in practice, error handling with exponential back-off is usually configured roughly like this:
@Bean
fun kafkaListenerContainerFactory(consumerFactory: ConsumerFactory<String, String>) =
    ConcurrentKafkaListenerContainerFactory<String, String>().apply {
        setConsumerFactory(consumerFactory)
        setCommonErrorHandler(
            DefaultErrorHandler(
                ExponentialBackOffWithMaxRetries(6).apply {
                    initialInterval = 1_000
                    multiplier = 2.0
                    maxInterval = 120_000
                }
            )
        )
    }
The first step is to choose a constructor of the DefaultErrorHandler class that takes more arguments.
val backOffHandler = TooManyRequestsBackOffHandler(DefaultBackOffHandler())
val defaultErrorHandler = DefaultErrorHandler(null, backOff, backOffHandler)
Here:
- backOff is the ExponentialBackOffWithMaxRetries that is already familiar to you
- DefaultBackOffHandler is a class provided by the library; it is responsible for applying the back-off (it does not calculate anything, it does roughly what Thread.sleep() does)
- TooManyRequestsBackOffHandler is our own class, which will pass along a modified pause duration
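Putting these pieces together, the listener container factory from the previous snippet could be wired roughly as follows (a sketch; the first constructor argument is the recoverer, which we leave as null to keep the default behavior):

```kotlin
@Bean
fun kafkaListenerContainerFactory(consumerFactory: ConsumerFactory<String, String>) =
    ConcurrentKafkaListenerContainerFactory<String, String>().apply {
        setConsumerFactory(consumerFactory)
        val backOff = ExponentialBackOffWithMaxRetries(6).apply {
            initialInterval = 1_000
            multiplier = 2.0
            maxInterval = 120_000
        }
        // null recoverer = keep the default behavior once retries are exhausted
        setCommonErrorHandler(
            DefaultErrorHandler(null, backOff, TooManyRequestsBackOffHandler(DefaultBackOffHandler()))
        )
    }
```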
The interface that we need to implement is not particularly complicated.
class TooManyRequestsBackOffHandler(private val delegate: BackOffHandler) : BackOffHandler {
    override fun onNextBackOff(container: MessageListenerContainer?, exception: Exception, nextBackOff: Long) {
        delegate.onNextBackOff(
            container,
            exception,
            maxOf(
                // unwrap ListenerExecutionFailedException; fall back to the exception itself if there is no cause
                backOffFromException(exception.cause ?: exception),
                nextBackOff
            )
        )
    }
}
Integrating with the HTTP client
If we use spring-web, one nice thing is that regardless of whether we use the old RestTemplate class or the newer RestClient, and regardless of whether the underlying implementation is the JDK HTTP client, Apache HttpClient, or OkHttp, the same exceptions are always thrown: subclasses of org.springframework.web.client.HttpClientErrorException.
Our implementation can therefore be as simple as:
import org.springframework.web.client.HttpClientErrorException.TooManyRequests

private fun backOffFromException(exception: Throwable): Long {
    if (exception !is TooManyRequests)
        return Long.MIN_VALUE
    return backOffFromHeaders(exception.responseHeaders!!)
}
In the reactive WebClient from spring-webflux, the exception hierarchy is separate (it is rooted at org.springframework.web.reactive.function.client.WebClientResponseException), but the principle of operation is exactly the same.
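Under that assumption, the WebFlux variant is a near-mechanical translation (a sketch; only the exception type and the headers accessor change):

```kotlin
import org.springframework.web.reactive.function.client.WebClientResponseException.TooManyRequests

// Same contract as before: Long.MIN_VALUE means "no server-provided delay"
private fun backOffFromException(exception: Throwable): Long {
    if (exception !is TooManyRequests)
        return Long.MIN_VALUE
    return backOffFromHeaders(exception.headers)
}
```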
Parsing the Retry-After header
One complication from the client’s perspective is that the RFC allows the server to send two completely different forms of the header:
Retry-After: Fri, 31 Dec 1999 23:59:59 GMT
Retry-After: 120
While the second form poses no difficulty (we simply convert the seconds from the header into the milliseconds expected by the Spring Kafka library), the date format used in the first type of header requires a closer look.
The RFC describes it as an “HTTP-date”. Unfortunately, in the Java ecosystem this is not a date format supported out of the box by the standard library, and finding a way to handle it in Spring is not obvious either.
Looking through the sources of the HttpHeaders class reveals something that the JavaDoc does not mention: the getFirstDate() method is able to parse this header. Here is the suggested usage:
private fun backOffFromHeaders(httpHeaders: HttpHeaders): Long {
    val retryAfterString = httpHeaders.getFirst(HttpHeaders.RETRY_AFTER)
        ?: return Long.MIN_VALUE
    retryAfterString.toLongOrNull()?.let {
        return it * 1000 + randomJitterMillis()
    }
    return try {
        httpHeaders.getFirstDate(HttpHeaders.RETRY_AFTER) - System.currentTimeMillis() + randomJitterMillis()
    } catch (e: IllegalArgumentException) {
        Long.MIN_VALUE
    }
}
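For readers who want to see the two header forms handled without Spring on the classpath, here is a hypothetical standalone equivalent (the function name and signature are mine). Note that java.time's RFC_1123_DATE_TIME covers only the preferred IMF-fixdate form of HTTP-date, whereas HttpHeaders.getFirstDate() also accepts the two obsolete formats:

```kotlin
import java.time.ZonedDateTime
import java.time.format.DateTimeFormatter
import java.time.format.DateTimeParseException

// Returns the back-off in milliseconds relative to `nowMillis`,
// or Long.MIN_VALUE if the header value cannot be interpreted.
fun retryAfterToMillis(retryAfter: String, nowMillis: Long): Long {
    // Form 1: delay in seconds, e.g. "Retry-After: 120"
    retryAfter.toLongOrNull()?.let { return it * 1000 }
    // Form 2: absolute HTTP-date, e.g. "Retry-After: Fri, 31 Dec 1999 23:59:59 GMT"
    return try {
        val instant = ZonedDateTime.parse(retryAfter, DateTimeFormatter.RFC_1123_DATE_TIME).toInstant()
        instant.toEpochMilli() - nowMillis
    } catch (e: DateTimeParseException) {
        Long.MIN_VALUE
    }
}
```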
Jitter and Retry Storm
The final micro-optimization is the randomJitterMillis function. Its purpose is to avoid the “Thundering Herd” / “Retry Storm” phenomenon.
The problem arises when we have many threads and all of them receive the same HTTP 429 “retry after 14:31:00”. If every thread waits until exactly 14:31:00 and then sends a request, several things can go wrong. If one thread consistently fires slightly earlier than another, it may even starve the slower one by stealing tokens from the rate limiter. Network and memory resources experience a sudden spike in usage, and then sit idle afterwards.
Jitter—that is, a small random “noise”—can smooth out such a spike by spreading requests over a wider time window. I will not go into a discussion of the best way to do this, or whether a percentage-based or absolute approach is better. For the purposes of this article, let’s take something simple:
import kotlin.random.Random

private fun randomJitterMillis(): Long =
    Random.nextLong(100, 2_500)
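If the fixed 100–2 500 ms window feels arbitrary, a percentage-based variant is just as short (a hypothetical alternative; the function and its default fraction are mine, not part of any library):

```kotlin
import kotlin.random.Random

// Adds up to `fraction` of the delay as random jitter,
// e.g. 120_000 ms grows to somewhere in 120_000..132_000 ms
fun withProportionalJitter(delayMillis: Long, fraction: Double = 0.1): Long =
    delayMillis + Random.nextLong((delayMillis * fraction).toLong() + 1)
```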
Summary
We are able to combine the good properties of the exponential backoff mechanism built into Spring Kafka (a maximum number of attempts, with retries spaced further and further apart) with proper handling of standard rate-limiting headers (instead of blindly guessing the delay, we use the information provided by the server).
Why is nothing like this built into Spring Kafka? I can only speculate. A good rule when designing libraries is to minimize dependencies. Spring Kafka does not include code related to making REST calls. It would be odd for the authors to add a dependency on spring-web solely to handle HTTP errors. In addition, there are different ways of performing requests: there is also spring-webflux with a separate exception hierarchy, and someone might skip the Spring libraries altogether and use an HTTP client directly with its own error handling. It is therefore not surprising that the code that ties rate limiting together with Spring Kafka’s back-off mechanism has to be written manually.