Description
I used confluent-kafka-python to write a worker that processes messages from a specific Kafka topic. To implement exactly-once semantics (EOS), I enabled transactions and configured the producer and consumer accordingly. Since the tasks can take a long time to process, I set the producer's transaction.timeout.ms to 3 hours and adjusted the broker's transaction.max.timeout.ms to allow it.
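For context, a minimal sketch of how my consume-process-produce loop is wired up. The broker address, topic names, group id, and transactional id below are placeholders, not my real values (my actual config is the {...} in the checklist); the 3-hour transaction.timeout.ms is the setting described above:

```python
from confluent_kafka import Consumer, Producer

BOOTSTRAP = "10.11.123.132:9092"  # placeholder

consumer = Consumer({
    "bootstrap.servers": BOOTSTRAP,
    "group.id": "worker-group",           # placeholder
    "enable.auto.commit": False,          # offsets are committed via the transaction
    "isolation.level": "read_committed",  # only read committed records
    "auto.offset.reset": "earliest",
})

producer = Producer({
    "bootstrap.servers": BOOTSTRAP,
    "transactional.id": "worker-txn-0",   # placeholder
    "transaction.timeout.ms": 3 * 60 * 60 * 1000,  # 3 hours, as described above
})


def process(payload: bytes) -> bytes:
    """Stand-in for the long-running task (can take hours)."""
    return payload


producer.init_transactions()
consumer.subscribe(["input-topic"])       # placeholder topic

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    producer.begin_transaction()
    producer.produce("output-topic", process(msg.value()))  # placeholder topic
    # Commit the consumed offsets within the same transaction (EOS).
    producer.send_offsets_to_transaction(
        consumer.position(consumer.assignment()),
        consumer.consumer_group_metadata(),
    )
    producer.commit_transaction()
```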
Currently, the worker runs fine on my local machine. However, when I run it in a local Docker container or in a production environment on Kubernetes, I often see the following messages:
%4|1738575666.358|REQTMOUT|rdkafka#producer-2| [thrd:TxnCoordinator]: TxnCoordinator/3: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%3|1738575666.358|FAIL|rdkafka#producer-2| [thrd:TxnCoordinator]: TxnCoordinator: 10.11.123.134:9092: 1 request(s) timed out: disconnect (after 58468ms in state UP)
%4|1738575666.411|REQTMOUT|rdkafka#producer-2| [thrd:10.11.123.134:9092/bootstrap]: 10.11.123.134:9092/3: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%3|1738575666.411|FAIL|rdkafka#producer-2| [thrd:10.11.123.134:9092/bootstrap]: 10.11.123.134:9092/3: 1 request(s) timed out: disconnect (after 26197ms in state UP)
%4|1738575667.420|REQTMOUT|rdkafka#producer-2| [thrd:10.11.123.134:9092/bootstrap]: 10.11.123.134:9092/3: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%3|1738575667.420|FAIL|rdkafka#producer-2| [thrd:10.11.123.134:9092/bootstrap]: 10.11.123.134:9092/3: 1 request(s) timed out: disconnect (after 393ms in state UP, 1 identical error(s) suppressed)
%4|1738575667.924|REQTMOUT|rdkafka#producer-2| [thrd:10.11.123.132:9092/bootstrap]: 10.11.123.132:9092/1: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%3|1738575667.924|FAIL|rdkafka#producer-2| [thrd:10.11.123.132:9092/bootstrap]: 10.11.123.132:9092/1: 1 request(s) timed out: disconnect (after 60053ms in state UP)
%4|1738575668.925|REQTMOUT|rdkafka#producer-2| [thrd:10.11.123.133:9092/bootstrap]: 10.11.123.133:9092/2: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%3|1738575668.925|FAIL|rdkafka#producer-2| [thrd:10.11.123.133:9092/bootstrap]: 10.11.123.133:9092/2: 1 request(s) timed out: disconnect (after 486ms in state UP)
%4|1738575669.933|REQTMOUT|rdkafka#producer-2| [thrd:10.11.123.133:9092/bootstrap]: 10.11.123.133:9092/2: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%3|1738575669.934|FAIL|rdkafka#producer-2| [thrd:10.11.123.133:9092/bootstrap]: 10.11.123.133:9092/2: 1 request(s) timed out: disconnect (after 994ms in state UP, 1 identical error(s) suppressed)
%4|1738575670.942|REQTMOUT|rdkafka#producer-2| [thrd:10.11.123.133:9092/bootstrap]: 10.11.123.133:9092/2: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%4|1738575671.952|REQTMOUT|rdkafka#producer-2| [thrd:10.11.123.133:9092/bootstrap]: 10.11.123.133:9092/2: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
It seems the producer is hitting timeouts when sending requests. From my online research, the requests are eventually sent successfully thanks to Kafka's retry mechanism, but the issue keeps recurring, at least once every 15 minutes. When I restart the worker on my local machine, these issues disappear.
Another strange observation: when I reduce transaction.timeout.ms to a lower value (e.g., 30 minutes in my tests), these timeout messages no longer appear. As far as I know, though, this setting should not affect whether a request times out.
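For concreteness, the two values I tested correspond to these producer settings (in milliseconds):

```python
# transaction.timeout.ms values tested (milliseconds).
LONG_TXN_TIMEOUT_MS = 3 * 60 * 60 * 1000  # 3 hours: REQTMOUT/FAIL messages appear
SHORT_TXN_TIMEOUT_MS = 30 * 60 * 1000     # 30 minutes: messages no longer appear
```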
Has anyone else encountered a similar issue or knows the reason behind this behavior?
Checklist
Please provide the following information:
confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()): 2.4.0
Apache Kafka broker version: 3.9.0
Client configuration: {...}
Operating system: Linux (k8s), macOS (local machine)
Provide client logs (with 'debug': '..' as necessary; a sample debug configuration is sketched after this checklist)
Provide broker log excerpts
Critical issue
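For the client-logs item, I can re-run the worker with librdkafka debug output enabled; which debug contexts are most useful here is a guess on my part:

```python
from confluent_kafka import Producer

# Same producer as in the sketch above, with librdkafka debug logging on.
# The context selection ("eos,broker,protocol") is my guess at what is
# relevant to the TxnCoordinator disconnects.
producer = Producer({
    "bootstrap.servers": "10.11.123.132:9092",  # placeholder
    "transactional.id": "worker-txn-0",         # placeholder
    "transaction.timeout.ms": 3 * 60 * 60 * 1000,
    "debug": "eos,broker,protocol",
})
```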