Producer request timeout in container #1915

Open
3 of 7 tasks
ts00193189 opened this issue Feb 4, 2025 · 0 comments

Description

I used confluent-kafka-python to write a worker that processes messages from a specific Kafka topic. To implement EOS (Exactly Once Semantics), I enabled transactions and configured the producer and consumer settings accordingly. Because the tasks can take a long time to process, I set the producer's transaction.timeout.ms to 3 hours and also adjusted the broker's transaction.max.timeout.ms.
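For reference, the setup described above might look roughly like the sketch below. It is not the actual worker code: the helper names, the broker address, `transactional.id`, `group.id`, and the output topic are placeholders, and only the 3-hour `transaction.timeout.ms` comes from the description. The error handling is deliberately simplistic (abort on any exception).

```python
THREE_HOURS_MS = 3 * 60 * 60 * 1000  # 10_800_000 ms, as described above


def build_producer_config(bootstrap_servers, transactional_id):
    """Transactional producer settings (all values except the timeout are placeholders)."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "transactional.id": transactional_id,
        "enable.idempotence": True,
        # Must not exceed the broker's transaction.max.timeout.ms.
        "transaction.timeout.ms": THREE_HOURS_MS,
    }


def build_consumer_config(bootstrap_servers, group_id):
    """Consumer settings commonly paired with a transactional producer."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        "enable.auto.commit": False,  # offsets are committed through the transaction
        "isolation.level": "read_committed",
        "auto.offset.reset": "earliest",
    }


def process_one(consumer, producer, output_topic, handle):
    """Consume one message, produce its result, and commit both atomically.

    `consumer` and `producer` are expected to be confluent_kafka.Consumer and
    confluent_kafka.Producer instances; producer.init_transactions() must have
    been called once beforehand. `handle` is the (hypothetical) task function.
    """
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        return False
    producer.begin_transaction()
    try:
        producer.produce(output_topic, handle(msg.value()))
        producer.send_offsets_to_transaction(
            consumer.position(consumer.assignment()),
            consumer.consumer_group_metadata(),
        )
        producer.commit_transaction()
    except Exception:
        producer.abort_transaction()
        raise
    return True
```

In a real worker the two config dicts would be passed to `confluent_kafka.Producer(...)` and `confluent_kafka.Consumer(...)`, and `process_one` would run in a loop after `producer.init_transactions()`.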

Currently, the worker runs fine on my local machine. However, when I run it in a local Docker container or in a production environment on Kubernetes, I often see the following messages:

%4|1738575666.358|REQTMOUT|rdkafka#producer-2| [thrd:TxnCoordinator]: TxnCoordinator/3: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%3|1738575666.358|FAIL|rdkafka#producer-2| [thrd:TxnCoordinator]: TxnCoordinator: 10.11.123.134:9092: 1 request(s) timed out: disconnect (after 58468ms in state UP)
%4|1738575666.411|REQTMOUT|rdkafka#producer-2| [thrd:10.11.123.134:9092/bootstrap]: 10.11.123.134:9092/3: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%3|1738575666.411|FAIL|rdkafka#producer-2| [thrd:10.11.123.134:9092/bootstrap]: 10.11.123.134:9092/3: 1 request(s) timed out: disconnect (after 26197ms in state UP)
%4|1738575667.420|REQTMOUT|rdkafka#producer-2| [thrd:10.11.123.134:9092/bootstrap]: 10.11.123.134:9092/3: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%3|1738575667.420|FAIL|rdkafka#producer-2| [thrd:10.11.123.134:9092/bootstrap]: 10.11.123.134:9092/3: 1 request(s) timed out: disconnect (after 393ms in state UP, 1 identical error(s) suppressed)
%4|1738575667.924|REQTMOUT|rdkafka#producer-2| [thrd:10.11.123.132:9092/bootstrap]: 10.11.123.132:9092/1: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%3|1738575667.924|FAIL|rdkafka#producer-2| [thrd:10.11.123.132:9092/bootstrap]: 10.11.123.132:9092/1: 1 request(s) timed out: disconnect (after 60053ms in state UP)
%4|1738575668.925|REQTMOUT|rdkafka#producer-2| [thrd:10.11.123.133:9092/bootstrap]: 10.11.123.133:9092/2: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%3|1738575668.925|FAIL|rdkafka#producer-2| [thrd:10.11.123.133:9092/bootstrap]: 10.11.123.133:9092/2: 1 request(s) timed out: disconnect (after 486ms in state UP)
%4|1738575669.933|REQTMOUT|rdkafka#producer-2| [thrd:10.11.123.133:9092/bootstrap]: 10.11.123.133:9092/2: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%3|1738575669.934|FAIL|rdkafka#producer-2| [thrd:10.11.123.133:9092/bootstrap]: 10.11.123.133:9092/2: 1 request(s) timed out: disconnect (after 994ms in state UP, 1 identical error(s) suppressed)
%4|1738575670.942|REQTMOUT|rdkafka#producer-2| [thrd:10.11.123.133:9092/bootstrap]: 10.11.123.133:9092/2: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
%4|1738575671.952|REQTMOUT|rdkafka#producer-2| [thrd:10.11.123.133:9092/bootstrap]: 10.11.123.133:9092/2: Timed out 0 in-flight, 0 retry-queued, 1 out-queue, 0 partially-sent requests
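To see how the timeouts are distributed across broker threads, the log lines above can be tallied with a small script. This is a minimal sketch that assumes every line follows the `%level|timestamp|facility|client| [thrd:...]: message` layout seen in the excerpt; `parse_line` and `count_request_timeouts` are hypothetical helper names.

```python
import re
from collections import Counter

# Matches lines such as:
# %4|1738575666.358|REQTMOUT|rdkafka#producer-2| [thrd:TxnCoordinator]: ...
LOG_LINE = re.compile(
    r"^%(?P<level>\d)\|(?P<ts>\d+\.\d+)\|(?P<fac>[A-Z]+)\|(?P<client>[^|]+)\|"
    r" \[thrd:(?P<thread>[^\]]+)\]: (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one librdkafka log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def count_request_timeouts(lines):
    """Count REQTMOUT log lines per librdkafka thread name."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record and record["fac"] == "REQTMOUT":
            counts[record["thread"]] += 1
    return counts
```

Feeding it the worker's stderr (for example `count_request_timeouts(open("worker.log"))`, where `worker.log` is a placeholder path) shows whether the timeouts cluster on the transaction coordinator or are spread across all brokers.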

It seems that the producer is encountering timeouts when sending requests. From what I have read online, the requests are eventually sent successfully thanks to Kafka's retry mechanism. However, the issue keeps recurring, at least once every 15 minutes. When I restart the worker on my local machine, these issues disappear.

Another strange observation is that when I reduce transaction.timeout.ms to a lower value (e.g., 30 minutes in my tests), these timeout messages no longer appear. However, as far as I know, this setting should not affect whether a request times out.
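One possible link worth ruling out: librdkafka's configuration documentation states that transaction.timeout.ms automatically adjusts message.timeout.ms and socket.timeout.ms unless those are configured explicitly, so the settings may not be fully independent. The sketch below pins both explicitly for a test run. The helper name is hypothetical, the values are librdkafka's documented defaults (60 s and 300 s), and whether this changes the behavior described here is an untested assumption.

```python
def with_pinned_timeouts(config, socket_timeout_ms=60_000, message_timeout_ms=300_000):
    """Return a copy of `config` with socket/message timeouts set explicitly.

    Per the librdkafka docs, explicit values must not exceed the transaction
    timeout, and socket.timeout.ms must be at least 100 ms lower than it.
    """
    # 60_000 is librdkafka's default transaction.timeout.ms.
    txn_timeout_ms = config.get("transaction.timeout.ms", 60_000)
    if socket_timeout_ms > txn_timeout_ms - 100:
        raise ValueError("socket.timeout.ms must be at least 100 ms below transaction.timeout.ms")
    if message_timeout_ms > txn_timeout_ms:
        raise ValueError("message.timeout.ms must not exceed transaction.timeout.ms")
    pinned = dict(config)
    pinned["socket.timeout.ms"] = socket_timeout_ms
    pinned["message.timeout.ms"] = message_timeout_ms
    return pinned
```

Comparing runs with and without the pinned values, while keeping transaction.timeout.ms at 3 hours, would show whether the automatic adjustment plays any role.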

Has anyone else encountered a similar issue or knows the reason behind this behavior?

Checklist

Please provide the following information:

  • confluent-kafka-python and librdkafka version (confluent_kafka.version() and confluent_kafka.libversion()): 2.4.0 and 2.4.0

  • Apache Kafka broker version: 3.9.0

  • Client configuration: {...}

  • Operating system: Linux (k8s), macOS (local machine)

  • Provide client logs (with 'debug': '..' as necessary)

  • Provide broker log excerpts

  • Critical issue
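The version and client-log items in the checklist can be gathered with something like the following. This is a sketch only: `collect_versions` and `with_debug` are hypothetical helpers, and the choice of debug contexts (`eos`, `broker`, `protocol`) is an assumption about what is most relevant to transaction timeouts.

```python
def collect_versions():
    """Return client and librdkafka version strings, or None if the package is missing."""
    try:
        import confluent_kafka
    except ImportError:
        return None
    # Both functions return a (version_string, version_int) tuple.
    return {
        "confluent-kafka-python": confluent_kafka.version()[0],
        "librdkafka": confluent_kafka.libversion()[0],
    }


def with_debug(config, contexts=("eos", "broker", "protocol")):
    """Return a copy of a client config with librdkafka debug logging enabled."""
    debug_config = dict(config)
    debug_config["debug"] = ",".join(contexts)
    return debug_config
```

Passing `with_debug(producer_config)` to the producer makes librdkafka emit detailed logs for the selected contexts, which is what the "Provide client logs" item asks for.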
