
help request: unexpected behavior in health check marking nodes as unhealthy #12116


Open
edda25 opened this issue Apr 2, 2025 · 2 comments
Labels
question label for questions asked by users

Comments

@edda25

edda25 commented Apr 2, 2025

Description

I configured an upstream with two nodes and I am testing whether the health check works. When queried directly, one node responds with 200 OK and the other currently responds with 404. With no health check enabled, the upstream correctly distributes requests to both nodes. After enabling the health check, however, both nodes are marked as unhealthy. In particular, the node that responds with 404 is marked as unhealthy due to tcp_failure, but shouldn't it be marked as such due to http_failure? The service is up and running; it is simply responding with a 404 status code. Moreover, the very same node is marked as healthy in another upstream configuration, where it should also be marked as unhealthy due to http_failure. How is this possible? Am I misjudging something?

Here is the upstream configuration:

{
  "nodes": [
    {
      "host": "webservice1",
      "port": xxx,
      "weight": 1
    },
    {
      "host": "webservice2",
      "port": yyy,
      "weight": 1
    }
  ],
  "timeout": {
    "connect": 6,
    "send": 6,
    "read": 30
  },
  "type": "roundrobin",
  "checks": {
    "active": {
      "concurrency": 10,
      "healthy": {
        "http_statuses": [200, 302],
        "interval": 10,
        "successes": 2
      },
      "http_path": "/status",
      "https_verify_certificate": false,
      "timeout": 5,
      "type": "https",
      "unhealthy": {
        "http_failures": 5,
        "http_statuses": [429, 404, 500, 501, 502, 503, 504, 505],
        "interval": 10,
        "tcp_failures": 2,
        "timeouts": 3
      }
    }
  },
  "hash_on": "vars",
  "scheme": "https",
  "pass_host": "node",
  "name": "upstream1",
  "keepalive_pool": {
    "idle_timeout": 60,
    "requests": 1000,
    "size": 320
  }
}
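
To make the thresholds in this configuration concrete, here is a minimal Python sketch of how counters like these could drive a node's status. It is a simplified model for reasoning about the numbers, not the actual health checker implementation: the names `classify` and `Node`, the rule that a success resets the failure counters (and a failure resets the success counter), and the choice to ignore statuses that appear in neither list are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Values copied from the "checks.active" block of the configuration above.
HEALTHY_STATUSES = {200, 302}
UNHEALTHY_STATUSES = {429, 404, 500, 501, 502, 503, 504, 505}
THRESHOLDS = {"success": 2, "http_failure": 5, "tcp_failure": 2, "timeout_failure": 3}


def classify(outcome):
    """Map one probe outcome to a counter name (hypothetical helper).

    outcome is an int HTTP status, or the string "tcp_error" or "timeout"
    when no HTTP response was received at all.
    """
    if outcome == "timeout":
        return "timeout_failure"
    if outcome == "tcp_error":
        return "tcp_failure"
    if outcome in HEALTHY_STATUSES:
        return "success"
    if outcome in UNHEALTHY_STATUSES:
        return "http_failure"
    return None  # assumption: statuses in neither list change nothing


@dataclass
class Node:
    healthy: bool = True
    counters: dict = field(default_factory=lambda: dict.fromkeys(THRESHOLDS, 0))

    def report(self, outcome):
        kind = classify(outcome)
        if kind is None:
            return
        # Assumption: opposite events reset each other's counters.
        if kind == "success":
            for name in self.counters:
                if name != "success":
                    self.counters[name] = 0
        else:
            self.counters["success"] = 0
        self.counters[kind] += 1
        # The status flips once the counter reaches its threshold.
        if self.counters[kind] >= THRESHOLDS[kind]:
            self.healthy = kind == "success"


node = Node()
for _ in range(5):
    node.report(404)
print(node.healthy, node.counters)
```

Under this model, a node that keeps answering 404 ends up unhealthy with http_failure at 5 and tcp_failure at 0. A tcp_failure count can only grow when a probe fails before any HTTP status is received.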

Here is what the Control API shows:
{
  "name": "/apisix/upstreams/1",
  "type": "https",
  "nodes": [
    {
      "hostname": "webservice1",
      "counter": {
        "http_failure": 5,
        "success": 0,
        "timeout_failure": 0,
        "tcp_failure": 0
      },
      "status": "unhealthy"
    },
    {
      "hostname": "webservice2",
      "counter": {
        "http_failure": 0,
        "success": 0,
        "timeout_failure": 0,
        "tcp_failure": 2
      },
      "status": "unhealthy"
    }
  ]
},
{
  "name": "/apisix/upstreams/2",
  "type": "https",
  "nodes": [
    {
      "hostname": "webservice3",
      "counter": {
        "http_failure": 0,
        "success": 0,
        "timeout_failure": 0,
        "tcp_failure": 0
      },
      "status": "healthy"
    },
    {
      "hostname": "webservice2",
      "counter": {
        "http_failure": 0,
        "success": 0,
        "timeout_failure": 0,
        "tcp_failure": 0
      },
      "status": "healthy"
    }
  ]
},
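
For anyone comparing output like this across several upstreams, a small Python sketch can flag hosts whose status differs between checkers and show which counters are non-zero. The data below is a trimmed copy of the output quoted above, wrapped in a list; the helper names `status_by_host`, `conflicting_hosts`, and `nonzero_counters` are hypothetical and not part of any APISIX tooling.

```python
from collections import defaultdict

# Trimmed copy of the Control API output above ("type" omitted for brevity).
SAMPLE = [
    {"name": "/apisix/upstreams/1", "nodes": [
        {"hostname": "webservice1", "status": "unhealthy",
         "counter": {"http_failure": 5, "success": 0,
                     "timeout_failure": 0, "tcp_failure": 0}},
        {"hostname": "webservice2", "status": "unhealthy",
         "counter": {"http_failure": 0, "success": 0,
                     "timeout_failure": 0, "tcp_failure": 2}},
    ]},
    {"name": "/apisix/upstreams/2", "nodes": [
        {"hostname": "webservice3", "status": "healthy",
         "counter": {"http_failure": 0, "success": 0,
                     "timeout_failure": 0, "tcp_failure": 0}},
        {"hostname": "webservice2", "status": "healthy",
         "counter": {"http_failure": 0, "success": 0,
                     "timeout_failure": 0, "tcp_failure": 0}},
    ]},
]


def status_by_host(checkers):
    """Return {hostname: {checker name: status}} for every node."""
    result = defaultdict(dict)
    for checker in checkers:
        for node in checker["nodes"]:
            result[node["hostname"]][checker["name"]] = node["status"]
    return dict(result)


def conflicting_hosts(checkers):
    """Hosts that are reported with more than one distinct status."""
    return sorted(
        host for host, statuses in status_by_host(checkers).items()
        if len(set(statuses.values())) > 1
    )


def nonzero_counters(node):
    """Only the counters that have actually been incremented."""
    return {name: value for name, value in node["counter"].items() if value}


print(conflicting_hosts(SAMPLE))
for checker in SAMPLE:
    for node in checker["nodes"]:
        print(checker["name"], node["hostname"], node["status"], nonzero_counters(node))
```

On the data above, only webservice2 shows up with conflicting statuses. In upstream 2 every counter is zero, including success, so that output alone does not show that any probe result was recorded there.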

Expected Behavior:
• Nodes responding with 200 OK should not be marked as unhealthy.
• A node responding with 404 should be marked unhealthy due to http_failure, not tcp_failure.
• A node marked as unhealthy in one upstream should not be marked as healthy in another upstream if the conditions are the same.

Is this a bug or am I missing something in the configuration? Any suggestion would be greatly appreciated.

Environment

• APISIX version: 3.10.0
• APISIX Dashboard version: 3.0.1
• Operating system: Linux

dosubot (bot) added the question label on Apr 2, 2025
@Baoyuantop
Contributor

I see that you have configured http_path: /status, which means the health check sends a request for this path to the upstream. Did you confirm that the upstream responds to this request properly?

@edda25
Author

edda25 commented Apr 3, 2025

Yes, I confirm that both nodes respond at the endpoint indicated in the healthcheck as I expect: one with 200 OK, the other one with 404.
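
A direct check like the one described can also be run the way the configured check would see the node: over HTTPS, without certificate verification, against /status, with a 5 second timeout. The Python sketch below is one way to do that and to separate "got an HTTP status" from "failed before any status was received". The function name `probe` and the result labels are hypothetical, and the script only approximates the gateway's probe; it should be run from the host or container where APISIX runs, since name resolution and routing may differ elsewhere.

```python
import http.client
import socket
import ssl


def probe(host, port, path="/status", timeout=5.0):
    """Send one HTTPS GET and report how far the request got.

    Returns ("http", status_code) when a response arrived,
    ("timeout", detail) when the attempt timed out, and
    ("connect_or_tls_error", detail) for anything that failed before
    an HTTP status was received (refused connection, DNS error,
    TLS handshake failure, malformed response).
    """
    # Mirror "https_verify_certificate": false from the configuration.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    conn = http.client.HTTPSConnection(host, port, timeout=timeout, context=context)
    try:
        conn.request("GET", path)
        return ("http", conn.getresponse().status)
    except socket.timeout as exc:
        # Must be caught before OSError, which it subclasses.
        return ("timeout", repr(exc))
    except (OSError, http.client.HTTPException) as exc:
        # ssl.SSLError and socket.gaierror are OSError subclasses too.
        return ("connect_or_tls_error", repr(exc))
    finally:
        conn.close()


if __name__ == "__main__":
    import sys

    # Usage: python probe.py HOST PORT
    if len(sys.argv) == 3:
        print(probe(sys.argv[1], int(sys.argv[2])))
```

A result of ("http", 404) for a node would be consistent with the http_failure expectation. A "connect_or_tls_error" result would mean the failure happens before any HTTP status exists, which by definition cannot be counted as an HTTP failure.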
