KIC wrong configuration after all kube nodes restarted #7205

Open

samaitegal opened this issue Mar 5, 2025 · 2 comments

Labels
bug Something isn't working

Comments

@samaitegal

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

On a Kubernetes platform with 1 control-plane node and 3 data-plane nodes, KIC and Kong Gateway are installed in their respective pods.
From time to time during our failover tests (all data-plane nodes restarted, either one by one or all 3 at once), KIC sends an incomplete configuration to the gateway.

Incomplete configuration returned by KIC:
[ ]# curl localhost:10256/debug/config/successful | jq
{
  "hash": "",
  "config": {
    "_format_version": "3.0",
    "_info": {
      "select_tags": [
        "managed-by-ingress-controller"
      ],
      "defaults": {}
    },
    "consumers": [
      {
        "id": "1e666ac2-b19a-5f87-bc4e-f033f5353307",
        "username": "xxxx",
        "tags": [
          "k8s-name:xxxxxxx",
          "k8s-namespace:xxxxx",
          "k8s-kind:KongConsumer",
          "k8s-uid:199b2c30-f717-4693-8036-aae76c583c79",
          "k8s-group:configuration.konghq.com",
          "k8s-version:v1"
        ],
        "keyauth_credentials": [
          {
            "key": "xxxx",
            "tags": [
              "k8s-name:xxxxxx-token",
              "k8s-namespace:xxxx",
              "k8s-kind:Secret",
              "k8s-uid:7fe3119c-5734-431e-bad9-593155c8fb04",
              "k8s-version:v1"
            ]
          }
        ]
      }
    ]
  }
}

When the configuration is correctly sent to the gateway, the same request shows:
[ ]# curl localhost:10256/debug/config/successful | jq
{
  "hash": "",
  "config": {
    "_format_version": "3.0",
    "_info": {
      "select_tags": [
        "managed-by-ingress-controller"
      ],
      "defaults": {}
    },
    "services": [
      {
        "connect_timeout": 60000,
        "host": "httproute.xxxxxx.0",
        "id": "0cf8d881-4f0a-5cbb-ab9a-eb0025d6b492",
        "name": "httproute.xxxxxxxx.0",
        "port": 80,
        "protocol": "http",
        "read_timeout": 60000,
        "retries": 5,
        "write_timeout": 60000,
        "tags": [
          "k8s-name:xxxxxx",
          "k8s-namespace:xxxxxx",
          "k8s-kind:Service",
          "k8s-uid:63873666-c60b-4075-b8d1-572e785d9bd8",
          "k8s-version:v1"
        ],
        "routes": [
          {
            "id": "7e70cb3a-bc3f-532a-a7db-02163b1093ca",
            "name": "httproute.xxxxxxx.0.0",
            "methods": [
              "GET",
              "PATCH",
              "PUT",
              "DELETE"
            ],
            "paths": [
              "/xx/users$",
              "/xx/users/"
            ],
            "path_handling": "v0",
            "preserve_host": true,
            "protocols": [
              "https"
            ],
            "strip_path": false,
            "tags": [
              "k8s-name:xxxxx",
              "k8s-namespace:xxxxxxxxxx",
              "k8s-kind:HTTPRoute",
              "k8s-uid:7ad6fc4e-cb11-47af-8b05-043b8d95f33f",
              "k8s-group:gateway.networking.k8s.io",
              "k8s-version:v1"
            ],
            "https_redirect_status_code": 426
          }
        ]
      },
      {
        "connect_timeout": 60000,
        "host": "httproute.yyyyyyyyyyyyyyy.0",
        "id": "732153b4-28fc-5c0a-9f21-1a18652ec017",
        "name": "httproute.yyyyyyyyyyyyyy.0",
        "port": 80,
        "protocol": "http",
        "read_timeout": 60000,
        "retries": 5,
        "write_timeout": 60000,
        "tags": [
          "k8s-name:yyyy",
          "k8s-namespace:yyyyyyy",
          "k8s-kind:Service",
          "k8s-uid:86c84e2e-33fc-4006-bd03-b0caad32836f",
          "k8s-version:v1"
        ],
        "routes": [
          {
            "id": "34c80be2-8d6d-5209-8e67-040666be9406",
            "name": "httproute.yyyyyyyy.0.0",
            "methods": [
              "GET",
              "POST",
              "PUT",
              "DELETE"
            ],
            "paths": [
              "/yyy/unigys$",
              "/yyy/unigys/"
            ],
            "path_handling": "v0",
            "preserve_host": true,
            "protocols": [
              "https"
            ],
            "strip_path": false,
            "tags": [
              "k8s-name:yyyyyyyyyyyyyyy",
              "k8s-namespace:yyyyyyyy",
              "k8s-kind:HTTPRoute",
              "k8s-uid:6d414529-573d-40a1-adb7-8f8b1b06c70a",
              "k8s-group:gateway.networking.k8s.io",
              "k8s-version:v1"
            ],
            "https_redirect_status_code": 426
          }
        ]
      },
      { all Services correctly configured },
      ...

Expected Behavior

A correct configuration should always be sent to the gateway after a node failover.

Steps To Reproduce

On Kubernetes 1.26.5, with 1 control-plane node and 3 data-plane nodes (KVM).

Alternately power off and power on all data-plane nodes, wait until all pods are back in a correct status, then check the route configuration in KIC.

The configuration problem is intermittent (a rough reproduction sketch follows below).
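
For concreteness, a minimal reproduction sketch, assuming KVM domains named kube-dp1..kube-dp3 and that the commands run somewhere the KIC debug port (10256) is reachable; the node names and wait times are illustrative, not taken from the report:

# Power-cycle all data-plane nodes (hypothetical domain names)
for node in kube-dp1 kube-dp2 kube-dp3; do
  virsh shutdown "$node"
done
sleep 120
for node in kube-dp1 kube-dp2 kube-dp3; do
  virsh start "$node"
done
# Wait until every pod reports Ready again
kubectl wait --for=condition=Ready pods --all --all-namespaces --timeout=10m
# Inspect the configuration KIC last pushed to the gateway
curl -s localhost:10256/debug/config/successful | jq '.config | keys'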

Kong Ingress Controller version

3.4

Kubernetes version

1.26.5

Anything else?

No response

samaitegal added the bug label Mar 5, 2025
@randmonkey
Contributor

@samaitegal TL;DR: there is latency while KIC loads all of the resources it controls, and we currently cannot predict that latency.
Are there many resources (Ingresses, Services, Pods, ...) controlled by KIC in your cluster? Since the client cannot load all the reconciled resources at once, KIC may not have all of the resources in its cache for a short time after a restart. However, given Kubernetes's eventual consistency, all the resources will eventually be configured after several rounds of reconciliation.

How many resources (Ingresses/Services) are in your cluster, approximately? And how long after the restart of KIC did you observe the behavior? With around 1000 Ingresses/Services, it may take a few rounds of reconciliation (3 s per round by default) for the cache inside the controller to sync with the API server. A rough way to watch the config converge is sketched below.
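
A minimal convergence check, assuming the debug endpoint shown above and using the cluster's HTTPRoute count as a rough proxy for the expected number of Kong services (both are assumptions for illustration, not documented KIC behavior):

# Poll KIC's last successfully applied config until it covers the cluster's routes
expected=$(kubectl get httproutes --all-namespaces --no-headers | wc -l)
while true; do
  got=$(curl -s localhost:10256/debug/config/successful | jq '.config.services | length')
  echo "services in KIC config: $got (HTTPRoutes in cluster: $expected)"
  [ "$got" -ge "$expected" ] && break
  sleep 3   # the default reconciliation interval mentioned above
done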

@randmonkey
Contributor

@samaitegal Is there any update on this issue? If no additional context is given within 7 days, we will close the issue as "won't fix".

randmonkey closed this as not planned Apr 14, 2025
randmonkey reopened this Apr 14, 2025