= Scalability and performance of {serverlessproductname} Serving
:compat-mode!:
:description: Scalability and performance of {serverlessproductname} Serving

== Introduction

{serverlessproductname} consists of several components with different resource requirements and scaling behaviours.
Some components, such as the `controller` pods, are responsible for watching and reacting to `CustomResources` and continuously reconfiguring the system; these components are called the control-plane.
Other components, such as the `activator` component of {serverlessproductname} Serving, are directly involved in request and response handling; these components are called the data-plane.
All of these components are horizontally and vertically scalable, but their resource requirements and configuration depend heavily on the actual use case.

The following sections outline the relevant considerations for scaling the system to handle increased usage.

== Test context

The following metrics and findings were recorded using this test setup:

* A cluster running {product-title} version 4.13
* The cluster running *4 worker nodes* in AWS with a machine type of `m6.xlarge`
* {serverlessproductname} version 1.30

[NOTE]
====
These tests are run continuously on our continuous integration system to compare performance data between releases of {serverlessproductname}.
====

== Overhead of {serverlessproductname} Serving

Because components of {serverlessproductname} Serving are part of the data-plane, requests from clients are routed through:

* The ingress-gateway (Kourier or Service Mesh)
* The `activator` component
* The `queue-proxy` sidecar container in each Knative Service

These components introduce an additional network hop and perform additional work, such as adding observability and request queuing, so they come with some overhead. The following latency overheads have been measured:

* Each additional network hop *adds 0.5-1 ms* of latency to a request. Note that the `activator` component is not always part of the data-plane, depending on the current load of the Knative Service and whether the Knative Service was scaled to zero before the request.
* Depending on the payload size, each of these components consumes up to 1 vCPU of CPU while handling 2,500 requests per second.
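
The measured figures above can be turned into a rough sizing estimate. The following Python sketch is illustrative only: the constants come from the measurements above, and everything else (function names, the three-hop assumption) is an assumption for the sketch. It computes the added latency for a given number of hops and an upper bound on data-plane CPU usage for a target request rate:

[source,python]
----
# Constants taken from the measured findings above.
HOP_LATENCY_MS = (0.5, 1.0)            # latency range added per network hop
VCPU_PER_COMPONENT_AT_2500_RPS = 1.0   # upper bound per data-plane component

def added_latency_ms(hops: int) -> tuple:
    """Return the (min, max) latency in ms added by `hops` extra network hops."""
    return (hops * HOP_LATENCY_MS[0], hops * HOP_LATENCY_MS[1])

def data_plane_vcpus(rps: float, components: int = 3) -> float:
    """Rough upper bound on vCPUs consumed by the data-plane at `rps` requests/s."""
    return components * VCPU_PER_COMPONENT_AT_2500_RPS * (rps / 2500)

# With the activator on the request path there are three extra hops:
# ingress gateway, activator, and queue-proxy.
print(added_latency_ms(3))     # (1.5, 3.0)
print(data_plane_vcpus(5000))  # 6.0
----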


== Known limitations of {serverlessproductname} Serving

The maximum number of Knative Services that can be created using this configuration is *3,000*.
This corresponds to the OpenShift Container Platform limit of 10,000 Kubernetes Services, because each Knative Service creates 3 Kubernetes Services.
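
The relationship between the two limits can be checked with simple arithmetic. This sketch is illustrative only; the interpretation that the documented limit of 3,000 leaves headroom below the hard arithmetic bound is an assumption:

[source,python]
----
K8S_SERVICE_LIMIT = 10_000   # OpenShift Container Platform Kubernetes Services limit
K8S_SERVICES_PER_KSVC = 3    # each Knative Service creates 3 Kubernetes Services

hard_bound = K8S_SERVICE_LIMIT // K8S_SERVICES_PER_KSVC
print(hard_bound)            # 3333; the documented limit of 3,000 stays below this
----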


== Scalability and performance of {serverlessproductname} Serving

{serverlessproductname} Serving has to be scaled and configured based on the following parameters:

* Number of Knative Services
* Number of Revisions
* Number of concurrent requests in the system
* Size of the request payloads
* The startup latency and response latency of the user's web application backing the Knative Service
* Number of changes to the `KnativeService` `CustomResources` over time

Scaling of {serverlessproductname} Serving is configured using the `KnativeServing` `CustomResource`.


=== `KnativeServing` defaults

By default, {serverlessproductname} Serving is configured to run all components with high availability and with medium-sized CPU and memory requests and limits.
This means that the `high-availability.replicas` field in `KnativeServing` is automatically set to 2 and all system components are scaled to two replicas.
This setup is suitable for medium-sized workload scenarios. These defaults have been tested with:

* 170 Knative Services
* 1-2 Revisions per Knative Service
* 89 test scenarios mainly focused on testing the control-plane
* 48 re-creation scenarios, where Knative Services are deleted and re-created
* 41 stable scenarios, in which requests are slowly but continuously sent to the system

During these test cases, the system components effectively consumed:

|===
| Component | Measured resources

| Operator in project `openshift-serverless`
| 1 GB memory, 0.2 cores of CPU

| Serving components in project `knative-serving`
| 5 GB memory, 2.5 cores of CPU
|===

While the default setup is suitable for medium-sized workloads, it might be over-sized for smaller setups or under-sized for high-workload scenarios.
See the following sections for possible tuning options.

=== Minimal requirements

To configure {serverlessproductname} Serving for a minimal workload scenario, you need to know the idle consumption of the system components.

==== Idle consumption

The idle consumption depends on the number of Knative Services.
For idle consumption, only memory is relevant: significant CPU cycles are used only when Knative Services are changed or when requests are sent to or from them.
The following memory consumption has been measured for the components in the `knative-serving` and `knative-serving-ingress` {product-title} projects:

|===
| Component | 0 Services | 100 Services | 500 Services | 1000 Services

| activator
| 55Mi
| 86Mi
| 150Mi
| 200Mi

| autoscaler
| 52Mi
| 102Mi
| 225Mi
| 350Mi

| controller
| 100Mi
| 135Mi
| 250Mi
| 400Mi

| webhook
| 60Mi
| 60Mi
| 60Mi
| 60Mi

| 3scale-kourier-gateway (1)
| 20Mi
| 60Mi
| 190Mi
| 330Mi

| net-kourier-controller (1)
| 90Mi
| 170Mi
| 340Mi
| 430Mi

| istio-ingressgateway (1)
| 57Mi
| 107Mi
| 307Mi
| 446Mi

| net-istio-controller (1)
| 60Mi
| 152Mi
| 350Mi
| 504Mi

|===
<1> Note: depending on the configured ingress layer, either `3scale-kourier-gateway` with `net-kourier-controller` or `istio-ingressgateway` with `net-istio-controller` is installed.
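
Based on the table above, idle memory for service counts between the measured points can be estimated. The following Python sketch assumes roughly linear growth between the measured points (an assumption, for estimation only) and uses the Kourier-based rows:

[source,python]
----
import bisect

# Measured idle memory in Mi, from the table above (Kourier ingress rows).
MEASURED_SERVICES = [0, 100, 500, 1000]
IDLE_MEMORY_MI = {
    "activator":              [55, 86, 150, 200],
    "autoscaler":             [52, 102, 225, 350],
    "controller":             [100, 135, 250, 400],
    "webhook":                [60, 60, 60, 60],
    "3scale-kourier-gateway": [20, 60, 190, 330],
    "net-kourier-controller": [90, 170, 340, 430],
}

def idle_memory_mi(component: str, services: int) -> float:
    """Linearly interpolate idle memory for the given number of Knative Services."""
    points = IDLE_MEMORY_MI[component]
    if services >= MEASURED_SERVICES[-1]:
        return float(points[-1])
    i = bisect.bisect_right(MEASURED_SERVICES, services)
    x0, x1 = MEASURED_SERVICES[i - 1], MEASURED_SERVICES[i]
    y0, y1 = points[i - 1], points[i]
    return y0 + (y1 - y0) * (services - x0) / (x1 - x0)

print(idle_memory_mi("activator", 300))  # 118.0
----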


==== Configuring {serverlessproductname} Serving for minimal workloads

To configure {serverlessproductname} Serving for minimal workloads, you can tune the `KnativeServing` `CustomResource`:

[source,yaml]
----
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 1 <1>

  workloads:
    - name: activator
      replicas: 2 <2>
      resources:
        - container: activator
          requests:
            cpu: 250m <3>
            memory: 60Mi <4>
          limits:
            cpu: 1000m
            memory: 600Mi

    - name: controller
      replicas: 1 <6>
      resources:
        - container: controller
          requests:
            cpu: 10m
            memory: 100Mi <4>
          limits: <5>
            cpu: 200m
            memory: 300Mi

    - name: webhook
      replicas: 1 <6>
      resources:
        - container: webhook
          requests:
            cpu: 100m <7>
            memory: 20Mi <4>
          limits:
            cpu: 200m
            memory: 200Mi

  podDisruptionBudgets: <8>
    - name: activator-pdb
      minAvailable: 1
    - name: webhook-pdb
      minAvailable: 1
----
<1> Setting this to 1 scales all system components to one replica.
<2> The `activator` should always be scaled to a minimum of two instances to avoid downtime.
<3> The `activator` CPU request should not be set lower than 250m, because a `HorizontalPodAutoscaler` uses this value as a reference for scaling up and down.
<4> Adjust the memory requests to the idle values from the table above. Also adjust the memory limits according to your expected load; this might require custom testing to find the best values.
<5> These limits are sufficient for a minimal-workload scenario, but they might also need adjustments depending on your concrete workload.
<6> One `webhook` and one `controller` are sufficient for a minimal-workload scenario.
<7> The `webhook` CPU request should not be set lower than 100m, because a `HorizontalPodAutoscaler` uses this value as a reference for scaling up and down.
<8> Adjust the `podDisruptionBudgets` to a value lower than or equal to the `replicas`, to avoid problems during node maintenance.


=== High-workload configuration

To configure {serverlessproductname} Serving for a high-workload scenario, the following findings are relevant:

[NOTE]
====
These findings have been tested with request payload sizes of 0-32 KB.
The Knative Service backends used in those tests had a startup latency between 0-10 seconds and response times between 0-5 seconds.
====

* All data-plane components mostly increase CPU usage with higher request rates or payload sizes, so the CPU requests and limits have to be tested and potentially increased.
* The `activator` component might also need more memory when it has to buffer more or bigger request payloads, so the memory requests and limits might need to be increased as well.
* One `activator` pod can handle *approximately 2,500 requests per second* before latency starts to increase and, at some point, errors occur.
* One `3scale-kourier-gateway` or `istio-ingressgateway` pod can also handle *approximately 2,500 requests per second* before latency starts to increase and, at some point, errors occur.
* Each of the data-plane components consumes up to 1 vCPU of CPU while handling 2,500 requests per second. Note that this highly depends on the payload size and the response times of the Knative Service backend.
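
The throughput findings above can be combined into a simple replica estimate. This Python sketch is illustrative: the figure of 2,500 requests per second per pod comes from the findings above, while the headroom factor and minimum replica count are assumptions:

[source,python]
----
import math

RPS_PER_POD = 2500   # approximate per-pod throughput from the findings above
MIN_REPLICAS = 2     # keep at least two replicas to avoid a single point of failure

def replicas_for(target_rps: float, headroom: float = 1.5) -> int:
    """Replicas needed to serve `target_rps` with a safety headroom factor."""
    return max(MIN_REPLICAS, math.ceil(target_rps * headroom / RPS_PER_POD))

# Sizing both data-plane tiers for 10,000 requests per second:
for component in ("activator", "3scale-kourier-gateway"):
    print(component, replicas_for(10_000))  # 6 replicas each
----

Real sizing still requires load testing against your own workload, because payload size and backend response times change these numbers significantly.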

[IMPORTANT]
====
*Fast startup* and *fast response times* of your Knative Service user workloads are *critical* for good performance of the overall system.
{serverlessproductname} Serving components buffer incoming requests while the Knative Service user backend is scaling up or while request concurrency has reached its capacity.
If your Knative Service user workloads introduce long startup or request latency, this will at some point either overload the `activator` component (if its CPU and memory configuration is too low) or lead to errors for the calling clients.
====

To fine-tune your {serverlessproductname} installation, use the above findings combined with your own test results to configure the `KnativeServing` `CustomResource`:

[source,yaml]
----
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 2 <1>

  workloads:
    - name: component-name <2>
      replicas: 2 <2>
      resources:
        - container: container-name
          requests:
            cpu: <3>
            memory: <3>
          limits:
            cpu: <3>
            memory: <3>

  podDisruptionBudgets: <4>
    - name: name-of-pod-disruption-budget
      minAvailable: 1
----
<1> Set this to at least 2 to make sure you always have at least two instances of every component running. You can also use `workloads` to override the replicas for certain components.
<2> Use the `workloads` list to configure specific components. Use the `Deployment` name of the component (such as `activator`, `autoscaler`, `autoscaler-hpa`, `controller`, `webhook`, `net-kourier-controller`, `3scale-kourier-gateway`, or `net-istio-controller`) and set the `replicas`.
<3> Set the requested and limited CPU and memory according to at least the idle consumption shown above, while also taking the above findings and your own test results into consideration.
<4> Adjust the `podDisruptionBudgets` to a value lower than or equal to the `replicas`, to avoid problems during node maintenance. The default `minAvailable` is set to `1`, so if you increase the desired replicas, make sure to also increase `minAvailable`.

[IMPORTANT]
====
Because each environment is highly specific, it is essential to test and find your own ideal configuration.
Use the monitoring and alerting functionality of {product-title} to continuously monitor your actual resource consumption and make adjustments if needed.

Also keep in mind that if you are using the {serverlessproductname} and {smproductshortname} integration, additional CPU overhead is added by the `istio-proxy` sidecar containers.
For more information, see the {smproductshortname} documentation.
====
