Commit 19a9cc5

Merge pull request #98 from ReToCode/serving-performance
add a new page for Serving scaling & performance
2 parents 62e06d3 + 8a568f0 commit 19a9cc5

File tree

2 files changed: +287 -0 lines changed

modules/ROOT/nav.adoc

+1
@@ -7,6 +7,7 @@
 *** xref:serverless:service-mesh/eventing-service-mesh-sinkbinding.adoc[Eventing: Using SinkBinding with OpenShift Service Mesh]
 ** Serving
 *** xref:serverless:serving/serving-with-ingress-sharding.adoc[Use Serving with OpenShift ingress sharding]
+*** xref:serverless:serving/scaleability-and-performance-of-serving.adoc[Scalability and performance of {serverlessproductname} Serving]
 * Serverless Logic
 ** xref:serverless-logic:about.adoc[About OpenShift Serverless Logic]
 ** User Guides
@@ -0,0 +1,286 @@
= Scalability and performance of {serverlessproductname} Serving
:compat-mode!:
:description: Scalability and performance of {serverlessproductname} Serving

== Introduction

{serverlessproductname} consists of several components with different resource requirements and scaling behaviors.
Some components, such as the `controller` pods, watch and react to `CustomResources` and continuously reconfigure the system; these components are referred to as the control plane.
Other components, such as the `activator` component of {serverlessproductname} Serving, are directly involved in request and response handling; these components are referred to as the data plane.
All of these components are horizontally and vertically scalable, but their resource requirements and configuration depend heavily on the actual use case.

The following sections outline the most relevant points to consider when scaling the system to handle increased usage.

== Test context

The following metrics and findings were recorded using this test setup:

* A cluster running {product-title} version 4.13
* The cluster uses *4 worker nodes* in AWS with the `m6.xlarge` machine type
* {serverlessproductname} version 1.30

[NOTE]
====
The tests are run continuously on our continuous integration system to compare performance data
between releases of {serverlessproductname}.
====


== Overhead of {serverlessproductname} Serving

Because components of {serverlessproductname} Serving are part of the data plane, requests from clients are routed through:

* The ingress gateway (Kourier or Service Mesh)
* The `activator` component
* The `queue-proxy` sidecar container in each Knative Service

These components introduce an additional network hop and perform additional work, such as adding observability and request queuing,
so they come with some overhead. The following latency overheads have been measured:

* Each additional network hop *adds 0.5-1ms* of latency to a request. Note that the `activator` component is not always part of the data plane, depending on the current load of the Knative Service and whether the Knative Service was scaled to zero before the request.
* Depending on the payload size, each of these components consumes up to 1 vCPU of CPU for handling 2500 requests per second.


== Known limitations of {serverlessproductname} Serving

The maximum number of Knative Services that can be created using this configuration is *3,000*.
This corresponds to the OpenShift Container Platform limit of 10,000 Kubernetes services, because one Knative Service creates three Kubernetes services.


== Scalability and performance of {serverlessproductname} Serving

{serverlessproductname} Serving has to be scaled and configured based on the following parameters:

* Number of Knative Services
* Number of Revisions
* Number of concurrent requests in the system
* Size of the request payloads
* The startup latency and response latency added by the user's web application running in the Knative Service
* Number of changes to the Knative Service `CustomResources` over time

Scaling of {serverlessproductname} Serving is configured using the `KnativeServing` `CustomResource`.
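
For orientation, the following minimal sketch shows where these scaling settings live in the `KnativeServing` resource. The values reflect the defaults described in the next section; complete tuning examples for minimal and high workloads follow below:

[source,yaml]
----
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 2  # default: all system components run with two replicas
  # Per-component overrides are configured in the `workloads` and
  # `podDisruptionBudgets` lists, shown in the examples below.
----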


=== `KnativeServing` defaults

By default, {serverlessproductname} Serving is configured to run all components with high availability and with medium-sized CPU and memory requests and limits.
This also means that the `high-availability` field in `KnativeServing` is automatically set to a value of 2 and all system components are scaled to two replicas.
This setup is suitable for medium-sized workload scenarios. These defaults have been tested with:

* 170 Knative Services
* 1-2 Revisions per Knative Service
* 89 test scenarios mainly focused on testing the control plane
* 48 re-creating scenarios where Knative Services are deleted and re-created
* 41 stable scenarios, in which requests are slowly but continuously sent to the system

During these test cases, the system components effectively consumed:

|===
| Component | Measured resources

| Operator in project `openshift-serverless`
| 1 GB memory, 0.2 cores of CPU

| Serving components in project `knative-serving`
| 5 GB memory, 2.5 cores of CPU
|===

While the default setup is suitable for medium-sized workloads, it might be over-sized for smaller setups or under-sized for high-workload scenarios.
See the following sections for possible tuning options.


=== Minimal requirements

To configure {serverlessproductname} Serving for a minimal workload scenario, it is important to know the idle consumption of the system components.

==== Idle consumption
The idle consumption is dependent on the number of Knative Services.
Note that for idle consumption, only memory is relevant.
CPU cycles are only consumed when Knative Services are changed or requests are sent to or from them.
The following memory consumption has been measured for the components in the `knative-serving` and `knative-serving-ingress` {product-title} projects:

|===
| Component | 0 Services | 100 Services | 500 Services | 1000 Services

| activator
| 55Mi
| 86Mi
| 150Mi
| 200Mi

| autoscaler
| 52Mi
| 102Mi
| 225Mi
| 350Mi

| controller
| 100Mi
| 135Mi
| 250Mi
| 400Mi

| webhook
| 60Mi
| 60Mi
| 60Mi
| 60Mi

| 3scale-kourier-gateway (1)
| 20Mi
| 60Mi
| 190Mi
| 330Mi

| net-kourier-controller (1)
| 90Mi
| 170Mi
| 340Mi
| 430Mi

| istio-ingressgateway (1)
| 57Mi
| 107Mi
| 307Mi
| 446Mi

| net-istio-controller (1)
| 60Mi
| 152Mi
| 350Mi
| 504Mi

|===
(1) Either `3scale-kourier-gateway` and `net-kourier-controller` or `istio-ingressgateway` and `net-istio-controller` is installed, depending on the configured ingress layer.


==== Configuring {serverlessproductname} Serving for minimal workloads

To configure {serverlessproductname} Serving for minimal workloads, you can tune the `KnativeServing` `CustomResource`:
[source,yaml]
----
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 1 <1>

  workloads:
  - name: activator
    replicas: 2 <2>
    resources:
    - container: activator
      requests:
        cpu: 250m <3>
        memory: 60Mi <4>
      limits:
        cpu: 1000m
        memory: 600Mi

  - name: controller
    replicas: 1 <6>
    resources:
    - container: controller
      requests:
        cpu: 10m
        memory: 100Mi <4>
      limits: <5>
        cpu: 200m
        memory: 300Mi

  - name: webhook
    replicas: 1 <6>
    resources:
    - container: webhook
      requests:
        cpu: 100m <7>
        memory: 20Mi <4>
      limits:
        cpu: 200m
        memory: 200Mi

  podDisruptionBudgets: <8>
  - name: activator-pdb
    minAvailable: 1
  - name: webhook-pdb
    minAvailable: 1
----
<1> Setting this to 1 scales all system components to one replica.
<2> The activator should always be scaled to a minimum of two instances to avoid downtime.
<3> The activator CPU request should not be set lower than 250m, because a `HorizontalPodAutoscaler` uses it as a reference to scale up and down.
<4> Adjust memory requests to the idle values listed above. Also adjust memory limits according to your expected load (this might need custom testing to find the best values).
<5> These limits are sufficient for a minimal-workload scenario, but they might also need adjustments depending on your concrete workload.
<6> One webhook and one controller are sufficient for a minimal-workload scenario.
<7> The webhook CPU request should not be set lower than 100m, because a `HorizontalPodAutoscaler` uses it as a reference to scale up and down.
<8> Adjust the `PodDisruptionBudgets` to a value lower than or equal to the `replicas`, to avoid problems during node maintenance.


=== High-workload configuration

To configure {serverlessproductname} Serving for a high-workload scenario, the following findings are relevant; an illustrative configuration sketch follows the list:

[NOTE]
====
These findings have been tested with requests with a payload size of 0-32 KB.
The Knative Service backends used in those tests had a startup latency between 0 and 10 seconds and response times between 0 and 5 seconds.
====

* CPU usage of all data-plane components increases mainly with higher request rates and larger payloads, so the CPU requests and limits have to be tested and potentially increased.
* The `activator` component might also need more memory when it has to buffer more or bigger request payloads, so the memory requests and limits might need to be increased as well.
* One `activator` pod can handle *approximately 2500 requests per second* before it starts to increase latency and, at some point, leads to errors.
* One `3scale-kourier-gateway` or `istio-ingressgateway` pod can also handle *approximately 2500 requests per second* before it starts to increase latency and, at some point, leads to errors.
* Each of the data-plane components consumes up to 1 vCPU of CPU for handling 2500 requests per second. Note that this highly depends on the payload size and the response times of the Knative Service backend.
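
As an illustration of how these findings can translate into configuration, the following sketch assumes a sustained load of roughly 5,000 requests per second: based on the numbers above, this calls for at least two `activator` replicas, plus one for headroom, each with roughly one vCPU. The values are illustrative and must be validated with your own tests:

[source,yaml]
----
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  workloads:
  - name: activator
    replicas: 3        # ~5,000 rps at ~2,500 rps per pod, plus one replica of headroom
    resources:
    - container: activator
      requests:
        cpu: 1000m     # roughly 1 vCPU per 2,500 rps handled by a pod
        memory: 600Mi  # increase if large payloads are buffered
      limits:
        cpu: 2000m
        memory: 1000Mi
----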

[IMPORTANT]
====
*Fast startup* and *fast response times* of your Knative Service user workloads are *critical* for good performance of the overall system,
because {serverlessproductname} Serving components buffer incoming requests while the Knative Service user backend is scaling up or while its request concurrency has reached capacity.
If your Knative Service user workloads introduce long startup or request latency, this will at some point either overload the `activator` component (if its CPU and memory configuration is too low) or lead to errors for the calling clients.
====

To fine-tune your {serverlessproductname} installation, use the findings above combined with your own test results to configure the `KnativeServing` `CustomResource`:

[source,yaml]
----
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  high-availability:
    replicas: 2 <1>

  workloads:
  - name: component-name <2>
    replicas: 2 <2>
    resources:
    - container: container-name
      requests:
        cpu: <3>
        memory: <3>
      limits:
        cpu: <3>
        memory: <3>

  podDisruptionBudgets: <4>
  - name: name-of-pod-disruption-budget
    minAvailable: 1
----
<1> Set this to at least 2 to make sure that at least two instances of every component are always running. You can also use `workloads` to override the replicas for certain components.
<2> Use the `workloads` list to configure specific components. Use the `deployment` name of the component (such as `activator`, `autoscaler`, `autoscaler-hpa`, `controller`, `webhook`, `net-kourier-controller`, `3scale-kourier-gateway`, or `net-istio-controller`) and set the `replicas`.
<3> Set the CPU and memory requests and limits to at least the idle consumption listed above, while also taking the findings above and your own test results into consideration.
<4> Adjust the `PodDisruptionBudgets` to a value lower than or equal to the `replicas`, to avoid problems during node maintenance. The default `minAvailable` is set to `1`, so if you increase the desired replicas, make sure to also increase `minAvailable`, as shown in the example below.
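
For example, assuming you scale the `activator` component to four replicas, a matching `PodDisruptionBudget` adjustment might look like the following sketch (the values are illustrative):

[source,yaml]
----
apiVersion: operator.knative.dev/v1beta1
kind: KnativeServing
metadata:
  name: knative-serving
  namespace: knative-serving
spec:
  workloads:
  - name: activator
    replicas: 4
  podDisruptionBudgets:
  - name: activator-pdb
    minAvailable: 2  # lower than or equal to the replicas, raised from the default of 1
----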

[IMPORTANT]
====
As each environment is highly specific, it is essential to test and find your own ideal configuration.
Use the monitoring and alerting functionality of {product-title} to continuously monitor your actual resource consumption and make adjustments if needed.

Also keep in mind that if you are using the {serverlessproductname} and {smproductshortname} integration, additional CPU overhead is added by the `istio-proxy` sidecar containers.
For more information, see the {smproductshortname} documentation.
====
