
feat: Enable Istio access logs via OTLP to Telemetry Log Gateway #1374

Open · wants to merge 2 commits into base: main

Conversation

@a-thaler a-thaler commented Apr 1, 2025

Description

The telemetry module is introducing support for OTLP-based ingestion and export of logs, see kyma-project/telemetry-manager#556. With this change, logs collected via stdout will be shipped differently than before (protocol, attribute naming, and so on), and users need to adapt.

Currently, Istio access logs are supported by printing JSON to stdout, using the "stdout-json" Istio extension provider. Because users need to adapt anyway when switching to OTLP, we want to add Istio access log support via OTLP at the same time, so that users only need to adapt once.

This PR introduces a new Istio "envoyOtelAls" extension provider "kyma-logs" (analogous to "kyma-traces"), which will push access logs to the upcoming telemetry log gateway.

There is no default format for defining the log body and the attributes. That's why a detailed analysis was done in kyma-project/telemetry-manager#1899 to determine how to map the existing attributes to the OTel semantic conventions. The proposal was cross-checked by the SAP Cloud Logging team. As a result, this PR configures a log body using the simplest default Apache access log format, and defines attributes that are mainly derived from the previous pattern but follow the OTel conventions.

Additionally, one log attribute is added to indicate to the telemetry log gateway that this log data is emitted by Istio. Unfortunately, no better way was found, because typical patterns like setting an instrumentation scope on the OTel data are not yet supported by Istio.
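To illustrate, an "envoyOtelAls" provider is registered in Istio's MeshConfig under `extensionProviders`. The following is only a sketch: the service address, port, log body, and attribute name are placeholders, not the exact values configured by this PR:

```yaml
meshConfig:
  extensionProviders:
    - name: kyma-logs               # analogous to the existing "kyma-traces" provider
      envoyOtelAls:                 # Envoy OpenTelemetry access log service (OTLP/gRPC)
        service: telemetry-otlp-logs.kyma-system.svc.cluster.local  # hypothetical gateway address
        port: 4317
        logFormat:
          # Log body in the simple Apache access log style (illustrative Envoy operators)
          text: "%DOWNSTREAM_REMOTE_ADDRESS% - - [%START_TIME%] \"%REQ(:METHOD)% %REQ(:PATH)%\" %RESPONSE_CODE%"
          labels:
            # Hypothetical marker attribute so the log gateway can identify Istio data
            kyma.source: "istio"
```

Workloads would then opt in via the Istio Telemetry API, referencing the provider by name in `accessLogging.providers`.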

Changes proposed in this pull request:

  • Adds a new extension provider for access logs via OTLP

Pre-Merge Checklist

  • As a PR reviewer, verify code coverage and evaluate if it is acceptable.

Related issues

a-thaler commented Apr 7, 2025

As the new format uses CEL instead of plain Envoy attributes, and uses direct pushes instead of writing to stdout, we need to run a perf test before merging.
Goal: Prove that the new provider does not impact the performance of the Envoy proxy.
Tests:

  • As a baseline, run an Envoy with the stdout extension configured and put it under load. Track the throughput and the resource consumption.
  • Run exactly the same setup but use the new provider only, with a
    • responsive OTEL backend
    • unresponsive OTEL backend
    • backend which is delaying the responses
  • Criteria:
    • Throughput is not degraded
    • Resource consumption is not increased significantly
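A run of this plan could be driven with fortio as the load generator (the tool used for the results reported later in this thread); the service names and numbers below are placeholders, not the actual test setup:

```shell
# Generate steady HTTP load against the nginx target for one run
# (QPS, connection count, duration, and the URL are example values):
fortio load -qps 600 -c 8 -t 5m http://nginx.benchmark.svc.cluster.local/

# In parallel, sample throughput and resource consumption of the pods:
kubectl top pod -n benchmark --containers
```

Repeating the same run once per variant (stdout baseline, then the OTLP provider against a responsive, unresponsive, and delaying backend) yields comparable numbers.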

TeodorSAP commented Apr 24, 2025

Load Tests Summary

❗️For more details and exact numbers, please refer to: kyma-project/telemetry-manager#2075

envoy vs. kyma-logs

Here, istio-proxy refers to the Envoy container that exports the access logs, and nginx to the target container of the network traffic load generation.

| Pod | CPU Usage | CPU Throttling | Memory Usage (WSS) | Receive Bandwidth | Transmit Bandwidth | Rate of Received Packets | Rate of Transmitted Packets | Rate of Packets Dropped (Received + Transmitted) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| (envoy) istio-proxy | 0.249 | 100% | 41.9 MiB | - | - | - | - | - |
| (envoy) nginx | ~0.065 | - | 4.77 MiB | 501 KB/s | 865 KB/s | 594 p/s | 673 p/s | 0 p/s |
| (kyma-logs) istio-proxy | 0.25 | 100% | 44.0 MiB | - | - | - | - | - |
| (kyma-logs) nginx | ~0.06 | - | 4.46 MiB | 467 KB/s | 1.3 MB/s | 595 p/s | 690 p/s | 0 p/s |

Fault-injected Scenarios

Testing the new provider in edge-case fault-injected scenarios (unresponsive OTEL backend and a backend that is refusing some of the data) did not show any signs of failure or performance degradation.

Conclusions

Comparing the old provider (envoy) with the new provider (kyma-logs), no significant differences were observed in terms of resource consumption and performance. The new provider (kyma-logs) seems to be able to handle the same amount of traffic as the old provider (envoy), with similar CPU and memory usage and a slight increase in network bandwidth usage.

TeodorSAP commented Apr 30, 2025

Load Tests Improvements

❗️For more details and exact numbers, please refer to: kyma-project/telemetry-manager#2088

Since the results provided above are not specific enough (they do not measure the nginx and fortio pods in isolation, nor the traffic between them), and the fault-injected scenarios are not backed by numbers, I provide detailed results tables below and a follow-up PR in this regard:

nginx Pod

| Run | Provider | Scenario | [Istio] Requests Total | [Istio] Request Duration (ms) | [Istio] Request/Response Bytes | [K8S] Received/Transmitted Bandwidth (KB/s) | [K8S] Packets Rate (Received/Transmitted) | [K8S] Packets Dropped (Received + Transmitted) | [K8S] CPU Usage (istio-proxy, nginx) | [K8S] CPU Throttling (if any) | [K8S] Memory Usage (WSS) (istio-proxy, nginx) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R01 | kyma-logs | Functional | 504 | 2.36 | 504 / 504 | 439 / 1230 | 566 / 665 p/s | 0 p/s | istio-proxy: 0.249, nginx: 0.061 | istio-proxy: 100% | istio-proxy: 44.8 MiB, nginx: 4.47 MiB |
| R02 | telemetry-stdout | Functional | 564 | 1.95 | 564 / 564 | 485 / 827 | 566 / 649 p/s | 0 p/s | istio-proxy: 0.250, nginx: 0.063 | istio-proxy: 100% | istio-proxy: 49.8 MiB, nginx: 4.48 MiB |
| R03 | kyma-logs | Backend not reachable | 490 | 2.01 | 486 / 486 | 478 / 719 | 493 / 563 p/s | 0 p/s | istio-proxy: 0.251, nginx: 0.0626 | istio-proxy: 100% | istio-proxy: 50.5 MiB, nginx: 4.47 MiB |
| R04 | kyma-logs | Backend refusing some access logs | 522 | 2.2 | 522 / 522 | 463 / 1280 | 584 / 683 p/s | 0 p/s | istio-proxy: 0.250, nginx: 0.058 | istio-proxy: 100% | istio-proxy: 50.7 MiB, nginx: 4.47 MiB |

fortio Pod

| Run | Provider | Scenario | [Istio] Requests Total | [Istio] Request Duration (ms) | [Istio] Request/Response Bytes | [K8S] Received/Transmitted Bandwidth (KB/s) | [K8S] Packets Rate (Received/Transmitted) | [K8S] Packets Dropped (Received + Transmitted) | [K8S] CPU Usage (istio-proxy, fortio) | [K8S] CPU Throttling (if any) | [K8S] Memory Usage (WSS) (istio-proxy, fortio) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| R01 | kyma-logs | Functional | 504 | 7.04 | 504 / 504 | 728 / 446 | 589 / 509 p/s | 0 p/s | istio-proxy: 0.17, fortio: 0.0497 | istio-proxy: 0% | istio-proxy: 39.5 MiB, fortio: 10.6 MiB |
| R02 | telemetry-stdout | Functional | 564 | 6.21 | 564 / 564 | 795 / 496 | 646 / 566 p/s | 0 p/s | istio-proxy: 0.172, fortio: 0.0538 | istio-proxy: 0% | istio-proxy: 40.6 MiB, fortio: 11.0 MiB |
| R03 | kyma-logs | Backend not reachable | 466 | 6.34 | 466 / 466 | 818 / 421 | 548 / 480 p/s | 0 p/s | istio-proxy: 0.17, fortio: 0.0528 | istio-proxy: 0% | istio-proxy: 40.5 MiB, fortio: 10.7 MiB |
| R04 | kyma-logs | Backend refusing some access logs | 522 | 6.8 | 522 / 522 | 748 / 459 | 605 / 525 p/s | 0 p/s | istio-proxy: 0.17, fortio: 0.0509 | istio-proxy: 0% | istio-proxy: 40.4 MiB, fortio: 11.4 MiB |

Conclusions

Conclusions remain mostly unchanged (see the comment above and refer to the benchmark document from the referenced PR).
