
High Latency When Using Nginx Ingress with OTel Collector over HTTP #4488


Open
oszlak opened this issue Mar 17, 2025 · 2 comments
Labels
question Further information is requested

Comments


oszlak commented Mar 17, 2025

Describe your environment

Environment

Kubernetes: v1.30
Local setup using kind
OTel Collector: v0.121.0
sdk: 1.31.0
Nginx Ingress Controller

Description
When using the OpenTelemetry collector directly with port forwarding (4318), the metric export latency is normal (50-150ms). However, when introducing an Nginx ingress in front of the collector, the latency increases dramatically to 3-5 seconds per export.
Steps to Reproduce

A simple OTel Collector pipeline configuration is used
The OTLP HTTP exporter is used (port 4318)
No visible errors in the logs, just increased latency
Ingress rewrite rule in use: nginx.ingress.kubernetes.io/rewrite-target: /v1/metrics
Collector configured as a StatefulSet with minimal processing (batch processor only)

Troubleshooting Attempted

Verified that the ingress configuration is correct by confirming that metrics are received
Checked the Nginx ingress controller logs for errors or warnings
Confirmed that other services behind the same ingress controller don't experience similar latency
Used a minimal collector configuration with only the debug exporter
The ingress uses the nginx.ingress.kubernetes.io/rewrite-target annotation, which might affect routing (see the timing probe after this list)
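
To narrow down where the extra seconds are spent, it can help to time the bare HTTP hop itself, outside the SDK. The snippet below is only a rough probe and rests on assumptions from the config above: the port-forward is running on localhost:4318 and otel-metrics.local resolves to the ingress controller (e.g. via /etc/hosts); an empty protobuf body should decode as an empty ExportMetricsServiceRequest, so the collector itself should answer quickly.

# Rough timing probe (assumptions above): compare the raw HTTP round trip
# through the port-forward and through the Nginx ingress.
import time
import requests

ENDPOINTS = {
    "port-forward": "http://localhost:4318/v1/metrics",      # kubectl port-forward target
    "nginx-ingress": "http://otel-metrics.local/v1/metrics",  # assumed /etc/hosts entry for the ingress
}

for name, url in ENDPOINTS.items():
    start = time.time()
    resp = requests.post(
        url,
        data=b"",  # empty body -> empty ExportMetricsServiceRequest
        headers={"Content-Type": "application/x-protobuf"},
        timeout=8,
    )
    print(f"{name}: status={resp.status_code} elapsed={time.time() - start:.3f}s")

If only the ingress route shows the multi-second elapsed time here as well, the delay is in the HTTP hop rather than in the exporter or the collector pipeline.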

Impact
This latency increase makes using Nginx ingress in front of OTel collector impractical for production environments where timely metric export is critical.

# Simple OpenTelemetry Collector configuration
# Just receives metrics on port 4318 and outputs to stdout

global:
  defaultApplicationName: "metrics-local-kind"
  defaultSubsystemName: "metrics-local-kind"
nameOverride: "metrics-local-kind"
fullnameOverride: "metrics-local-kind"
mode: "statefulset"  # Keeping statefulset as in original config

# Disable all presets that we don't need
presets:
  logsCollection:
    enabled: false
  hostMetrics:
    enabled: false
  kubernetesAttributes:
    enabled: false  # Changed to false since we're just printing to stdout
  clusterMetrics:
    enabled: false
  kubeletMetrics:
    enabled: false

configMap:
  create: true

# The core configuration
config:
  exporters:
    # Only using debug exporter to print to stdout
    debug:
      verbosity: detailed  # Print detailed metrics information

  extensions:
    health_check: {}  # Keep health check for monitoring

  processors:
    batch:  # Basic batch processor to efficiently handle metrics
      send_batch_size: 1024
      timeout: "1s"

  receivers:
    otlp:  # OTLP receiver to get metrics
      protocols:
        http:
          endpoint: "0.0.0.0:4318"  # Listen for HTTP OTLP metrics on port 4318

  service:
    extensions:
      - health_check
    pipelines:
      metrics:  # Simple metrics pipeline
        receivers:
          - otlp
        processors:
          - batch
        exporters:
          - debug  # Only export to debug (stdout)

# Container image configuration
image:
  repository: otel/opentelemetry-collector-contrib
  pullPolicy: IfNotPresent
  tag: "0.121.0"  # Keeping your version

command:
  name: otelcol-contrib

# Basic setup for the service account
serviceAccount:
  create: true

# We don't need cluster role
clusterRole:
  create: false

# Restoring statefulset configuration from original
statefulset:
  persistentVolumeClaimRetentionPolicy:
    enabled: true
    whenDeleted: Delete
    whenScaled: Retain
  volumeClaimTemplates:
    - metadata:
        name: queue
      spec:
        storageClassName: standard
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: "1Gi"

# Add pod identity as an environment variable for application use
extraVolumeMounts:
  - name: queue
    mountPath: /var/lib/storage/queue

initContainers:
  - name: init-fs
    image: busybox:latest
    command:
      - sh
      - "-c"
      - "chown -R 10001: /var/lib/storage/queue"
    volumeMounts:
      - name: queue
        mountPath: /var/lib/storage/queue

# Enable required ports
ports:
  otlp-http:
    enabled: true
    containerPort: 4318
    servicePort: 4318
    protocol: TCP
  metrics:
    enabled: true
    containerPort: 8888
    servicePort: 8888
    protocol: TCP

# Minimal resource requirements
resources:
  limits:
    memory: 200Mi
  requests:
    cpu: 200m
    memory: 200Mi

replicaCount: 1

# Simple ClusterIP service
service:
  type: ClusterIP

# Keeping the ingress configuration
ingress:
  enabled: true
  ingressClassName: nginx  # Matches the NGINX Ingress Controller
  hosts:
    - host: otel-metrics.local  # Dummy host for local testing in Kind
      paths:
        - path: /
          pathType: Prefix
          port: 4318
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /v1/metrics  # Rewrite to OTLP metrics endpoint

Python app:

import logging
import random
import time

logger = logging.getLogger("via_telemetry")

from opentelemetry import metrics
from opentelemetry.exporter.otlp.proto.http import Compression
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.metrics import set_meter_provider
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
from opentelemetry.sdk.resources import SERVICE_NAME, Attributes, Resource

exporter = OTLPMetricExporter(
    endpoint='http://0.0.0.0:4318/v1/metrics',
    timeout=8,  # type: ignore[arg-type]
    compression=Compression("none"),
)
metric_readers = []
reader = PeriodicExportingMetricReader(exporter)
metric_readers.append(reader)
service_attr: Attributes = {SERVICE_NAME: "sdk_test", "team": "o11y"}
service_resource = Resource(attributes=service_attr)
meter_provider = MeterProvider(resource=service_resource, metric_readers=metric_readers)
set_meter_provider(meter_provider)
meter = metrics.get_meter("otel-tests")

process_counter = meter.create_counter(
    name="sdk_counter_tests",
    unit="invocation",
    description="Counts the number of process invocations with large increase",
)

def main():
    counter = 0
    logger.info("Function triggered successfully")
    labels = {
        'env': 'dev',
        'city_id': '123',
    }
    try:
        while True:
            rand_num = random.randrange(1, 10)
            logger.info("Function triggered successfully")
            process_counter.add(rand_num, labels)
            counter += rand_num
            time.sleep(1)
    except KeyboardInterrupt:
        start = time.time()
        meter_provider.force_flush(8000)
        end = time.time()
        print(f"time to flush metrics: {end - start}")
        print(f'total counter: {counter}')
if __name__ == "__main__":
    main()
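
As a side note (a sketch, not part of the original script): the OTLP HTTP exporter also honors the standard OTEL_EXPORTER_OTLP_METRICS_ENDPOINT environment variable when no endpoint is passed, so the same app can be switched between the port-forward and the ingress without code changes; the ingress hostname below is an assumption taken from the values file above.

# Variant of the exporter setup: endpoint resolved from the environment, e.g.
#   OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://otel-metrics.local/v1/metrics python app.py
import os
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter

exporter = OTLPMetricExporter(timeout=8)  # no endpoint argument
print("exporting to:", os.environ.get(
    "OTEL_EXPORTER_OTLP_METRICS_ENDPOINT",
    "http://localhost:4318/v1/metrics",  # exporter default when the variable is unset
))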

What happened?

With direct port forwarding: 50-150ms latency
With Nginx ingress: 3-5s latency (30-100x increase)

Steps to Reproduce

Set up a kind cluster with Kubernetes 1.30
Deploy OTel Collector v0.121.0 with a simple pipeline
Test direct export via port forwarding:
kubectl port-forward service/otel-collector 4318:4318
Result: export latency is 50-150ms
Deploy the Nginx ingress controller and configure it to route to the OTel Collector
Export metrics through the ingress (a timing sketch for both routes follows below)
Result: export latency increases to 3-5 seconds
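
For reference, the flush time the app measures on Ctrl+C can also be reproduced with a small self-contained probe that forces a single export per endpoint; this is only a sketch under the same assumptions as above (port-forward on localhost:4318, otel-metrics.local resolving to the ingress).

# Time one OTLP/HTTP metric export against each route and print the flush duration.
import time
from opentelemetry.exporter.otlp.proto.http.metric_exporter import OTLPMetricExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

def timed_flush(endpoint: str) -> float:
    exporter = OTLPMetricExporter(endpoint=endpoint, timeout=8)
    # Long export interval so the only export is the forced flush below.
    reader = PeriodicExportingMetricReader(exporter, export_interval_millis=60_000)
    provider = MeterProvider(metric_readers=[reader])
    provider.get_meter("latency-probe").create_counter("probe_counter").add(1)
    start = time.time()
    provider.force_flush(8_000)  # push the pending data point through the exporter
    elapsed = time.time() - start
    provider.shutdown()
    return elapsed

print("port-forward :", timed_flush("http://localhost:4318/v1/metrics"))
print("nginx-ingress:", timed_flush("http://otel-metrics.local/v1/metrics"))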

Expected Result

The latency should remain comparable when using an ingress, perhaps with a slight increase but not a 30-100x degradation.
Actual Result

When using the OpenTelemetry collector directly with port forwarding (4318), the metric export latency is normal (50-150ms). However, when introducing an Nginx ingress in front of the collector, the latency increases dramatically to 3-5 seconds per export.

Additional context

No response

Would you like to implement a fix?

None

oszlak added the bug (Something isn't working) label Mar 17, 2025
xrmx (Contributor) commented Mar 17, 2025

Pretty sure this is not the correct repository to report an issue with an ingress controller.

xrmx added the question (Further information is requested) label and removed the bug (Something isn't working) label Mar 17, 2025
oszlak (Author) commented Mar 17, 2025

Thank you @xrmx.
Actually, I'm not sure it belongs to Nginx either;
this delay only occurs when there is an OTel collector behind the ingress.
Maybe someone has a clue?
