Span Metrics Cardinality Limiting

Span Metrics convert spans into metrics (for example, requests, errors, and duration). When spans or metric labels include many unique values, you might see a high number of time series. This can degrade performance, increase cost, and break dashboards.

This guide introduces a three-layered approach to managing and mitigating cardinality issues: correct instrumentation, sanitization processors, and system-level cardinality limits.

What is high cardinality?

High cardinality happens when a metric has hundreds of thousands of unique label combinations for a single service or operation.

Common examples include:

  • User IDs in span names
  • Email addresses or IP addresses in attributes
  • Full URLs with query parameters
  • Raw SQL statements with literal values

When used as metric labels, each unique value may create a separate time series, which leads to:

  • Explosive time-series growth
  • Missing or inconsistent metrics
  • Slow dashboards or stale visualizations
  • Exceeded fair usage limits
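
To make the growth concrete, the following is a minimal arithmetic sketch, not a Coralogix tool. It assumes the worst case, where every combination of label values actually occurs, and the label names and counts are hypothetical.

```python
from math import prod


def worst_case_series(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on time series: the product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label sets for one metric of one service.
templated = {"http_route": 20, "http_method": 4, "status_code": 5}
with_user_id = {**templated, "user_id": 50_000}

print(worst_case_series(templated))     # 400
print(worst_case_series(with_user_id))  # 20000000
```

Adding a single per-user label multiplies the bound by the number of users, which is why one dynamic label is usually enough to cause the symptoms listed above.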

Coralogix recommends the following layered approach to manage and mitigate cardinality issues.

Layer 1: Correct instrumentation (prevent cardinality at source)

Prevent high-cardinality data by avoiding dynamic values in span names or labels:

  • Use generic identifiers instead of specific IDs (for example, /user/{id} instead of /user/12345).
  • Avoid labels such as session.id, tenant.id, email, or IP addresses that vary per request.
  • Use semantic attributes such as http.route, db.operation, and service.version.

Layer 2: Sanitization processors (normalize dynamic data before metrics)

Sanitization normalizes dynamic or noisy values before Span Metrics generate metric time series.

Why sanitization matters

  • Prevents runaway cardinality.
  • Keeps routes, DB queries, and operations readable.
  • Ensures clean aggregation for service and database metrics.
  • Recommended in Span Metrics pipelines.

Detecting dynamic span names

Dynamic span names (for example, names that include IDs, GUIDs, or raw query strings) are a common cause of high cardinality. Use the following method to identify services that generate dynamic span names and to inspect the patterns involved.

  1. Find services with many unique span names by running the following query in Explore tracing to identify services that produce a large number of distinct operation names:

    source spans
    | groupby  $l.serviceName agg distinct_count($l.operationName)
    | orderby _distinct_count0 desc
    

    This helps you locate services that are likely creating dynamic span names.

  2. Discover dynamic span names within a specific service by running the following:

    source spans
    | filter $l.serviceName == `prod`
    | countby $l.serviceName, $l.operationName
    | filter _count < 2
    

    Look for non-templatized span names generated by frameworks (for example, GraphQL or auto-generated HTTP routes) or span names that embed IDs, UUIDs, or other request-specific values.

Next step: Normalize detected dynamic span names

After identifying dynamic span name patterns, proceed to the next section to apply normalization.

This ensures dynamic values are replaced with stable templates (for example, /users/{id}), significantly reducing cardinality.

Span name and URL sanitization

Span name sanitization is recommended for HTTP client and server spans, and only those spans are affected. For span names, a Markov-chain classifier trained on URL path n-grams detects UUIDs, hashes, and other high-cardinality tokens. Any “gibberish” path segments are replaced with *, while the rest of the route remains human-readable. Sanitizers are enabled by default with Span Metrics. You can customize or disable them using the spanMetricsSanitization preset.

Before:

GET /api/v1/users/9f8a7c12-3d4b-4f51-9e2b/profile

After:

GET /api/v1/users/*/profile
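
The effect of this step can be illustrated with a small Python sketch. It is not the Markov-chain classifier described above: it swaps in a much simpler heuristic (all-digit segments, or long hex-like segments that contain a digit), and the function names are hypothetical.

```python
import re

# Heuristic only: 8+ characters drawn from hex digits and dashes.
_HEX_LIKE = re.compile(r"^[0-9a-fA-F-]{8,}$")


def looks_dynamic(segment: str) -> bool:
    """Guess whether a path segment is an ID rather than a route word."""
    if segment.isdigit():
        return True
    return bool(_HEX_LIKE.match(segment)) and any(c.isdigit() for c in segment)


def sanitize_span_name(span_name: str) -> str:
    """Replace dynamic path segments in a 'METHOD /path' span name with '*'."""
    method, sep, path = span_name.partition(" ")
    if not sep:
        return span_name  # not in "METHOD /path" form, leave untouched
    cleaned = ["*" if looks_dynamic(s) else s for s in path.split("/")]
    return f"{method} {'/'.join(cleaned)}"


print(sanitize_span_name("GET /api/v1/users/9f8a7c12-3d4b-4f51-9e2b/profile"))
# GET /api/v1/users/*/profile
```

A fixed heuristic like this misses IDs that do not look hex-like, which is one reason a trained classifier is the more robust choice.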

Span Metrics sanitization wires the redaction processor into the traces pipeline to scrub high-cardinality span names/HTTP URLs and sanitize database statements whenever Span Metrics presets are enabled.

  • enabled accepts auto, true, or false. The default auto turns sanitization on automatically whenever presets.spanMetrics.enabled or presets.spanMetricsMulti.enabled is true.
  • Set enabled: false to opt out even when Span Metrics are enabled, or enabled: true to force sanitization on explicitly.
  • sanitize_url (default true) controls whether URL-like span names and URL attributes are normalized (for example, GET /api/users/42/profile becomes GET /api/users/?/profile).
  • sanitizeDatabases is an allowlist of database backends whose statements should be scrubbed (valid: sql, redis, memcached, mongo, opensearch, es). All supported backends are sanitized by default.
  spanMetricsSanitization:
    enabled: auto # auto: enabled when spanMetrics or spanMetricsMulti presets are enabled
    sanitize_url: true
    sanitizeDatabases:
      - sql
      - redis
      - memcached
      - mongo
      - opensearch
      - es

Database statement sanitization

Arguments and literals are stripped; statement shapes remain. Database statements are sanitized using the Datadog obfuscator. SQL, Redis/Valkey, Memcached, MongoDB, OpenSearch, and Elasticsearch payloads have literals and arguments removed, but their overall command structure is preserved. Sanitizers are enabled by default with Span Metrics. You can customize or disable them using the spanMetricsSanitization preset.

Before:

SELECT id, email FROM users WHERE email = 'jan@corp.com' AND id = 9121;

After:

SELECT id, email FROM users WHERE email = ? AND id = ?;
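
As a rough illustration of what literal stripping does, here is a regex-based Python sketch. It is not the Datadog obfuscator and handles only single-quoted strings and plain numeric literals; the function name obfuscate_sql is hypothetical.

```python
import re

_LITERAL = re.compile(
    r"""
    '(?:[^']|'')*'        # single-quoted string, '' is an escaped quote
    | \b\d+(?:\.\d+)?\b   # standalone integer or decimal literal
    """,
    re.VERBOSE,
)


def obfuscate_sql(statement: str) -> str:
    """Replace string and numeric literals with '?', keeping the statement shape."""
    return _LITERAL.sub("?", statement)


query = "SELECT id, email FROM users WHERE email = 'jan@corp.com' AND id = 9121;"
print(obfuscate_sql(query))
# SELECT id, email FROM users WHERE email = ? AND id = ?;
```

Digits inside identifiers (for example, a table named users2) are left alone because the numeric pattern requires word boundaries on both sides.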

Layer 3: System-level cardinality limits (protect your metrics pipeline)

Span metrics might generate extremely high cardinality. To mitigate this, Coralogix adopts a mechanism similar to the OpenTelemetry Metrics SDK cardinality limits. This feature introduces an automatic, configurable cardinality control mechanism within the spanmetrics pipeline of the OpenTelemetry Collector. Coralogix detects when services exceed configured cardinality limits and exposes this information so users can identify dropped series and take corrective action early. Detection is currently available at the backend level, with frontend UI visibility planned for future releases.

How the cardinality limit works

  • A threshold (for example, 100,000) is applied per service per metric. For example, calls_total{service="order-service"} will have a 100,000 series cap.
  • Once the limit is reached, new unique label combinations are not created.
  • Instead, data is aggregated into a fallback time series marked with:
otel_metric_overflow="true"

Example:

Cardinality limit of 3, with 5 time series sent (each with 50 spans):

calls_total{span_name="uuid1"}
calls_total{span_name="uuid2"}
calls_total{span_name="uuid3"}
calls_total{otel_metric_overflow="true"}
  • The first 3 series are preserved.
  • The remaining 100 spans are collapsed into a single time series tagged with otel_metric_overflow="true".
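
The example above can be reproduced with a short Python simulation. This is a sketch of the behavior as described in this guide, not the collector's implementation; aggregate_with_limit is a hypothetical name, and the sketch assumes the overflow series does not count toward the limit.

```python
OVERFLOW_KEY = 'otel_metric_overflow="true"'


def aggregate_with_limit(span_names: list[str], limit: int) -> dict[str, int]:
    """Count spans per series, routing new series past the limit to overflow."""
    counts: dict[str, int] = {}
    tracked = 0  # number of real (non-overflow) series created so far
    for name in span_names:
        key = f'span_name="{name}"'
        if key not in counts:
            if tracked >= limit:
                key = OVERFLOW_KEY  # limit reached: do not create a new series
            else:
                tracked += 1
        counts[key] = counts.get(key, 0) + 1
    return counts


# 5 distinct span names, 50 spans each, with a limit of 3.
spans = [f"uuid{i}" for i in range(1, 6) for _ in range(50)]
for series, calls in aggregate_with_limit(spans, limit=3).items():
    print(f"calls_total{{{series}}} {calls}")
# calls_total{span_name="uuid1"} 50
# calls_total{span_name="uuid2"} 50
# calls_total{span_name="uuid3"} 50
# calls_total{otel_metric_overflow="true"} 100
```

The total number of calls is preserved; only the per-series breakdown is lost for series that arrive after the limit is reached.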

Cardinality limit settings

To set the cardinality limit with aggregation_cardinality_limit, ensure you are using OpenTelemetry Collector version 0.130.0 or later.

Use aggregation temporality

aggregation_temporality controls how metric values accumulate.

  • Cumulative: Values increase continuously until the collector restarts; recommended for long-running services.
  • Delta: Values represent changes since the last export; recommended for serverless or short-lived workloads.
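
The difference between the two temporalities is easiest to see on a single counter. The following Python sketch uses made-up per-export call counts and hypothetical helper names; it only shows how the same data looks under each setting.

```python
from itertools import accumulate


def to_cumulative(deltas: list[int]) -> list[int]:
    """Running total since start: what a cumulative counter reports."""
    return list(accumulate(deltas))


def to_delta(cumulative: list[int]) -> list[int]:
    """Change since the previous export: what a delta counter reports."""
    return [cur - prev for prev, cur in zip([0] + cumulative, cumulative)]


calls_per_export = [50, 70, 80]  # hypothetical spans seen in each interval

print(to_cumulative(calls_per_export))  # [50, 120, 200]
print(to_delta([50, 120, 200]))         # [50, 70, 80]
```

With cumulative temporality, a collector restart resets the running total to zero, whereas delta values are independent of how long the process has been running.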

Kubernetes integration (Helm)

With the Coralogix Kubernetes Complete Observability integration, the cardinality limit is automatically enabled and set to 100,000 by default, starting from Helm chart version v.0.0.203. No additional configuration is required.

  • To disable the cardinality limit by overriding the default value, add an aggregationCardinalityLimit field under the SpanMetrics connector, and set it to 0.
  • To edit the cardinality limit, set the aggregationCardinalityLimit field to the desired value, as follows:
spanMetrics:
      enabled: true
      collectionInterval: "{{.Values.global.collectionInterval}}"
      metricsExpiration: 5m
      histogramBuckets:
        [1ms, 4ms, 10ms, 20ms, 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s]
      aggregationCardinalityLimit: 100000

OpenTelemetry Collector (non-Kubernetes)

Add the aggregation_cardinality_limit setting to the OTel collector configuration under the spanmetrics connector, and set the limit you want.

  • To disable the cardinality limit, either set the aggregation_cardinality_limit field to 0 or remove it entirely.
  • If you are using the dbMetric connector, ensure that the aggregation_cardinality_limit field is specified under this connector as well.
connectors:
      spanmetrics:
        namespace: ""
        histogram:
          explicit:
            buckets: [100us, 1ms, 2ms, 4ms, 6ms, 10ms, 100ms, 250ms]
        aggregation_cardinality_limit: 100000

Retention behavior

Tracked time series are stored in-memory only and are cleared when the OpenTelemetry Collector or sending pod restarts—no persistent state is maintained.

  • If a service stops sending data for 5 minutes, its cache is reset automatically.
  • If the service is redeployed without stopping data flow, the cache persists; to reset it, either restart the collector or allow the service to idle for 5 minutes.
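
The idle-reset rule can be modeled with a small in-memory structure. The following Python sketch is purely illustrative and assumes a per-service idle timer; the SeriesCache class and its methods are hypothetical and do not reflect the collector's internals.

```python
class SeriesCache:
    """In-memory set of tracked series per service, reset after an idle gap."""

    def __init__(self, idle_timeout_s: float = 300.0) -> None:
        self.idle_timeout_s = idle_timeout_s
        self._series: dict[str, set[str]] = {}
        self._last_seen: dict[str, float] = {}

    def record(self, service: str, series_key: str, now: float) -> None:
        last = self._last_seen.get(service)
        if last is not None and now - last >= self.idle_timeout_s:
            self._series.pop(service, None)  # idle for 5 minutes: start fresh
        self._series.setdefault(service, set()).add(series_key)
        self._last_seen[service] = now

    def tracked(self, service: str) -> int:
        return len(self._series.get(service, ()))


cache = SeriesCache()
cache.record("order-service", "uuid1", now=0.0)
cache.record("order-service", "uuid2", now=10.0)
print(cache.tracked("order-service"))  # 2

cache.record("order-service", "uuid3", now=400.0)  # 390 s gap, cache was reset
print(cache.tracked("order-service"))  # 1
```

Because a redeploy that keeps traffic flowing never produces an idle gap, the tracked set in this model is not cleared, which matches the second bullet above.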

Alerting on cardinality overflow

Set up alerts based on the presence of the label otel_metric_overflow="true". This allows early detection of cardinality issues—as soon as overflow begins, even if only a single value is dropped.

Recommended PromQL expression:

sum by (service_name) (
  duration_ms_bucket{otel_metric_overflow="true"}
) > 0

Validation checklist

  • Check that dimension values and span names are normalized (for example, /user/{id} instead of /user/1234).
  • Ensure URL parameters are masked when needed.
  • Verify database statements have been sanitized.
  • Confirm histogram buckets and le values are consistent across environments.
  • Use the overflow query to detect dropped series:
sum by (service_name) (
  traces_spanmetrics_calls_total{otel_metric_overflow="true"}
)

Best practices summary

Layer | Focus | Techniques
1 | Prevent cardinality at source | Use generic names, avoid dynamic labels
2 | Normalize remaining dynamic values | Sanitize HTTP spans, rewrite DB statements; use replace patterns or transform processors
3 | Protect the system from leftover cardinality | Add Coralogix cardinality limit setup; set aggregationCardinalityLimit, alert on overflow