Span Metrics Cardinality Limiting
Span Metrics convert spans into metrics (for example, requests, errors, and duration). When spans or metric labels include many unique values, you might see a high number of time series. This can degrade performance, increase cost, and break dashboards.
This guide introduces a three-layered approach to managing and mitigating cardinality issues:
- Layer 1: Correct instrumentation (prevent cardinality at source)
- Layer 2: Sanitization processors (normalize dynamic data before metrics)
- Layer 3: System-level cardinality limits (protect your metrics pipeline)
What is high cardinality?
High cardinality happens when a metric has hundreds of thousands of unique label combinations for a single service or operation.
Common examples include:
- User IDs in span names
- Email addresses or IP addresses in attributes
- Full URLs with query parameters
- Raw SQL statements with literal values
When used as metric labels, each unique value may create a separate time series, which leads to:
- Explosive time-series growth
- Missing or inconsistent metrics
- Slow dashboards or stale visualizations
- Exceeded fair usage limits
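To make the growth concrete, the following is a minimal sketch that counts distinct label combinations, which is what determines the number of time series. The spans, the service name shop, and the helper series_count are hypothetical and exist only for illustration.

```python
def series_count(samples, label_names):
    """Count distinct label combinations, i.e. the number of time series."""
    return len({tuple(s.get(name) for name in label_names) for s in samples})

# Hypothetical spans: 1,000 requests to the same endpoint for different users.
raw = [{"service": "shop", "span_name": f"GET /user/{i}"} for i in range(1000)]
# The same traffic with the user ID replaced by a route template.
templated = [{"service": "shop", "span_name": "GET /user/{id}"} for _ in range(1000)]

raw_series = series_count(raw, ["service", "span_name"])
templated_series = series_count(templated, ["service", "span_name"])
print(raw_series, templated_series)  # 1000 1
```

The same 1,000 requests produce 1,000 series when the ID is embedded in the span name and a single series when the name is templated.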
Coralogix recommends the following layered approach to manage and mitigate cardinality issues.
Layer 1: Correct instrumentation (prevent cardinality at source)
Prevent high-cardinality data by avoiding dynamic values in span names or labels:
- Use generic identifiers instead of specific IDs (for example, /user/{id} instead of /user/12345).
- Avoid labels such as session.id, tenant.id, email, or IP addresses that vary per request.
- Use semantic attributes like http.route, db.operation, service.version.
Layer 2: Sanitization processors (normalize dynamic data before metrics)
Sanitization normalizes dynamic or noisy values before Span Metrics generate metric time series.
Why sanitization matters
- Prevents runaway cardinality.
- Keeps routes, DB queries, and operations readable.
- Ensures clean aggregation for service and database metrics.
- Recommended in Span Metrics pipelines.
Detecting dynamic span names
Dynamic span names (for example, names that include IDs, GUIDs, or raw query strings) are a common cause of high cardinality. Use the following method to identify services that generate dynamic span names and to inspect the patterns involved.
To find services that produce a large number of distinct operation names, run the following query in Explore tracing:
source spans | groupby $l.serviceName agg distinct_count($l.operationName) | orderby _distinct_count0 desc
This helps you locate services that are likely creating dynamic span names.
Discover dynamic span names within a specific service by running the following:
source spans | filter $l.serviceName == `prod` | countby $l.serviceName, $l.operationName | filter _count < 2
Look for non-templatized span names generated by frameworks (for example, GraphQL or auto-generated HTTP routes) or span names that embed IDs, UUIDs, or other request-specific values.
Next step: Normalize detected dynamic span names
After identifying dynamic span name patterns, proceed to the following sections to apply span name, URL, and database statement sanitization.
This ensures dynamic values are replaced with stable templates (for example, /users/{id}), significantly reducing cardinality.
Span name and URL sanitization
Span name sanitization is recommended for HTTP client and server spans, and only those spans are affected. For span names, a Markov-chain classifier trained on URL path n-grams detects UUIDs, hashes, and other high-cardinality tokens. Any “gibberish” path segments are replaced with *, while the rest of the route remains human-readable. Sanitizers are enabled by default with Span Metrics. You can customize or disable them using the spanMetricsSanitization preset.
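The effect of span name sanitization can be illustrated with a much simpler stand-in. The sketch below uses hand-written regular expressions, not the Markov-chain classifier described above, and the patterns and the function name sanitize_span_name are assumptions made for illustration only.

```python
import re

# Assumed patterns for illustration only: UUIDs, long hex strings, numeric IDs.
_DYNAMIC_SEGMENT = re.compile(
    r"^(?:"
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    r"|[0-9a-fA-F]{16,}"
    r"|\d+"
    r")$"
)

def sanitize_span_name(name: str) -> str:
    """Replace dynamic path segments with '*' and drop any query string."""
    method, sep, path = name.partition(" ")
    if not sep:  # no HTTP method prefix: treat the whole name as a path
        method, path = "", name
    path = path.split("?", 1)[0]
    segments = ["*" if _DYNAMIC_SEGMENT.match(seg) else seg for seg in path.split("/")]
    cleaned = "/".join(segments)
    return f"{method} {cleaned}" if method else cleaned

print(sanitize_span_name("GET /api/users/42/profile"))  # GET /api/users/*/profile
```

A fixed pattern list like this misses tokens it was not written for, which is the kind of gap a trained classifier is meant to close.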
Span Metrics sanitization wires the redaction processor into the traces pipeline to scrub high-cardinality span names/HTTP URLs and sanitize database statements whenever Span Metrics presets are enabled.
- enabled accepts auto, true, or false. The default auto turns sanitization on automatically whenever presets.spanMetrics.enabled or presets.spanMetricsMulti.enabled is true.
- Set enabled: false to opt out even when Span Metrics are enabled, or enabled: true to force sanitization on explicitly.
- sanitize_url (default true) controls whether URL-like span names and URL attributes are normalized (for example, GET /api/users/42/profile → GET /api/users/?/profile).
- sanitizeDatabases is an allowlist of database backends whose statements should be scrubbed (valid: sql, redis, memcached, mongo, opensearch, es). All supported backends are sanitized by default.
spanMetricsSanitization:
  enabled: auto # auto: enabled when spanMetrics or spanMetricsMulti presets are enabled
  sanitize_url: true
  sanitizeDatabases:
    - sql
    - redis
    - memcached
    - mongo
    - opensearch
    - es
Database statement sanitization
Arguments and literals are stripped; statement shapes remain. Database statements are sanitized using the Datadog obfuscator. SQL, Redis/Valkey, Memcached, MongoDB, OpenSearch, and Elasticsearch payloads have literals and arguments removed, but their overall command structure is preserved. Sanitizers are enabled by default with Span Metrics. You can customize or disable them using the spanMetricsSanitization preset.
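The idea of stripping literals while keeping the statement shape can be sketched for SQL as follows. This is a simplified regex-based illustration, not the obfuscator the pipeline actually uses, and the function name strip_sql_literals and the sample statements are hypothetical.

```python
import re

# Single-quoted string literals, including doubled quotes such as 'O''Brien'.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
# Standalone integers and decimals (digits inside identifiers are left alone).
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")

def strip_sql_literals(statement: str) -> str:
    """Replace string and numeric literals with '?' and keep the statement shape."""
    without_strings = _STRING_LITERAL.sub("?", statement)
    return _NUMBER_LITERAL.sub("?", without_strings)

a = strip_sql_literals("SELECT * FROM users WHERE id = 42 AND email = 'a@b.com'")
b = strip_sql_literals("SELECT * FROM users WHERE id = 7 AND email = 'c@d.org'")
print(a)       # SELECT * FROM users WHERE id = ? AND email = ?
print(a == b)  # True
```

Both statements collapse to the same shape, so they contribute to one time series instead of two.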
Layer 3: System-level cardinality limits (protect your metrics pipeline)
Span metrics might generate extremely high cardinality. To mitigate this, Coralogix adopts a mechanism similar to the OpenTelemetry Metrics SDK cardinality limits. This feature introduces an automatic, configurable cardinality control mechanism within the spanmetrics pipeline of the OpenTelemetry Collector. Coralogix detects when services exceed configured cardinality limits and exposes this information so users can identify dropped series and take corrective action early. Detection is currently available at the backend level, with frontend UI visibility planned for future releases.
How the cardinality limit works
- A threshold (for example, 100,000) is applied per service per metric. For example, calls_total{service="order-service"} will have a 100,000 series cap.
- Once the limit is reached, new unique label combinations are not created.
- Instead, data is aggregated into a fallback time series marked with otel_metric_overflow="true".
Example:
Cardinality limit of 3, with 5 time series sent (each with 50 spans):
calls_total{span_name="uuid1"}
calls_total{span_name="uuid2"}
calls_total{span_name="uuid3"}
calls_total{otel_metric_overflow="true"}
- The first 3 series are preserved.
- The remaining 100 spans are collapsed into a single time series tagged with otel_metric_overflow="true".
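The example above can be reproduced with a small simulation. This is a sketch of the behavior as described in this section, not the collector's implementation; the function name aggregate_with_limit is hypothetical, and a limit of 0 is treated as "no limit", matching the configuration guidance later in this guide.

```python
from collections import Counter

def aggregate_with_limit(span_names, limit):
    """Count spans per span_name, folding series beyond `limit` into one overflow series."""
    series = Counter()
    tracked = set()  # span names that already own a time series
    for name in span_names:
        if name in tracked or limit == 0 or len(tracked) < limit:
            tracked.add(name)
            series[f'calls_total{{span_name="{name}"}}'] += 1
        else:
            # Limit reached: new label combinations go to the fallback series.
            series['calls_total{otel_metric_overflow="true"}'] += 1
    return dict(series)

# 5 distinct span names, 50 spans each, with a cardinality limit of 3.
spans = [f"uuid{i}" for i in range(1, 6) for _ in range(50)]
result = aggregate_with_limit(spans, limit=3)
for name, value in result.items():
    print(name, value)
```

The first three names keep their own series with 50 spans each, and the 100 spans from the remaining two names land in the overflow series.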
Cardinality limit settings
To set the cardinality limit with aggregation_cardinality_limit, ensure you are using OpenTelemetry Collector version 0.130.0 or later.
Use aggregation temporality
aggregation_temporality controls how metric values accumulate.
- Cumulative: Values increase continuously until the collector restarts; recommended for long-running services.
- Delta: Values represent changes since the last export; recommended for serverless or short-lived workloads.
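The difference between the two temporalities can be shown with plain numbers. The sketch below assumes a hypothetical counter observed over four export intervals with no collector restart in between; the helper names are illustrative.

```python
from itertools import accumulate

def to_cumulative(deltas):
    """Running total since start: what a cumulative exporter reports."""
    return list(accumulate(deltas))

def to_delta(cumulative):
    """Change since the previous export: what a delta exporter reports."""
    previous = [0] + cumulative[:-1]
    return [cur - prev for prev, cur in zip(previous, cumulative)]

# Hypothetical number of calls observed in four consecutive export intervals.
per_interval = [5, 3, 0, 7]
cumulative = to_cumulative(per_interval)
print(cumulative)            # [5, 8, 8, 15]
print(to_delta(cumulative))  # [5, 3, 0, 7]
```

With cumulative temporality a restart resets the running total to zero, which is why it suits long-running processes, while delta values stand on their own for each interval.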
Kubernetes integration (Helm)
With the Coralogix Kubernetes Complete Observability integration, the cardinality limit is automatically enabled and set to 100,000 by default, starting from Helm chart version v.0.0.203. No additional configuration is required.
- To disable the cardinality limit by overriding the default value, add an aggregationCardinalityLimit field under the SpanMetrics connector, and set it to 0.
- To edit the cardinality limit, set the aggregationCardinalityLimit field to the desired value, as follows:
spanMetrics:
  enabled: true
  collectionInterval: "{{.Values.global.collectionInterval}}"
  metricsExpiration: 5m
  histogramBuckets:
    [1ms, 4ms, 10ms, 20ms, 50ms, 100ms, 200ms, 500ms, 1s, 2s, 5s]
  aggregationCardinalityLimit: 100000
OpenTelemetry Collector (non-Kubernetes)
Add the aggregation_cardinality_limit setting to the OTel collector configuration under the spanmetrics connector, and set the limit you want.
- To disable the cardinality limit, either set the aggregation_cardinality_limit field to 0 or remove it entirely.
- If you are using the dbMetric connector, ensure that the aggregation_cardinality_limit field is specified under this connector as well.
connectors:
  spanmetrics:
    namespace: ""
    histogram:
      explicit:
        buckets: [100us, 1ms, 2ms, 4ms, 6ms, 10ms, 100ms, 250ms]
    aggregation_cardinality_limit: 100000
Retention behavior
Tracked time series are stored in memory only and are cleared when the OpenTelemetry Collector or sending pod restarts. No persistent state is maintained.
- If a service stops sending data for 5 minutes, its cache is reset automatically.
- If the service is redeployed without stopping data flow, the cache persists; to reset it, either restart the collector or allow the service to idle for 5 minutes.
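The retention rules above can be pictured with a toy in-memory model. This is only a sketch of the described behavior under the assumption that the reset happens lazily on the next data point; the class SeriesCache and its methods are hypothetical and do not reflect the collector's internals.

```python
class SeriesCache:
    """Toy model: per-service tracked series that reset after an idle period."""

    def __init__(self, idle_timeout_s=300.0):  # 5 minutes, as described above
        self.idle_timeout_s = idle_timeout_s
        self._series = {}     # service -> set of label combinations
        self._last_seen = {}  # service -> timestamp of the last data point

    def record(self, service, labels, now):
        last = self._last_seen.get(service)
        if last is not None and now - last >= self.idle_timeout_s:
            self._series.pop(service, None)  # idle too long: start fresh
        self._series.setdefault(service, set()).add(labels)
        self._last_seen[service] = now

    def tracked(self, service):
        return len(self._series.get(service, ()))

cache = SeriesCache()
cache.record("order-service", "span_name=a", now=0)
cache.record("order-service", "span_name=b", now=60)
before_idle = cache.tracked("order-service")  # steady traffic: cache persists
cache.record("order-service", "span_name=c", now=60 + 300)
after_idle = cache.tracked("order-service")   # 5 idle minutes: cache was reset
print(before_idle, after_idle)  # 2 1
```

Because the state lives only in process memory, creating a new SeriesCache corresponds to a collector restart: nothing carries over.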
Alerting on cardinality overflow
Set up alerts based on the presence of the label otel_metric_overflow="true". This allows early detection of cardinality issues as soon as overflow begins, even if only a single value is dropped.
Recommended PromQL expression:
Validation checklist
- Check that dimension values and span names are normalized (for example, /user/{id} instead of /user/1234).
- Ensure URL parameters are masked when needed.
- Verify database statements have been sanitized.
- Confirm histogram buckets and le values are consistent across environments.
- Use the overflow query to detect dropped series:
Best practices summary
| Layer | Focus | Techniques |
|---|---|---|
| 1 | Prevent cardinality at source | Use generic names, avoid dynamic labels |
| 2 | Normalize remaining dynamic values | Sanitize HTTP spans, rewrite DB statements; Use replace patterns or transform processors |
| 3 | Protect the system from leftover cardinality | Add Coralogix cardinality limit setup; Set aggregationCardinalityLimit, alert on overflow |

