## Goal

By the end of this guide you should be able to use [`aggregate`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/aggregate/index.md), [`groupby`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/groupby/index.md), [`avg`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/aggregation/avg/index.md), [`count_if`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/aggregation/count_if/index.md), and other functions to calculate metrics such as averages, totals, and conditional counts from your logs or traces.

## Why it matters

Understanding what your systems are doing at scale often requires more than just viewing raw logs, it requires summarization. Aggregations let you answer questions like “Which endpoints are the slowest?”, “How many error traces exceeded 1s?” or “What is the average request size per service?” without manually inspecting individual events.

## Count all logs or traces

### Description

Use [`count()`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/count/index.md) to return the total number of documents in the dataset or current time range. This is the simplest and fastest way to verify volume before applying filters or grouping.

### Syntax

```dataprime
count(): number
```

### Example: Count total number of trace spans

#### Sample data

```json
{ "trace_id": "t1", "duration": 500 }
{ "trace_id": "t2", "duration": 620 }
{ "trace_id": "t3", "duration": 80 }
```

#### Query

```dataprime
source spans
| aggregate count() as total_spans
```

#### Result

| total_spans |
| ----------- |
| 3           |

______________________________________________________________________

## Group by a field and calculate an average

### Description

Use [`groupby`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/groupby/index.md) with [`avg()`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/aggregation/avg/index.md) to measure how a numerical value (e.g., duration, size, latency) varies by category.

### Syntax

```dataprime
groupby <key> aggregate avg(<number_field>) as <alias>
```

### Example: Compute average span duration per region

#### Sample data

```json
{ "cloud_region": "eu-north-1", "duration": 500 }
{ "cloud_region": "eu-north-1", "duration": 654 }
{ "cloud_region": "eu-west-1", "duration": 2100 }
{ "cloud_region": "us-east-2", "duration": 150 }
```

#### Query

```dataprime
source spans
| groupby cloud_region aggregate avg(duration) as avg_duration
```

#### Result

| cloud_region | avg_duration |
| ------------ | ------------ |
| eu-north-1   | 577.0        |
| eu-west-1    | 2100.0       |
| us-east-2    | 150.0        |

This highlights regional performance differences. You can use this pattern to monitor latency by path, status code, or service.

______________________________________________________________________

## Add multiple aggregations at once

### Description

You can combine multiple aggregation functions in a single query to analyze different metrics simultaneously.

### Example: Measure average, max, and count of spans per region

#### Sample data

```json
{ "cloud_region": "eu-north-1", "duration": 500 }
{ "cloud_region": "eu-north-1", "duration": 900 }
{ "cloud_region": "eu-west-1", "duration": 2100 }
{ "cloud_region": "us-east-2", "duration": 150 }
```

#### Query

```dataprime
source spans
| groupby cloud_region aggregate
    avg(duration) as avg_duration,
    max(duration) as max_duration,
    count() as request_count
```

#### Result

| cloud_region | avg_duration | max_duration | request_count |
| ------------ | ------------ | ------------ | ------------- |
| eu-north-1   | 700.0        | 900          | 2             |
| eu-west-1    | 2100.0       | 2100         | 1             |
| us-east-2    | 150.0        | 150          | 1             |

This approach is useful for analyzing scale and performance in the same query.

______________________________________________________________________

## Filter before aggregating

### Description

Use [`filter`](https://coralogix.com/docs/dataprime/language-reference/commands-reference/filter/index.md) to narrow down the dataset *before* aggregating. This improves performance and ensures your calculations reflect only relevant data.

### Example: Count only slow requests per region

#### Sample data

```json
{ "cloud_region": "eu-north-1", "duration": 900 }
{ "cloud_region": "eu-north-1", "duration": 300 }
{ "cloud_region": "us-east-2", "duration": 800 }
```

#### Query

```dataprime
source spans
| filter duration > 500
| groupby cloud_region aggregate count() as slow_request_count
```

#### Result

| cloud_region | slow_request_count |
| ------------ | ------------------ |
| eu-north-1   | 1                  |
| us-east-2    | 1                  |

The filter ensures that only spans above 500ms are included in the aggregation.

______________________________________________________________________

## Use `count_if` for conditional metrics

### Description

Use [`count_if()`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/aggregation/count_if/index.md) to count how many documents meet a specific condition within a group.

### Syntax

```dataprime
count_if(condition: bool): number
```

### Example: Compare total and slow spans per region

#### Sample data

```json
{ "cloud_region": "eu-west-1", "duration": 700 }
{ "cloud_region": "eu-west-1", "duration": 400 }
{ "cloud_region": "us-east-2", "duration": 300 }
{ "cloud_region": "us-east-2", "duration": 1200 }
```

#### Query

```dataprime
source spans
| groupby cloud_region aggregate
    count() as total_requests,
    count_if(duration > 500) as slow_requests
```

#### Result

| cloud_region | total_requests | slow_requests |
| ------------ | -------------- | ------------- |
| eu-west-1    | 2              | 1             |
| us-east-2    | 2              | 1             |

This lets you track the proportion of problematic spans per group.

______________________________________________________________________

## Group by calculated severity level

### Description

You can dynamically group documents using logic via a [`case`](https://coralogix.com/docs/dataprime/language-reference/functions-reference/cases/case/index.md) expression. This is useful for creating buckets like "fast", "slow", and "critical".

### Syntax

```dataprime
groupby case {
  <condition1> -> <label1>,
  <condition2> -> <label2>,
  _ -> <default_label>
} as <alias>
aggregate count() as <alias>
```

### Example: Group spans into latency tiers

#### Sample data

```json
{ "duration": 1200 }
{ "duration": 800 }
{ "duration": 300 }
```

#### Query

```dataprime
source spans
| groupby case {
    duration > 1000 -> 'super slow',
    duration > 500 -> 'slow',
    _ -> 'ok'
  } as latency_level
| aggregate count() as request_count
```

#### Result

| latency_level | request_count |
| ------------- | ------------- |
| super slow    | 1             |
| slow          | 2             |
| ok            | 8             |

______________________________________________________________________

## Aggregate over spans (traces)

### Description

You can aggregate directly on spans using basic metrics like count and max. This is often used to get an overview of tracing volume and performance.

### Example: Count spans and find longest one

#### Sample data

```json
{ "duration": 8000 }
{ "duration": 20000 }
{ "duration": 1000 }
```

#### Query

```dataprime
source spans
| aggregate
    count() as trace_count,
    max(duration) as max_trace_duration
```

#### Result

| trace_count | max_trace_duration |
| ----------- | ------------------ |
| 3           | 20000              |

This is helpful for validating trace ingestion and spotting extreme latency outliers.

______________________________________________________________________

## Common pitfalls or gotchas

- `groupby` removes all fields not explicitly included. Add fields as grouping keys or aggregations if you want to preserve them.
- Aggregations like `avg`, `count_if`, and `max` ignore `null` values. Make sure fields you're aggregating on are populated.
- You can only group by scalar values—arrays and objects are not supported unless flattened or transformed first.
