How to use DataPrime to detect patterns and anomalies in your data

Goal

By the end of this guide you should be able to use DataPrime to filter, group, and statistically analyze your data to uncover frequent patterns and rare anomalies. You'll learn how to apply operators like matches, countby, percentile, and bottom to identify expected and unexpected behavior in your logs.

Why it matters

Every system has a baseline of "normal" behavior—and detecting when things deviate is critical. Whether you're tracking user behavior, spotting error spikes, or monitoring rare edge cases, DataPrime gives you powerful tools to surface these patterns in real time or retroactively.


matches – Filter logs that match a regular expression

Description

Use matches to test if a string field matches a given regular expression. This is useful for detecting structured patterns in log messages.

Syntax

matches(string: string, pattern: regexp): bool

Example: Detect well-formed user purchase logs

Description: Filter logs where the msg field matches the pattern "User [Name] bought [Number] [Item]".

Sample data

{ "msg": "User Chris bought 10 sunglasses" }
{ "msg": "User James bought 1 bed" }
{ "msg": "User X bo" }

Query

filter matches(msg, /User [A-Za-z]+ bought \d+ [A-Za-z]+/)

Result

{ "msg": "User Chris bought 10 sunglasses" }
{ "msg": "User James bought 1 bed" }

Example: Detect malformed or truncated messages

Description: Flip the logic using !matches to find logs that deviate from the expected pattern.

Query

filter !matches(msg, /User [A-Za-z]+ bought \d+ [A-Za-z]+/)

Result

{ "msg": "User X bo" }
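To check the pattern before running it against live logs, the same regular expression can be tried locally. This is a minimal Python sketch, not DataPrime itself. It assumes an unanchored search; whether matches anchors the pattern to the whole string is not stated in this guide, but for these three samples both interpretations give the same split.

```python
import re

# Same pattern as in the DataPrime queries above.
PATTERN = re.compile(r"User [A-Za-z]+ bought \d+ [A-Za-z]+")

logs = [
    {"msg": "User Chris bought 10 sunglasses"},
    {"msg": "User James bought 1 bed"},
    {"msg": "User X bo"},
]

# Assumption: unanchored search, standing in for matches(msg, /.../).
well_formed = [log for log in logs if PATTERN.search(log["msg"])]
# Negation, standing in for !matches(msg, /.../).
malformed = [log for log in logs if not PATTERN.search(log["msg"])]

print("well-formed:", well_formed)
print("malformed:", malformed)
```

The two lists partition the input: every log lands in exactly one of them, which is what makes the negated filter a useful anomaly check.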

countby – Count grouped values

Description

Use countby to group documents by a key and count how many times each value appears. Ideal for identifying common or rare categories.

Syntax

countby <expression> [as <alias>] [into <keypath>]

Example: Count logs by HTTP status code

Sample data

{ "status_code": 200 }
{ "status_code": 404 }
{ "status_code": 500 }
{ "status_code": 200 }
{ "status_code": 403 }

Query

countby status_code into status_count

Result
status_code  status_count
200          2
404          1
500          1
403          1
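The grouping that countby performs here can be reproduced locally to sanity-check the expected counts. This is a rough Python equivalent using the sample data above, not a description of how DataPrime computes the result.

```python
from collections import Counter

logs = [
    {"status_code": 200},
    {"status_code": 404},
    {"status_code": 500},
    {"status_code": 200},
    {"status_code": 403},
]

# Group by status_code and count occurrences, like
# `countby status_code into status_count`.
status_count = Counter(log["status_code"] for log in logs)

for code, count in status_count.items():
    print(code, count)
```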


Example: Group status codes into buckets

Query

filter responseStatus.code != null
| create status_range from 
    case_greaterthan {
      responseStatus.code,
      499 -> '5xx',
      399 -> '4xx',
      299 -> '3xx',
      199 -> '2xx',
      99  -> '1xx',
      _   -> 'other'
    }
| countby status_range into status_count

Result
status_range  status_count
2xx           6619
4xx           172
5xx           101
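The bucketing logic can be sketched outside DataPrime as follows. The sketch assumes case_greaterthan checks its branches from top to bottom and returns the label of the first threshold the value exceeds, which is what the descending order of thresholds in the query suggests. The input codes are hypothetical and are not the data behind the result table above.

```python
from collections import Counter

# Thresholds in the same descending order as the query.
THRESHOLDS = [
    (499, "5xx"),
    (399, "4xx"),
    (299, "3xx"),
    (199, "2xx"),
    (99, "1xx"),
]

def status_range(code):
    """Return the label of the first threshold that `code` exceeds."""
    for threshold, label in THRESHOLDS:
        if code > threshold:
            return label
    return "other"  # corresponds to the `_` default branch

# Hypothetical status codes, for illustration only.
codes = [200, 204, 301, 404, 503, 42]
range_count = Counter(status_range(code) for code in codes)
print(range_count)
```

Ordering matters in this style of case expression: if the 99 threshold came first, every valid status code would fall into the 1xx bucket.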


percentile – Find statistical outliers

Description

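Use percentile to calculate the Nth percentile of a numeric field within a group. The percent argument is written as a fraction, so 0.99 in the example below means the 99th percentile. This is useful for identifying the slowest responses or the largest payloads.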

Syntax

percentile(percent: number, value: number): number

Example: Find 99th percentile latency by path

Sample data

{ "path": "/home", "latency": 120 }
{ "path": "/home", "latency": 350 }
{ "path": "/about", "latency": 90 }
{ "path": "/home", "latency": 900 }

Query

source spans
| groupby path agg percentile(0.99, latency) as latency_p99

Result
path    latency_p99
/home   900
/about  90
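To see where these numbers come from, here is a small Python sketch of a per-group percentile. It uses the nearest-rank method, which is an assumption: this guide does not say how DataPrime computes percentiles, and an approximate algorithm may return slightly different values on real data. The records use the path and latency fields that the query refers to.

```python
import math
from collections import defaultdict

def nearest_rank_percentile(values, fraction):
    """Nearest-rank percentile: smallest value with at least `fraction` of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(fraction * len(ordered)))
    return ordered[rank - 1]

spans = [
    {"path": "/home", "latency": 120},
    {"path": "/home", "latency": 350},
    {"path": "/about", "latency": 90},
    {"path": "/home", "latency": 900},
]

# Collect latencies per path, like `groupby path`.
by_path = defaultdict(list)
for span in spans:
    by_path[span["path"]].append(span["latency"])

# Compute the 99th percentile for each group.
latency_p99 = {
    path: nearest_rank_percentile(latencies, 0.99)
    for path, latencies in by_path.items()
}
print(latency_p99)
```

With only a handful of values per group, the 99th percentile is simply the maximum; the statistic becomes more informative as the number of samples grows.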


bottom – Return the least frequent items

Description

Use bottom to return the bottom N values based on count or another metric. This is ideal for spotting rarely used accounts, endpoints, or hosts.

Syntax

bottom <limit> <expression> by <aggregation>

Example: Least active users by count

Sample data

{ "user": "Alice" }
{ "user": "Bob" }
{ "user": "Alice" }
{ "user": "Charlie" }

Query

bottom 2 user by count()

Result
user     count
Bob      1
Charlie  1


Example: Least active users by total time

Sample data

{ "user": "Alice", "duration_ms": 500 }
{ "user": "Alice", "duration_ms": 800 }
{ "user": "Bob", "duration_ms": 100 }
{ "user": "Charlie", "duration_ms": 200 }

Query

bottom 2 user, count() as activity_count by sum(duration_ms) as total_time

Result
user     activity_count  total_time
Bob      1               100
Charlie  1               200
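The same ranking can be reproduced locally as a check on the expected rows. This Python sketch is an illustration of the semantics as described above (group, aggregate, sort ascending, keep the first N), not DataPrime's implementation.

```python
from collections import defaultdict

logs = [
    {"user": "Alice", "duration_ms": 500},
    {"user": "Alice", "duration_ms": 800},
    {"user": "Bob", "duration_ms": 100},
    {"user": "Charlie", "duration_ms": 200},
]

# Aggregate per user: count() as activity_count, sum(duration_ms) as total_time.
totals = defaultdict(lambda: {"activity_count": 0, "total_time": 0})
for log in logs:
    entry = totals[log["user"]]
    entry["activity_count"] += 1
    entry["total_time"] += log["duration_ms"]

# Sort ascending by total_time and keep the two smallest, like `bottom 2 ... by sum(...)`.
ranked = sorted(totals.items(), key=lambda item: item[1]["total_time"])
bottom_2 = [
    (user, stats["activity_count"], stats["total_time"])
    for user, stats in ranked[:2]
]
print(bottom_2)
```

Alice has the most activity (two logs, 1300 ms in total), so she is the one user excluded from the bottom two.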


Visualize result

Each of the above commands transforms your dataset and gives you focused insight:

  • matches filters for structural conformity.
  • countby helps visualize dominant categories.
  • percentile highlights statistical outliers.
  • bottom finds rare or underused values.

Each query returns either a filtered set of logs or a small result table, and both can feed directly into dashboards or alerts.


Common pitfalls

  • Overly narrow filters: If your regular expressions or filters are too strict, you might exclude valuable data. Start broad, then refine.
  • Misusing bottom: Without an explicit by clause, bottom may return confusing results. Always pair it with count() or another aggregation.