How to use DataPrime to detect patterns and anomalies in your data
Goal
By the end of this guide you should be able to use DataPrime to filter, group, and statistically analyze your data to uncover frequent patterns and rare anomalies. You'll learn how to apply operators like matches, countby, percentile, and bottom to identify expected and unexpected behavior in your logs.
Why it matters
Every system has a baseline of "normal" behavior—and detecting when things deviate is critical. Whether you're tracking user behavior, spotting error spikes, or monitoring rare edge cases, DataPrime gives you powerful tools to surface these patterns in real time or retroactively.
matches – Filter logs that match a regular expression
Description
Use matches to test if a string field matches a given regular expression. This is useful for detecting structured patterns in log messages.
Syntax
Example: Detect well-formed user purchase logs
Description: Filter logs where the msg field matches the pattern "User [Name] bought [Number] [Item]".
Sample data
{ "msg": "User Chris bought 10 sunglasses" }
{ "msg": "User James bought 1 bed" }
{ "msg": "User X bo" }
Query
Result
Example: Detect malformed or truncated messages
Description: Flip the logic using !matches to find logs that deviate from the expected pattern.
Query
Result
countby – Count grouped values
Description
Use countby to group documents by a key and count how many times each value appears. Ideal for identifying common or rare categories.
Syntax
Example: Count logs by HTTP status code
Sample data
{ "status_code": 200 }
{ "status_code": 404 }
{ "status_code": 500 }
{ "status_code": 200 }
{ "status_code": 403 }
Query
Result
| status_code | status_count |
|---|---|
| 200 | 2 |
| 404 | 1 |
| 500 | 1 |
| 403 | 1 |
Example: Group status codes into buckets
Query
filter responseStatus.code != null
| create status_range from
case_greaterthan {
responseStatus.code,
499 -> '5xx',
399 -> '4xx',
299 -> '3xx',
199 -> '2xx',
99 -> '1xx',
_ -> 'other'
}
| countby status_range into status_count
Result
| status_range | status_count |
|---|---|
| 2xx | 6619 |
| 4xx | 172 |
| 5xx | 101 |
percentile – Find statistical outliers
Description
Use percentile to calculate the Nth percentile of a numeric field within a group. Useful for identifying slowest responses or largest payloads.
Syntax
Example: Find 99th percentile latency by path
Sample data
{ "operationName": "SELECT", "duration": 120 }
{ "operationName": "SELECT", "duration": 350 }
{ "operationName": "DELETE", "duration": 90 }
{ "operationName": "SELECT", "duration": 900 }
Query
Result
| path | latency_p99 |
|---|---|
| /home | 900 |
| /about | 90 |
bottom – Return the least frequent items
Description
Use bottom to return the bottom N values based on count or another metric. This is ideal for spotting rarely used accounts, endpoints, or hosts.
Syntax
Example: Least active users by count
Sample data
Query
Result
| user | count |
|---|---|
| Bob | 1 |
| Charlie | 1 |
Example: Least active users by total time
Sample data
{ "user": "Alice", "duration_ms": 500 }
{ "user": "Alice", "duration_ms": 800 }
{ "user": "Bob", "duration_ms": 100 }
{ "user": "Charlie", "duration_ms": 200 }
Query
Result
| user | activity_count | total_time |
|---|---|---|
| Bob | 1 | 100 |
| Charlie | 1 | 200 |
Visualize result
Each of the above commands transforms your dataset and gives you focused insight:
matchesfilters for structural conformity.countbyhelps visualize dominant categories.percentilehighlights statistical outliers.bottomfinds rare or underused values.
You should see logs, tables, or metrics showing these patterns clearly—ready for dashboards or alerts.
Common pitfalls
- Too narrow filters: If your regular expressions or filters are too strict, you might exclude valuable data. Try starting broad, then refining.
- Misusing
bottom: Without an explicitbyclause,bottommay return confusing results. Always pair it withcount()or another aggregation.