Using DataPrime to isolate and shape logs for deeper analysis

Goal

By the end of this guide you should be able to use filter, block, choose, and create to isolate relevant log data, transform it, and prepare it for further analysis.

Why it matters

When debugging issues or investigating anomalies, you’ll rarely get what you need from a single filter. Real investigations require peeling back layers: filtering what matters, cutting what doesn’t, shaping the remaining data, and adding context for further questions. This guide shows you how to combine multiple DataPrime commands into a focused, intermediate-level workflow.
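
As a preview, the four commands this guide covers can be chained into one pipeline. The sketch below assumes the usual DataPrime convention of connecting stages with the pipe (|) operator and reuses the fields from the examples that follow:

filter ipInSubnet(ip_address, '10.8.0.0/16')
| block status_code >= 200 && status_code < 300
| choose firstNonNull(user_id, userId) as canonical_user_id, path, status_code
| create analysis_batch_id from randomUuid()

Each stage receives only what the previous stage let through, which is why the order of commands matters.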


Filter logs based on key conditions

Description

Use filter to include only logs that meet specific conditions. This is usually your first step in narrowing down the dataset to only the relevant events. You can use simple comparisons, functions like ipInSubnet, or even combine multiple conditions with && and ||.

Syntax

filter <boolean_expression>

Example: Keep only internal IP traffic

Sample data

{ "ip_address": "10.8.0.45", "status_code": 200 }
{ "ip_address": "192.168.1.10", "status_code": 500 }

Query

filter ipInSubnet(ip_address, '10.8.0.0/16')

Result

{ "ip_address": "10.8.0.45", "status_code": 200 }

Only the logs where ip_address falls inside the 10.8.0.0/16 subnet are kept.
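
Because filter accepts any boolean expression, conditions can also be combined, as noted in the description above. A variation on the same sample data that keeps only internal traffic that failed:

filter ipInSubnet(ip_address, '10.8.0.0/16') && status_code >= 500

Against the sample above this returns no rows, since the only internal request succeeded; loosen either condition to see results.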


Remove unwanted noise with block

Description

Use block to explicitly exclude logs that match a condition. It works as the inverse of filter. If the condition is true, the log is discarded. This is helpful for trimming down “happy path” events such as successful HTTP requests.

Syntax

block <boolean_expression>

Example: Drop successful requests

Sample data

{ "status_code": "200", "path": "/health" }
{ "status_code": "500", "path": "/login" }

Query

block status_code.startsWith('2')

Result

{ "status_code": "500", "path": "/login" }

All logs with 2xx status codes are removed, leaving only errors and other non-successful responses.
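
Conditions in block compose the same way. A sketch that also discards health-check probes (path is the field from the sample data; the endpoint name is illustrative):

block status_code.startsWith('2') || path == '/health'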


Reduce and standardize the shape with choose

Description

Logs often contain many fields, and different sources may name the same field inconsistently. Use choose to project only the fields you care about, while renaming or unifying them. Combine it with functions like firstNonNull to standardize schema across sources.

Syntax

choose <field1> [as alias], <field2>, ...

Example: Unify user ID field names

Sample data

{ "user_id": "123", "path": "/checkout", "status_code": 200 }
{ "userId": "456", "path": "/home", "status_code": 500 }

Query

choose firstNonNull(user_id, userId) as canonical_user_id, path, status_code

Result

{ "canonical_user_id": "123", "path": "/checkout", "status_code": 200 }
{ "canonical_user_id": "456", "path": "/home", "status_code": 500 }

Now all records share the same standardized canonical_user_id field.
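
Because choose projects only the listed fields, anything you don't name is dropped, and as also works as a plain rename. A minimal sketch:

choose status_code as code, path

This keeps just code and path and discards every other field on each record.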


Add computed or contextual fields with create

Description

Use create to enrich your logs with new fields. These can be derived from existing values, lookups, or constants, and they prepare logs for deeper analysis without repeatedly recalculating the same expressions.

Syntax

create <new_field> from <expression>

Example 1: Tag internal IP traffic

Sample data

{ "ip_address": "10.1.2.3" }
{ "ip_address": "203.0.113.8" }

Query

create is_internal from ipInSubnet(ip_address, '10.0.0.0/8')

Result

{ "ip_address": "10.1.2.3", "is_internal": true }
{ "ip_address": "203.0.113.8", "is_internal": false }

Example 2: Generate batch IDs

Sample data

{ "event": "login", "user": "alice" }
{ "event": "purchase", "user": "bob" }

Query

create analysis_batch_id from randomUuid()

Result

{ "event": "login", "user": "alice", "analysis_batch_id": "a17f8f0c-5b2c-4c9f-a96a-2d4e93c5e678" }
{ "event": "purchase", "user": "bob", "analysis_batch_id": "e39b6a90-0b71-4427-8f53-1a2c5fa47de0" }

Each log gets a unique identifier, useful for tagging export batches or investigations.
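
A practical payoff of create is that downstream stages can reuse the computed field instead of repeating the expression. A sketch, again assuming stages chain with the pipe (|) operator:

create is_internal from ipInSubnet(ip_address, '10.0.0.0/8')
| filter is_internal == true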


Common pitfalls

When shaping and isolating logs with filter, block, choose, and create, a few issues come up often:

  • Mixing up filter and block: filter keeps matching events, while block removes them.
  • Null values in conditions: comparisons and functions operate on scalar values (strings, numbers, timestamps); a missing or null field won't satisfy a condition, which can quietly change what filter keeps and what block discards.
  • Overwriting fields with create: if the target field already exists, create replaces its value.
  • Performance trade-offs: running expensive functions (e.g., regex extraction, ipInSubnet) inside a filter can slow queries on large datasets. Where possible, pre-filter with simpler conditions to shrink the scanned set first (see the sketch after this list).
  • Ambiguous field names: Inconsistent field naming (like user_id vs. userId) can cause incomplete results. Use helpers such as firstNonNull to standardize schema.
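
Here is a minimal sketch of that pre-filtering pattern, assuming stages chain with the pipe (|) operator: the cheap numeric comparison trims the dataset before the costlier subnet check runs.

filter status_code >= 500
| filter ipInSubnet(ip_address, '10.0.0.0/8')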