Using DataPrime to isolate and shape logs for deeper analysis
Goal
By the end of this guide you should be able to use `filter`, `block`, `choose`, and `create` to isolate relevant log data, transform it, and prepare it for further analysis.
Why it matters
When debugging issues or investigating anomalies, you’ll rarely get what you need from a single `filter`. Real investigations require peeling back layers: filtering what matters, cutting what doesn’t, shaping the remaining data, and adding context for further questions. This guide shows you how to combine multiple DataPrime commands into a focused, intermediate-level workflow.
Filter logs based on key conditions
Description
Use `filter` to include only logs that meet specific conditions. This is usually your first step in narrowing down the dataset to only the relevant events. You can use simple comparisons, functions like `ipInSubnet`, or even combine multiple conditions with `&&` and `||`.
Syntax
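In its simplest form, the command takes a single boolean condition (a sketch of the general shape):

```
filter <condition>
```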
Example: Keep only internal IP traffic
Sample data
{ "ip_address": "10.8.0.45", "status_code": 200 }
{ "ip_address": "192.168.1.10", "status_code": 500 }
Query
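A query along these lines produces the result below (a sketch: the argument order of `ipInSubnet`, and the bare field reference, which may need a `$d.` prefix in some environments, are assumptions):

```
source logs
| filter ipInSubnet(ip_address, '10.8.0.0/16')
```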
Result
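With the sample data above, only the first record remains:

```
{ "ip_address": "10.8.0.45", "status_code": 200 }
```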
Only the logs where `ip_address` falls inside the `10.8.0.0/16` subnet are kept.
Remove unwanted noise with `block`
Description
Use `block` to explicitly exclude logs that match a condition. It works as the inverse of `filter`. If the condition is true, the log is discarded. This is helpful for trimming down “happy path” events such as successful HTTP requests.
Syntax
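The shape mirrors `filter` (a sketch):

```
block <condition>
```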
Example: Drop successful requests
Sample data
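For illustration, assume a mix of successful and failing requests (hypothetical values):

```
{ "path": "/checkout", "status_code": 200 }
{ "path": "/signup", "status_code": 201 }
{ "path": "/login", "status_code": 500 }
```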
Query
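A minimal sketch, assuming `status_code` is numeric:

```
source logs
| block status_code >= 200 && status_code < 300
```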
Result
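Given the hypothetical sample data above, only the non-2xx record survives:

```
{ "path": "/login", "status_code": 500 }
```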
All logs with `2xx` status codes are removed, leaving only errors and other non-successful responses.
Reduce and standardize the shape with `choose`
Description
Logs often contain many fields, and different sources may name the same field inconsistently. Use `choose` to project only the fields you care about, while renaming or unifying them. Combine it with functions like `firstNonNull` to standardize the schema across sources.
Syntax
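Roughly (the `as` alias form is inferred from the example below):

```
choose <expression> [as <alias>], ...
```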
Example: Unify user ID field names
Sample data
{ "user_id": "123", "path": "/checkout", "status_code": 200 }
{ "userId": "456", "path": "/home", "status_code": 500 }
Query
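One way to write it (a sketch; `firstNonNull` returns the first non-null of its arguments):

```
source logs
| choose firstNonNull(user_id, userId) as canonical_user_id, path, status_code
```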
Result
{ "canonical_user_id": "123", "path": "/checkout", "status_code": 200 }
{ "canonical_user_id": "456", "path": "/home", "status_code": 500 }
Now all records share the same standardized `canonical_user_id` field.
Add computed or contextual fields with `create`
Description
Use `create` to enrich your logs with new fields. These can be derived from existing values, lookups, or constants. This helps prepare logs for deeper analysis without repeatedly recalculating the same expressions.
Syntax
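The general form (a sketch; the `from` keyword follows the examples below):

```
create <keypath> from <expression>
```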
Example 1: Tag internal IP traffic
Sample data
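The inputs can be inferred from the result below:

```
{ "ip_address": "10.1.2.3" }
{ "ip_address": "203.0.113.8" }
```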
Query
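A sketch; the `10.0.0.0/8` private range is an assumption consistent with the result below:

```
source logs
| create is_internal from ipInSubnet(ip_address, '10.0.0.0/8')
```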
Result
{ "ip_address": "10.1.2.3", "is_internal": true }
{ "ip_address": "203.0.113.8", "is_internal": false }
Example 2: Generate batch IDs
Sample data
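Again inferred from the result below:

```
{ "event": "login", "user": "alice" }
{ "event": "purchase", "user": "bob" }
```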
Query
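A sketch, assuming a UUID-generating function named `randomUuid` is available:

```
source logs
| create analysis_batch_id from randomUuid()
```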
Result
{ "event": "login", "user": "alice", "analysis_batch_id": "a17f8f0c-5b2c-4c9f-a96a-2d4e93c5e678" }
{ "event": "purchase", "user": "bob", "analysis_batch_id": "e39b6a90-0b71-4427-8f53-1a2c5fa47de0" }
Each log gets a unique identifier, useful for tagging export batches or investigations.
Common pitfalls
When shaping and isolating logs with `filter`, `block`, `choose`, and `create`, a few issues come up often:

- Mixing up `filter` and `block`: `filter` keeps matching events, while `block` removes them.
- Null values in conditions: `null` works only on scalar values (strings, numbers, timestamps).
- Overwriting fields with `create`: `create` overwrites existing fields if the key already exists.
- Performance trade-offs: Running expensive functions (e.g., regex `extract`, `ipInSubnet`) inside a `filter` can slow queries on large datasets. Where possible, pre-filter with simpler conditions to minimize the scanned set (see the sketch after this list).
- Ambiguous field names: Inconsistent field naming (like `user_id` vs. `userId`) can cause incomplete results. Use helpers such as `firstNonNull` to standardize the schema.
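As an example of the pre-filtering pattern from the performance bullet, the cheap comparison runs first so the expensive `ipInSubnet` check only evaluates the reduced set (field names are illustrative):

```
source logs
| filter status_code >= 500
| filter ipInSubnet(ip_address, '10.0.0.0/8')
```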