Using DataPrime to clean and normalize data with functions
Goal
By the end of this guide, you should be able to use the firstNonNull, ipInSubnet, parseInterval, and urlDecode functions to clean, transform, and standardize inconsistent or encoded data in your logs.
Why it matters
Raw logs are rarely consistent. Fields can be missing, renamed, or encoded, which makes it hard to search, group, or build alerts. These functions let you impose structure on inconsistent data and enrich your logs without requiring upstream changes.
firstNonNull – Use the first available non-null value
Description
Use firstNonNull to return the first non-null value from a list of scalar fields. It is ideal for normalizing inconsistent key names or supplying fallback values.
Syntax
Example – Normalize user identifiers across schemas
Sample data
{ "userId": "123", "user_id": null, "user_identifier": null }
{ "userId": null, "user_id": "456", "user_identifier": null }
{ "userId": null, "user_id": null, "user_identifier": "789" }
Query
Result
Use this when multiple versions of a field may exist and you want to unify them under a single key.
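To make the fallback order concrete, here is a minimal Python sketch that emulates the behavior described above on the three sample records. It is not DataPrime: the function first_non_null and the output key canonical_user_id are hypothetical names chosen for illustration. The sketch tests `is not None` rather than truthiness, so values like 0 or an empty string count as present.

```python
from typing import Any, Optional


def first_non_null(*values: Any) -> Optional[Any]:
    """Return the first argument that is not None, or None if every argument is None.

    Hypothetical Python stand-in for DataPrime's firstNonNull.
    """
    for value in values:
        if value is not None:
            return value
    return None


# The three sample records from this section, with JSON null written as None.
records = [
    {"userId": "123", "user_id": None, "user_identifier": None},
    {"userId": None, "user_id": "456", "user_identifier": None},
    {"userId": None, "user_id": None, "user_identifier": "789"},
]

for record in records:
    # Check the candidate keys in priority order and store the winner
    # under a single (hypothetical) unified key.
    record["canonical_user_id"] = first_non_null(
        record.get("userId"), record.get("user_id"), record.get("user_identifier")
    )

print([r["canonical_user_id"] for r in records])  # ['123', '456', '789']
```

The order of the arguments is the priority order, so list the preferred field name first.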
ipInSubnet – Check if an IP address belongs to a given subnet
Description
ipInSubnet tests whether an IP address falls within a given CIDR block. This is useful for separating internal from external traffic, identifying environments, or tagging traffic by region.
Syntax
Example – Filter logs by internal IP range
Sample data
Query
Result
Only logs whose IP address falls within 10.0.0.0/8 are retained, which is useful for segmenting or excluding internal traffic.
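The subnet test itself is simple bitmask arithmetic. The minimal Python sketch below reproduces the same filter with the standard-library ipaddress module. It is an illustration, not DataPrime: the function name ip_in_subnet and the IP addresses in the log records are hypothetical, since the section's own sample data is not shown.

```python
import ipaddress


def ip_in_subnet(ip: str, cidr: str) -> bool:
    """Return True if `ip` falls inside the CIDR block `cidr`.

    Hypothetical Python stand-in for DataPrime's ipInSubnet. ip_network()
    raises ValueError for malformed CIDRs (e.g. '10.0.0.0/33') and, with its
    default strict=True, for CIDRs with host bits set (e.g. '10.0.0.1/8').
    """
    network = ipaddress.ip_network(cidr)
    # Membership compares the masked address against the network address;
    # an IPv4 address is never "in" an IPv6 network and vice versa.
    return ipaddress.ip_address(ip) in network


# Hypothetical log records for illustration only.
logs = [
    {"ip": "10.1.2.3"},
    {"ip": "192.168.0.5"},
    {"ip": "10.255.255.254"},
    {"ip": "8.8.8.8"},
]

internal = [log for log in logs if ip_in_subnet(log["ip"], "10.0.0.0/8")]
print(internal)  # [{'ip': '10.1.2.3'}, {'ip': '10.255.255.254'}]
```

Negating the condition (`not ip_in_subnet(...)`) gives the complementary "external traffic" view.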
parseInterval – Convert a string duration to an interval type
Description
Use parseInterval to parse duration strings such as "2d3h" into an interval, so you can perform time-based arithmetic (for example, adding the interval to a timestamp). This enables accurate time math for scheduling, tracking, and alerts.
Syntax
Example – Add a duration to a start time
Sample data
Query
Result
{
"timestamp": 1728763337,
"completed_in": "35m10s",
"completed_time": 1728765447 // +2110 seconds
}
Now the log includes a concrete end time, useful for SLA tracking or time-based filtering.
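The arithmetic behind this result can be checked with a minimal Python sketch of a duration parser. It is not DataPrime's implementation: the function parse_interval is hypothetical, it supports only the units d, h, m, and s (DataPrime may accept more), and it assumes, in line with the pitfall noted at the end of this guide, that units must appear at most once and from largest to smallest. Like the real function's silent failure, it returns None on invalid input instead of raising. The timestamp is treated as Unix seconds, as in the sample.

```python
import re
from typing import Optional

# Units supported by this sketch, largest first (an assumption for illustration).
UNIT_SECONDS = {"d": 86_400, "h": 3_600, "m": 60, "s": 1}
UNIT_ORDER = list(UNIT_SECONDS)  # ['d', 'h', 'm', 's']
TOKEN = re.compile(r"(\d+)([dhms])")


def parse_interval(text: str) -> Optional[int]:
    """Parse a duration such as '2d3h' or '35m10s' into a number of seconds.

    Returns None for malformed input, including repeated or out-of-order
    units, so '1s1d' yields None.
    """
    if not text:
        return None
    total = 0
    pos = 0
    last_rank = -1
    for match in TOKEN.finditer(text):
        if match.start() != pos:  # unparsed characters before this token
            return None
        amount, unit = int(match.group(1)), match.group(2)
        rank = UNIT_ORDER.index(unit)
        if rank <= last_rank:  # unit repeated or not in descending order
            return None
        last_rank = rank
        total += amount * UNIT_SECONDS[unit]
        pos = match.end()
    # Trailing characters that no token consumed also make the input invalid.
    return total if pos == len(text) else None


record = {"timestamp": 1728763337, "completed_in": "35m10s"}
seconds = parse_interval(record["completed_in"])  # 35 * 60 + 10 = 2110
if seconds is not None:
    record["completed_time"] = record["timestamp"] + seconds
print(record["completed_time"])  # 1728765447
```

Checking for None before doing the addition is the Python equivalent of guarding against a silently failed parse.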
urlDecode – Decode URL-encoded values
Description
Use urlDecode to turn percent-encoded characters (e.g., %20 for a space) back into human-readable text. Use it when logs contain encoded query strings or URLs.
Syntax
Example – Decode a URL-encoded name field
Sample data
Query
extract query_string into query_params using kv(pair_delimiter='&', key_delimiter='=')
| replace query_params.name with urlDecode(query_params.name)
Result
Now the name parameter in your logs is human-readable, which improves searchability and visualization.
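The two-step query above (split the query string into key-value pairs, then decode one field) can be mirrored in a minimal Python sketch. This is an illustration, not DataPrime: kv_extract is a hypothetical helper that imitates the kv(pair_delimiter='&', key_delimiter='=') parameters, the sample query string is made up, and urllib.parse.unquote stands in for urlDecode. Note that unquote decodes only percent escapes and leaves '+' unchanged; whether DataPrime's urlDecode treats '+' as a space is not covered by this guide.

```python
from urllib.parse import unquote


def kv_extract(text: str, pair_delimiter: str = "&", key_delimiter: str = "=") -> dict:
    """Split 'a=1&b=2' into {'a': '1', 'b': '2'} (hypothetical kv() stand-in)."""
    params = {}
    for pair in text.split(pair_delimiter):
        if key_delimiter in pair:
            # Split on the first key delimiter only, so values may contain '='.
            key, value = pair.split(key_delimiter, 1)
            params[key] = value
    return params


# Hypothetical log record; the section's own sample data is not shown.
log = {"query_string": "name=Jane%20Doe&city=S%C3%A3o%20Paulo"}
log["query_params"] = kv_extract(log["query_string"])

# Decode only the 'name' field, as the query does; 'city' stays encoded.
log["query_params"]["name"] = unquote(log["query_params"]["name"])
print(log["query_params"])  # {'name': 'Jane Doe', 'city': 'S%C3%A3o%20Paulo'}
```

Because only the name field is decoded, the city value stays encoded, which illustrates why decoding has to be applied field by field.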
Common pitfalls
firstNonNull only works on scalar values (not arrays or objects).
ipInSubnet requires a valid CIDR string, such as '192.168.0.0/24'.
parseInterval fails silently if the input format is invalid (for example, 1s1d is not allowed).
urlDecode works only on strings, so decoding must be done field by field.