
extract - Parse strings into objects

The extract keyword transforms raw strings into structured data by parsing out embedded values and storing them as objects. It supports several extraction strategies for converting unstructured fields into clean, queryable formats.

Syntax

(e|extract) <expression> into <keypath> using <extraction-type>(<extraction-params>) [datatypes keypath:datatype,keypath:datatype,...]

Extractor functions

Extractor functions define how the extract keyword parses a raw string into a structured object. Each function handles a specific format: regular expressions, key-value pairs, delimited lists, or escaped JSON. You name the extractor in the using clause, which determines how the string is interpreted. The resulting objects can then be used for filtering, analysis, and visualization.

1. regexp

Parses data using named capture groups in a regular expression.

Input

{
  "message": "user Chris has logged in"
}

Query

extract message into user_data using regexp(e=/user (?<user>.*) has logged in/)

Output

{
  "message": "user Chris has logged in",
  "user_data": {
    "user": "Chris"
  }
}
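
To make the behavior concrete, here is a minimal Python sketch of what the regexp extractor does to a record. It is an illustration only, not the engine's implementation: the helper name extract_regexp is hypothetical, and leaving the record unchanged when nothing matches is an assumption. Note that Python writes named groups as (?P<user>...) rather than (?<user>...).

```python
import re

def extract_regexp(record, source, target, pattern):
    """Return a copy of record with the pattern's named groups stored under target."""
    result = dict(record)
    match = re.search(pattern, record[source])
    if match:  # assumption: a non-matching string leaves the record unchanged
        result[target] = match.groupdict()
    return result

record = {"message": "user Chris has logged in"}
out = extract_regexp(record, "message", "user_data",
                     r"user (?P<user>.*) has logged in")
print(out)
```

Each named capture group becomes one key in the extracted object, so adding a second group to the pattern would add a second key under user_data.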

2. multi_regexp

Extracts all matches of a pattern into an array.

Input

{
  "log": "user 1 did 2 things on 3 pages"
}

Query

extract log into numbers using multi_regexp(e=/\d+/)

Output

{
  "log": "user 1 did 2 things on 3 pages",
  "numbers": ["1", "2", "3"]
}
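
A rough Python equivalent of multi_regexp, shown only to illustrate the semantics: every non-overlapping match of the pattern is collected, in order, as a string. The helper name extract_multi_regexp is hypothetical.

```python
import re

def extract_multi_regexp(text, pattern):
    """Collect the full text of every non-overlapping match, in order."""
    return [m.group(0) for m in re.finditer(pattern, text)]

record = {"log": "user 1 did 2 things on 3 pages"}
record["numbers"] = extract_multi_regexp(record["log"], r"\d+")
print(record)
```

The matches stay strings, which mirrors the quoted values in the output above.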

3. kv

Parses a string of key-value pairs into an object.

Input

{
  "query_string": "a=b&b=c&c=d"
}

Query

extract query_string into query_params using kv(pair_delimiter='&', key_delimiter='=')

Output

{
  "query_string": "a=b&b=c&c=d",
  "query_params": {
    "a": "b",
    "b": "c",
    "c": "d"
  }
}
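
The kv extractor can be pictured as two nested splits: first on the pair delimiter, then on the key delimiter. The sketch below is a simplified Python model, not the actual parser. The helper name extract_kv is hypothetical, and skipping fragments that lack a key delimiter is an assumption.

```python
def extract_kv(text, pair_delimiter, key_delimiter):
    """Split text into pairs, then split each pair into a key and a value."""
    pairs = {}
    for fragment in text.split(pair_delimiter):
        key, sep, value = fragment.partition(key_delimiter)
        if sep:  # assumption: fragments without a key delimiter are ignored
            pairs[key] = value
    return pairs

record = {"query_string": "a=b&b=c&c=d"}
record["query_params"] = extract_kv(record["query_string"], "&", "=")
print(record)
```

Using partition rather than split keeps any further key delimiters inside the value, so "x=1=2" would yield the value "1=2" in this model.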

4. jsonobject

Unescapes and parses a stringified JSON object.

Input

{
  "nested_json": "{\"key\": \"value\"}"
}

Query

extract nested_json into parsed_json using jsonobject

Output

{
  "nested_json": "{\"key\": \"value\"}",
  "parsed_json": {
    "key": "value"
  }
}
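
In Python terms, jsonobject behaves much like json.loads applied to the string field. The sketch below illustrates this under two assumptions that the text above does not confirm: invalid JSON leaves the record unchanged, and only JSON objects (not arrays or scalars) are stored. The helper name extract_jsonobject is hypothetical.

```python
import json

def extract_jsonobject(record, source, target):
    """Parse a stringified JSON object and store it under target."""
    result = dict(record)
    try:
        parsed = json.loads(record[source])
    except json.JSONDecodeError:
        return result  # assumption: unparseable strings are left alone
    if isinstance(parsed, dict):  # assumption: only objects are accepted
        result[target] = parsed
    return result

record = {"nested_json": "{\"key\": \"value\"}"}
out = extract_jsonobject(record, "nested_json", "parsed_json")
print(out)
```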

5. split

Splits a string by a delimiter into an array of primitive values.

Input

{
  "csv_codes": "10,20,30"
}

Query

extract csv_codes into codes using split(delimiter=',', element_datatype=number)

Output

{
  "csv_codes": "10,20,30",
  "codes": [10, 20, 30]
}
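
As an illustration, split with element_datatype=number can be modeled in Python as a string split followed by a numeric conversion of each element. The helper names and the choice to try int before float are assumptions made for this sketch. The only datatype handled is number, since that is the one shown above.

```python
def to_number(text):
    """Convert text to int when possible, otherwise to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)

def extract_split(text, delimiter, element_datatype=None):
    """Split text on delimiter, optionally converting each element to a number."""
    parts = text.split(delimiter)
    if element_datatype == "number":
        return [to_number(part) for part in parts]
    return parts  # without a datatype, elements stay strings

record = {"csv_codes": "10,20,30"}
record["codes"] = extract_split(record["csv_codes"], ",", "number")
print(record)
```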

Using datatypes to annotate extracted fields

You can provide explicit type annotations to specific fields using the datatypes clause. This ensures values are stored with the correct type, enabling numerical comparisons, aggregations, and more.

Input

{
  "msg": "query_type=fetch query_id=100 query_results_duration_ms=232"
}

Query

extract msg into query_data using kv() datatypes query_results_duration_ms:number

Output

{
  "msg": "query_type=fetch query_id=100 query_results_duration_ms=232",
  "query_data": {
    "query_type": "fetch",
    "query_id": "100",
    "query_results_duration_ms": 232
  }
}

query_results_duration_ms is now a number, while query_id remains a string.

Note

You only need to specify the extracted field name in datatypes, not the full keypath.
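
The effect of the datatypes clause can be sketched in Python as a post-processing step over the extracted object: only the fields named in the clause are converted, and everything else stays a string. This is a hedged model, not the real implementation. It assumes that kv() with no parameters splits pairs on spaces and keys on "=", as the example above suggests, and the helper names are hypothetical.

```python
def to_number(text):
    """Convert text to int when possible, otherwise to float."""
    try:
        return int(text)
    except ValueError:
        return float(text)

def apply_datatypes(extracted, datatypes):
    """Convert only the fields listed in datatypes; leave the rest as strings."""
    typed = dict(extracted)
    for field, datatype in datatypes.items():
        if field in typed and datatype == "number":
            typed[field] = to_number(typed[field])
    return typed

msg = "query_type=fetch query_id=100 query_results_duration_ms=232"
# Assumed kv() defaults: pairs separated by spaces, keys separated by "="
query_data = dict(pair.split("=", 1) for pair in msg.split(" "))
typed = apply_datatypes(query_data, {"query_results_duration_ms": "number"})
print(typed)
```

The datatypes mapping is keyed by the bare field name, matching the note above that the full keypath is not required.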


Summary

The extract keyword, combined with extractor functions, transforms messy strings into usable structured data. Whether you're parsing JSON blobs, splitting CSV-like fields, or capturing values with regular expressions, extractors let you build clean logs and metrics pipelines in a declarative, readable way.