
engine.schema_fields

Purpose

The engine.schema_fields dataset captures metadata and historical snapshots of dataset schemas within Coralogix's system dataspace. It records how a dataset's structure evolves over time, so you can track changes to fields and their types, monitor schema changes, and verify that a dataset stays consistent across versions.

The schema metadata includes field types, partitioning schemes, and labels, along with contextual data such as dataset names and snapshot timestamps. This makes the dataset useful for debugging, auditing schema changes, and supporting data governance across datasets.

Schema description

| Full JSON path | Field data type | Field data example | Description |
| --- | --- | --- | --- |
| dataprimePath | String | "$d.dataset" | Path used for Dataprime processing. |
| dataset | String | "engine.schema_fields" | The dataset name. |
| dataspace | String | "system" | Dataspace this dataset belongs to. |
| distinctValueCount | Number | 1 | Number of distinct values observed for this field. |
| examples | Array | ["rumEventsPoc"] | Example values seen in the data. |
| flatPath | String | "$d.dataset" | Flattened path representation used for referencing contexts. |
| labels | Object | { "category":"io", "className":"com.acme.Handler", "methodName":"handle", "thread":"pool-1-7" } | Optional labels associated with the field. |
| labels.category | String | "io" | Label describing a category for the field. |
| labels.className | String | "com.acme.Handler" | Fully qualified class name (if applicable). |
| labels.methodName | String | "handle" | Method name (if applicable). |
| labels.thread | String | "pool-1-7" | Thread identifier/name (if applicable). |
| metadata | Object | { "entityType":"log", "pillar":"observability", "priorityClass":"Medium", "severity":"3" } | Additional metadata about the field. |
| metadata.entityType | String | "log" | Logical entity type for this field. |
| metadata.pillar | String | "observability" | Product/solution pillar associated with the field. |
| metadata.priorityClass | String | "Medium" | Priority classification. |
| metadata.severity | String | "3" | Severity level. |
| partitioningScheme | String | "dt/hr" | Partitioning scheme of the dataset. |
| pathParts | Array | ["$d","dataset"] | Path breakdown used for data reference resolution. |
| snapshotDuration | String (ns) | "3600000000000" | Duration of the snapshot in nanoseconds (string-encoded). |
| snapshotId | String (UUID) | "043c5e02-0924-45f0-83dd-2aa5659e323e" | Unique identifier of the dataset snapshot. |
| snapshotStartTime | Number (ns since epoch) | 1750611600000000000 | Start time of the snapshot in epoch nanoseconds. |
| type | String | "string" | Field’s logical/type classification. |
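
The two snapshot time fields are encoded differently: snapshotStartTime is a number of nanoseconds since the epoch, while snapshotDuration is a string holding a nanosecond count. Below is a minimal Python sketch of decoding both, assuming the records have been exported as plain dictionaries. The `snapshot_window` helper is hypothetical and not part of any Coralogix API.

```python
from datetime import datetime, timedelta, timezone

NS_PER_SECOND = 1_000_000_000


def snapshot_window(record):
    """Return (start, end) of a snapshot as timezone-aware UTC datetimes.

    Assumes `record` is a dict carrying the snapshotStartTime and
    snapshotDuration fields described above.
    """
    start_ns = int(record["snapshotStartTime"])    # epoch nanoseconds
    duration_ns = int(record["snapshotDuration"])  # string-encoded nanoseconds

    # Split into whole seconds and leftover nanoseconds to avoid float rounding.
    seconds, remainder_ns = divmod(start_ns, NS_PER_SECOND)
    start = datetime.fromtimestamp(seconds, tz=timezone.utc)
    start += timedelta(microseconds=remainder_ns // 1000)

    # timedelta has microsecond resolution, so sub-microsecond digits are dropped.
    end = start + timedelta(microseconds=duration_ns // 1000)
    return start, end


# Values taken from the example column of the schema description.
record = {
    "snapshotStartTime": 1750611600000000000,
    "snapshotDuration": "3600000000000",
}
start, end = snapshot_window(record)
print(start.isoformat(), end.isoformat())
```

With the example values, the snapshot starts at 2025-06-22 17:00 UTC and covers one hour.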

How the data in this dataset can be used

Tracking schema changes over time

By grouping records on the snapshotStartTime field, users can track how a dataset's schema has evolved over time.

Example query:

source system/engine.schema_fields
| groupby snapshotStartTime
    aggregate count() as snapshots
| sortby snapshotStartTime asc
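
Counting records per snapshot shows when a schema grew or shrank, but not which fields changed. Once the records for two snapshots have been exported, the comparison could be sketched in Python as follows. The `diff_snapshots` helper and the sample rows are hypothetical. Only the field names flatPath and type come from the schema description, and the sketch assumes one record per flatPath in each snapshot.

```python
def diff_snapshots(older, newer):
    """Compare two lists of schema-field records from different snapshots.

    Each record is assumed to be a dict with at least "flatPath" and "type",
    with one record per flatPath in each snapshot.
    Returns the paths that were added, removed, or changed type.
    """
    old_types = {r["flatPath"]: r["type"] for r in older}
    new_types = {r["flatPath"]: r["type"] for r in newer}

    added = sorted(new_types.keys() - old_types.keys())
    removed = sorted(old_types.keys() - new_types.keys())
    retyped = sorted(
        path
        for path in old_types.keys() & new_types.keys()
        if old_types[path] != new_types[path]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical sample rows, for illustration only.
older = [
    {"flatPath": "$d.dataset", "type": "string"},
    {"flatPath": "$d.count", "type": "string"},
]
newer = [
    {"flatPath": "$d.dataset", "type": "string"},
    {"flatPath": "$d.count", "type": "number"},
    {"flatPath": "$d.region", "type": "string"},
]
print(diff_snapshots(older, newer))
```

Here `$d.region` is reported as added and `$d.count` as retyped, while the unchanged `$d.dataset` appears in none of the lists.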

Auditing schema field types and values

By summarizing schema records for each dataset, you can see how many distinct data types appear in it, then rank datasets from the greatest type variety to the least.

Example query:

source system/engine.schema_fields
| groupby dataset
    aggregate
        distinct_count(type) as type_count,
        any_value(type) as sample_type
| orderby type_count desc
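
For readers less familiar with this kind of aggregation, the grouping logic is roughly equivalent to the Python sketch below, run over exported records. The `type_variety` helper, the dataset names, and the sample rows are hypothetical. The sketch mirrors the intent of the query and is not a description of how the engine evaluates it.

```python
from collections import defaultdict


def type_variety(records):
    """Return (dataset, distinct type count) pairs, highest count first."""
    types_by_dataset = defaultdict(set)
    for r in records:
        types_by_dataset[r["dataset"]].add(r["type"])

    counts = [(dataset, len(types)) for dataset, types in types_by_dataset.items()]
    # Sort by count descending, then by name so ties have a stable order.
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample rows, for illustration only.
records = [
    {"dataset": "logs", "type": "string"},
    {"dataset": "logs", "type": "number"},
    {"dataset": "logs", "type": "string"},
    {"dataset": "spans", "type": "string"},
]
print(type_variety(records))
```

Using a set per dataset means a repeated type is counted once, which is the same effect as a distinct count.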

Monitoring dataset changes for compliance

By filtering to INFO-level schema events and counting the unique snapshotId values per dataset, you can see which datasets have experienced the most schema changes, listed from the highest change count to the lowest.

Example query:

source system/engine.schema_fields
| filter $m.severity == INFO
| groupby dataset
    aggregate distinct_count(snapshotId) as snapshot_count
| sortby snapshot_count desc

engine.schema_fields schema

{ engine.schema_fields
Schema field metadata and characteristics for a dataset in the 'system' dataspace.
dataspace

Indicates the dataspace this dataset belongs to. Example: "system".

dataset

The dataset name. Example: "engine.schema_fields".

flatPath

Flattened path representation used for referencing contexts. Example: "$d.dataset".

dataprimePath

Path used for Dataprime processing. Example: "$d.dataset".

[ pathParts
Path breakdown used for data reference resolution.
pathParts[0]

The first element of the path. Example: "$d".

pathParts[1]

The second element of the path. Example: "dataset".

]
snapshotId

Unique identifier of the dataset snapshot. Example: "043c5e02-0924-45f0-83dd-2aa5659e323e".

snapshotStartTime

Start time of the snapshot in epoch nanoseconds. Example: 1750611600000000000.

snapshotDuration

Duration of the snapshot in nanoseconds (string-encoded). Example: "3600000000000".

partitioningScheme

Partitioning scheme of the dataset. Example: "dt/hr".

type

Type of the field. Example: "string".

[ examples

Example values seen in the data. Example: ["rumEventsPoc"].

examples[0]

The first example value. Example: "rumEventsPoc".

]
distinctValueCount

Number of distinct values. Example: 1.

labels

Optional labels associated with the field.

highCardinalityLabels

High-cardinality labels, if any.

highCardinalityLabels[0]
{ metadata
Additional metadata.
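entityType

Logical entity type for this field. Example: "log".

pillar

Product/solution pillar associated with the field. Example: "observability".
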
priorityClass

Priority classification. Example: "Medium".

severity

Severity level. Example: "3".

}
}