engine.schema_fields
Purpose
The engine.schema_fields dataset captures metadata and historical snapshots of dataset schemas within Coralogix's system dataspace. It records how a dataset's structure evolves over time, so users can track changes to fields, their types, and other attributes, monitor schema changes, and check that datasets stay consistent across versions.
Each record includes field types, partitioning schemes, and labels, along with contextual data such as dataset names and snapshot timestamps. This makes the dataset useful for debugging, auditing schema changes, and enforcing data governance across datasets.
Schema description
Full JSON path | Field data type | Field data example | Description |
---|---|---|---|
dataprimePath | String | "$d.dataset" | Path used for Dataprime processing. |
dataset | String | "engine.schema_fields" | The dataset name. |
dataspace | String | "system" | Dataspace this dataset belongs to. |
distinctValueCount | Number | 1 | Number of distinct values observed for this field. |
examples | Array | ["rumEventsPoc"] | Example values seen in the data. |
flatPath | String | "$d.dataset" | Flattened path representation used for referencing contexts. |
labels | Object | { "category":"io", "className":"com.acme.Handler", "methodName":"handle", "thread":"pool-1-7" } | Optional labels associated with the field. |
labels.category | String | "io" | Label describing a category for the field. |
labels.className | String | "com.acme.Handler" | Fully qualified class name (if applicable). |
labels.methodName | String | "handle" | Method name (if applicable). |
labels.thread | String | "pool-1-7" | Thread identifier/name (if applicable). |
metadata | Object | { "entityType":"log", "pillar":"observability", "priorityClass":"Medium", "severity":"3" } | Additional metadata about the field. |
metadata.entityType | String | "log" | Logical entity type for this field. |
metadata.pillar | String | "observability" | Product/solution pillar associated with the field. |
metadata.priorityClass | String | "Medium" | Priority classification. |
metadata.severity | String | "3" | Severity level. |
partitioningScheme | String | "dt/hr" | Partitioning scheme of the dataset. |
pathParts | Array | ["$d","dataset"] | Path breakdown used for data reference resolution. |
snapshotDuration | String (ns) | "3600000000000" | Duration of the snapshot in nanoseconds (string-encoded). |
snapshotId | String (UUID) | "043c5e02-0924-45f0-83dd-2aa5659e323e" | Unique identifier of the dataset snapshot. |
snapshotStartTime | Number (ns since epoch) | 1750611600000000000 | Start time of the snapshot in epoch nanoseconds. |
type | String | "string" | Logical type of the field. |
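To see how these fields appear in practice, the sketch below lists a few schema records for a single dataset. It only uses field names from the table above. The dataset name in the filter is just the example value from the table and can be replaced with any dataset of interest, and the limit of 20 is arbitrary.

```
source system/engine.schema_fields
| filter dataset == 'engine.schema_fields'
| choose flatPath, type, distinctValueCount, snapshotId, snapshotStartTime
| limit 20
```

Note that snapshotStartTime is a number in epoch nanoseconds, while snapshotDuration is a string-encoded nanosecond value, so the two need different handling in any downstream processing.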
How the data in this dataset can be used
Tracking schema changes over time
By querying the snapshotStartTime field, users can track how a dataset's schema has evolved over time.
Example query:
source system/engine.schema_fields
| groupby snapshotStartTime
aggregate count() as snapshots
| sortby snapshotStartTime asc
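A possible refinement is to look for individual fields whose type differs between snapshots. The sketch below assumes that a field whose type changed is recorded with different type values in different snapshots; this page does not state how such changes are recorded, so treat it as a starting point. The dataset name in the filter is a placeholder taken from the schema example.

```
source system/engine.schema_fields
| filter dataset == 'engine.schema_fields'
| groupby flatPath aggregate
    distinct_count(type) as type_count,
    min(snapshotStartTime) as first_seen,
    max(snapshotStartTime) as last_seen
| filter type_count > 1
| sortby type_count desc
```

Fields returned by this query were seen with more than one type between first_seen and last_seen, which narrows down where to look for a schema change.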
Auditing schema field types and values
By summarizing schema records for each dataset, you can reveal how many different data types appear, then rank datasets by those with the greatest type variety.
Example query:
source system/engine.schema_fields
| groupby dataset
aggregate
distinct_count(type) as type_count,
any_value(type) as sample_type
| orderby type_count desc
Monitoring dataset changes for compliance
By focusing on INFO-level schema events and tallying unique snapshotId values per dataset, you can highlight which datasets have experienced the most schema changes, and list those with the highest change counts first.
Example query:
source system/engine.schema_fields
| filter $m.severity == INFO
| groupby dataset
aggregate distinct_count(snapshotId) as schema_changes
| sortby schema_changes desc
engine.schema_fields schema
Indicates the dataspace this dataset belongs to. Example: "system".
The dataset name. Example: "engine.schema_fields".
Flattened path representation used for referencing contexts. Example: "$d.dataset".
Path used for Dataprime processing. Example: "$d.dataset".
The first element of the path. Example: "$d".
The second element of the path. Example: "dataset".
Unique identifier of the dataset snapshot. Example: "043c5e02-0924-45f0-83dd-2aa5659e323e".
Start time of the snapshot in epoch nanoseconds. Example: 1750611600000000000.
Duration of the snapshot in nanoseconds (string-encoded). Example: "3600000000000".
Partitioning scheme of the dataset. Example: "dt/hr".
Type of the field. Example: "string".
Example values seen in the data. Example: ["rumEventsPoc"].
Number of distinct values. Example: 1.
Optional labels associated with the field.
High-cardinality labels, if any.
Priority classification. Example: "Medium".
Severity level. Example: "3".