Snapshot Analysis

sz_snapshot takes a point-in-time snapshot of the entity resolution results and generates summary reports. These reports answer high-level questions about the data: how many entities exist, how records are distributed across data sources, and where cross-source matches occur.

To run sz_snapshot, see Step 3: Take a snapshot in Truth Set Setup.

The examples and screenshots on this page are based on the truth set demo data. ENTITY_ID values in your database will most likely differ from those shown here, because they depend on load order. Use the ENTITY_ID values returned by your own commands in subsequent steps. If you are using the truth set, the DATA_SOURCE and RECORD_ID values will match those shown here.

Viewing snapshot results in `sz_explorer`

To view the snapshot reports interactively, load the snapshot file when starting sz_explorer:

sz_explorer -s truthset_snapshot.json

Or load it after sz_explorer is already running:

load truthset_snapshot.json

This unlocks the snapshot report commands: data_source_summary, cross_source_summary, entity_source_summary, entity_size_breakdown, and principles_used. Each command displays a table with a prompt to drill into specific rows for more detail.

`data_source_summary`

The data_source_summary command shows how records from each data source resolved into entities.

data_source_summary table

Column	Description
Data Source	Name of the data source
Records	Total number of records from this data source
Entities	Number of distinct entities these records resolved to
Compression	Percentage of records that were duplicates (higher means more deduplication)
Matched Records	Number of records that matched at least one other record
Matched Entities	Number of entities containing matched records
Ambiguous Matches	Entities where a record could plausibly belong to more than one entity
Possible Matches	Entities sharing important attributes but with some disagreements
Possible Relationships	Entities related through lesser attributes like shared addresses

An entity count significantly lower than the record count indicates many records belong to the same real-world person or organization. The Compression column shows the percentage of records that were duplicates within each data source.

A high Compression value in a data source often indicates data quality issues, such as the same person or organization being entered more than once. Use sz_explorer to drill into specific entities and understand why records resolved together.
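As a rough illustration of the Compression column, the sketch below assumes it is the share of records that did not produce a new entity, that is, (records - entities) / records. This formula is one reading of the column description above, not a confirmed statement of how sz_snapshot computes it, and the counts are made up.

```python
def compression_pct(records: int, entities: int) -> float:
    """Percentage of records that collapsed into an already-existing entity.

    Assumed formula: (records - entities) / records * 100.
    """
    if records <= 0:
        return 0.0
    return round(100.0 * (records - entities) / records, 1)


# Hypothetical counts, not taken from the truth set.
print(compression_pct(200, 150))  # 25.0
print(compression_pct(50, 50))    # 0.0 (no duplicates at all)
```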

Selecting a data source drills into its match level breakdown:

data_source_summary match level

This shows how the matched "DATA_SOURCE": "CUSTOMERS" records break down by match level. Most matches are high-confidence Matches, with smaller counts for Possible Matches and Possible Relationships.

Selecting a match level drills into the match keys that fired:

data_source_summary match keys

Each match key shows which combination of attributes caused the match. For example, +NAME+DOB means a name and date of birth matched, while +NAME+ADDRESS means a name and address matched.
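When working with match keys outside sz_explorer, it can help to split them into their component features. The helper below is a hypothetical sketch, not part of any Senzing tool. It captures the sign in front of each feature in case a key also contains "-" entries, which this page does not cover.

```python
import re


def parse_match_key(match_key: str) -> list[tuple[str, str]]:
    """Split a match key such as '+NAME+DOB' into (sign, feature) pairs."""
    return re.findall(r"([+-])([^+-]+)", match_key)


print(parse_match_key("+NAME+DOB"))      # [('+', 'NAME'), ('+', 'DOB')]
print(parse_match_key("+NAME+ADDRESS"))  # [('+', 'NAME'), ('+', 'ADDRESS')]
```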

Selecting a match key drills into individual entities:

data_source_summary entity detail

The entity detail view shows all records resolved to this entity, including the specific attribute values that drove the match.

`cross_source_summary`

The cross_source_summary command shows matches between records from different data sources, revealing connections that span data silos.

cross_source_summary table

Column	Description
From Data Source	The originating data source in the cross-source pair
To Data Source	The target data source in the cross-source pair
Matched Records	Number of records that matched across these two data sources
Matched Entities	Number of entities containing cross-source matched records
Ambiguous Matches	Cross-source matches where entity membership is uncertain
Possible Matches	Cross-source entities sharing important attributes but with disagreements
Possible Relationships	Cross-source entities related through lesser attributes

Cross-source matches drive compliance screening, fraud detection, and risk assessment workflows. For example, "DATA_SOURCE": "CUSTOMERS"-to-"DATA_SOURCE": "WATCHLIST" matches represent customer records that resolved to the same entity as a known risk entry.

Cross-source matches depend on having multiple data sources loaded. If only one data source was loaded, this section will be empty. The truth set demo includes three data sources specifically to demonstrate cross-source analysis.
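To make the idea of a cross-source match concrete, the sketch below counts, for each pair of data sources, how many entities contain records from both. The entity data is hypothetical and held in a plain dictionary; it does not reflect the snapshot file layout, and it treats pairs as unordered rather than as the From/To rows the report displays.

```python
from collections import Counter
from itertools import combinations

# Hypothetical resolved entities: entity ID -> list of (DATA_SOURCE, RECORD_ID).
entities = {
    1: [("CUSTOMERS", "1001"), ("CUSTOMERS", "1002")],
    2: [("CUSTOMERS", "1003"), ("WATCHLIST", "2001")],
    3: [("REFERENCE", "3001")],
    4: [("CUSTOMERS", "1004"), ("REFERENCE", "3002"), ("WATCHLIST", "2002")],
}


def cross_source_entity_counts(entities):
    """Count entities that contain records from each pair of data sources."""
    counts = Counter()
    for records in entities.values():
        sources = sorted({source for source, _ in records})
        for pair in combinations(sources, 2):
            counts[pair] += 1
    return counts


counts = cross_source_entity_counts(entities)
print(counts[("CUSTOMERS", "WATCHLIST")])  # 2
print(counts[("CUSTOMERS", "REFERENCE")])  # 1
```

An entity whose records all come from one data source, like entities 1 and 3 here, contributes nothing, which is why the report is empty when only one source is loaded.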

Selecting a data source pair drills into its match level breakdown:

cross_source_summary match level

The "DATA_SOURCE": "CUSTOMERS"-to-"DATA_SOURCE": "WATCHLIST" pair shows the count of matches at each confidence level. High-confidence Matches indicate records that Senzing is confident belong to the same real-world entity across these two data sources.

Selecting a match level drills into the match keys:

cross_source_summary match keys

The match keys show which attributes drove each cross-source match and what types of identifying information connect records across data sources.

Selecting a match key drills into individual entities:

cross_source_summary entity detail

This entity resolved records from both "DATA_SOURCE": "CUSTOMERS" and "DATA_SOURCE": "WATCHLIST", meaning the same real-world person appeared in both data sources.

`entity_source_summary`

The entity_source_summary command groups entities by which combination of data sources contributed records.

entity_source_summary table

Column	Description
Data Sources	The combination of data sources that contributed records to entities in this group
Entities	Number of entities composed of records from exactly this combination of sources

Selecting a row drills into the entities for that source combination, showing a paginated entity list with full entity detail:

entity_source_summary drilldown
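The grouping behind this report can be sketched as follows: each entity is assigned to the exact set of data sources that contributed records to it, and entities are counted per set. The data and function name are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical resolved entities: entity ID -> list of (DATA_SOURCE, RECORD_ID).
entities = {
    1: [("CUSTOMERS", "1001"), ("CUSTOMERS", "1002")],
    2: [("CUSTOMERS", "1003"), ("WATCHLIST", "2001")],
    3: [("CUSTOMERS", "1004")],
    4: [("WATCHLIST", "2002")],
    5: [("CUSTOMERS", "1005"), ("WATCHLIST", "2003")],
}


def entity_source_groups(entities):
    """Count entities by the exact combination of contributing data sources."""
    summary = Counter()
    for records in entities.values():
        combo = tuple(sorted({source for source, _ in records}))
        summary[combo] += 1
    return summary


summary = entity_source_groups(entities)
for combo, count in sorted(summary.items()):
    print(" + ".join(combo), count)
# CUSTOMERS 2
# CUSTOMERS + WATCHLIST 2
# WATCHLIST 1
```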

`entity_size_breakdown`

The entity_size_breakdown command shows the distribution of how many records make up each entity.

entity_size_breakdown table

Column	Description
Size Group	Number of records per entity in this group
Entity Count	Number of entities with this many records
Review Count	Number of entities flagged for review due to feature anomalies
Review Features	Feature types that triggered the review flag (e.g., GENDER, DOB, ADDRESS)

The Entity Count column shows how many entities exist at each size. Entities in Size Group 1 are “singletons,” records that did not match any other record in the system. Large entities (those with many records) should be reviewed. They may represent:

Legitimate matches: A long-time customer with records across multiple systems and name/address changes over the years.
Over-resolution: Records that Senzing matched but that belong to different people. Use the how command in sz_explorer to review how records joined an entity step by step.
Data quality issues: Duplicate submissions, test records, or data entry errors inflating entity size.
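The size distribution itself is a simple tally. The sketch below uses made-up record counts per entity to show how the Size Group and Entity Count columns relate, and how singletons fall out of the same tally. It is an illustration, not the sz_snapshot implementation.

```python
from collections import Counter

# Hypothetical data: entity ID -> number of records resolved into that entity.
record_counts = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 5}


def size_distribution(record_counts):
    """Map entity size -> number of entities of that size, largest size first."""
    tally = Counter(record_counts.values())
    return dict(sorted(tally.items(), reverse=True))


breakdown = size_distribution(record_counts)
print(breakdown)            # {5: 1, 2: 3, 1: 2}
print(breakdown.get(1, 0))  # 2 singletons
```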

The Review Count and Review Features columns flag entities that contain more distinct values of a specific attribute than expected. For example, 5 entities in Size Group 2 have conflicting GENDER values, 1 entity in Size Group 4 has conflicting DOB values, and 1 entity in Size Group 5 has conflicting ADDRESS values.

Entities flagged in the Review Features column typically indicate one of three situations:

Data quality problems: Typos, misspellings, or bad data causing attribute conflicts within a legitimately resolved entity.
Intentional obfuscation: Someone altered identifying information to avoid detection, resulting in conflicting attributes.
Overmatching: Records that belong to separate people were incorrectly resolved into the same entity.
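One way to think about a review flag is as a check for too many distinct values of a feature within a single entity. The thresholds and feature values below are hypothetical; this page does not state the limits sz_snapshot applies, so treat this as a sketch of the idea only.

```python
# Hypothetical limits: the most distinct values expected per feature type.
REVIEW_LIMITS = {"GENDER": 1, "DOB": 1}


def review_features(features, limits=REVIEW_LIMITS):
    """Return feature types that have more distinct values than expected."""
    return sorted(
        feature
        for feature, values in features.items()
        if feature in limits and len(set(values)) > limits[feature]
    )


# Made-up entity with two names, two genders, and one date of birth.
entity_features = {
    "NAME": ["Robert Smith", "Bob Smith"],
    "GENDER": ["M", "F"],
    "DOB": ["1978-12-11"],
}
print(review_features(entity_features))  # ['GENDER']
```

A flag like this only says that the values disagree. Deciding between bad data, obfuscation, and overmatching still requires looking at the entity itself.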

Selecting a row drills into the entities of that size, showing a paginated entity list with full entity detail:

entity_size_breakdown drilldown

From the entity detail view, press H to run how and see the step-by-step resolution path:

entity_size_breakdown how

The how decision tree shows how each record entered the entity, including the MATCH_KEY and principle that fired at each step. This is useful for large or flagged entities to see which attributes drove the resolution. For a full walkthrough of the how command and its different views, see Using how.

`principles_used`

The principles_used command shows how many entity relationships were established at each match level. Senzing performs Principle Based Entity Resolution, in which each match level represents a different level of confidence.

principles_used table

Column	Description
Match level	The confidence level at which entities were related
Count	Number of entity relationships at this match level

Selecting a match level drills into the specific principles, then into match keys, and finally into individual entity detail:

principles_used drilldown

Match Level	What It Means
Matches	Records resolved to the same entity with high confidence.
Possible matches	Records that share enough in common to warrant review but did not resolve.
Possibly related	Records that appear to be related (such as family members) but are distinct entities.
Ambiguous matches	Records that could plausibly belong to more than one entity, so the evidence does not definitively favor any one of them.
Disclosed relations	Known relationships declared in the input data (for example, a company and its subsidiaries).

Using snapshot analysis effectively

Snapshot reports are most valuable when compared over time or against known baselines:

Initial assessment: Run a snapshot after loading data to establish a baseline.
After changes: After updating data mapping or adjusting entity resolution configuration with Senzing Support, take a new snapshot to measure the impact.
Ongoing monitoring: Periodic snapshots track data quality trends as new records are added and existing records are updated or deleted.
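Comparing a new snapshot against a baseline comes down to lining up the same metrics from both and looking at the change. The sketch below assumes you have already copied a few headline numbers from each report into plain dictionaries. The metric names and values are hypothetical and do not reflect the snapshot file layout.

```python
def compare_snapshots(baseline, current):
    """Return {metric: (baseline, current, delta)} for metrics present in both."""
    shared = sorted(baseline.keys() & current.keys())
    return {m: (baseline[m], current[m], current[m] - baseline[m]) for m in shared}


# Hypothetical headline numbers from two snapshots.
baseline = {"records": 200, "entities": 150, "possible_matches": 12}
current = {"records": 220, "entities": 158, "possible_matches": 9}

for metric, (before, after, delta) in compare_snapshots(baseline, current).items():
    print(f"{metric}: {before} -> {after} ({delta:+d})")
# entities: 150 -> 158 (+8)
# possible_matches: 12 -> 9 (-3)
# records: 200 -> 220 (+20)
```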

Next steps

If you have any questions, contact Senzing Support. Support is 100% FREE!
