Snapshot Analysis

sz_snapshot takes a point-in-time snapshot of the entity resolution results and generates summary reports. These reports answer high-level questions about the data: how many entities exist, how records are distributed across data sources, and where cross-source matches occur.

To run sz_snapshot, see Step 3: Take a snapshot in Loading the Truth Set.
The examples and screenshots on this page are based on the truth set demo data. ENTITY_ID values in your database will most likely differ from those shown here, as they depend on load order. Use the ENTITY_ID values returned by your commands in subsequent steps. If you are using the truth set, DATA_SOURCE and RECORD_ID values will be the same.

Viewing snapshot results in sz_explorer

To view the snapshot reports interactively, load the snapshot file when starting sz_explorer:

sz_explorer -s truthset_snapshot.json

Or load it after sz_explorer is already running:

load truthset_snapshot.json

This unlocks the snapshot report commands: data_source_summary, cross_source_summary, entity_source_summary, entity_size_breakdown, and principles_used. Each command displays a table with a prompt to drill into specific rows for more detail.

data_source_summary

The data_source_summary command shows how records from each data source resolved into entities.

data_source_summary table

Column | Description
Data Source | Name of the data source
Records | Total number of records from this data source
Entities | Number of distinct entities these records resolved to
Compression | Percentage of records that were duplicates (higher means more deduplication)
Matched Records | Number of records that matched at least one other record
Matched Entities | Number of entities containing matched records
Ambiguous Matches | Entities where a record could plausibly belong to more than one entity
Possible Matches | Entities sharing important attributes but with some disagreements
Possible Relationships | Entities related through lesser attributes, such as shared addresses

An entity count significantly lower than the record count indicates many records belong to the same real-world person or organization. The Compression column shows the percentage of records that were duplicates within each data source.
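To make the Compression arithmetic concrete, here is a small sketch. It assumes Compression is computed as (records - entities) / records, expressed as a percentage. That formula is inferred from the column description rather than taken from sz_snapshot's source, and the helper name compression_pct is hypothetical.

```python
def compression_pct(record_count: int, entity_count: int) -> float:
    """Hypothetical helper: percentage of records that were duplicates.

    Assumes compression = (records - entities) / records * 100.
    This is an illustrative formula, not sz_snapshot's source code.
    """
    if record_count <= 0:
        return 0.0
    # Records beyond one per entity were merged into another record's entity.
    duplicates = record_count - entity_count
    return round(100.0 * duplicates / record_count, 2)


# Example: 1,000 records that resolved to 800 entities
print(compression_pct(1000, 800))  # 20.0
```

Under this assumption, a source whose records each resolved to their own entity shows 0% compression.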

A high compression percentage within a single data source often points to duplicate records in that source, which can indicate data quality issues. Use sz_explorer to drill into specific entities and understand why records resolved together.

Selecting a data source drills into its match level breakdown:

data_source_summary match level

This shows how the matched CUSTOMERS records break down by match level. Most matches are high-confidence Matches, with smaller counts for Possible Matches and Possible Relationships.

Selecting a match level drills into the match keys that fired:

data_source_summary match keys

Each match key shows which combination of attributes caused the match. For example, +NAME+DOB means a name and date of birth matched, while +NAME+ADDRESS means a name and address matched.
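A match key can be read mechanically. The sketch below splits a key into its attributes, assuming each attribute name is prefixed with + when it agreed. It also assumes a - prefix marks an attribute that disagreed, which is why the hypothetical key +NAME+ADDRESS-GENDER is included. The function parse_match_key is illustrative only and may not handle every match key format Senzing produces.

```python
import re


def parse_match_key(match_key: str) -> dict[str, list[str]]:
    """Split a match key such as '+NAME+DOB' into agreeing and disagreeing features.

    Assumption for illustration: '+' prefixes a feature that matched and
    '-' prefixes a feature that disagreed. Real match keys may carry extra
    detail that this sketch ignores.
    """
    result = {"matched": [], "conflicting": []}
    # Each token is a sign followed by an upper-case feature name (letters, digits, underscores).
    for sign, feature in re.findall(r"([+-])([A-Z0-9_]+)", match_key):
        key = "matched" if sign == "+" else "conflicting"
        result[key].append(feature)
    return result


print(parse_match_key("+NAME+DOB"))
# {'matched': ['NAME', 'DOB'], 'conflicting': []}
print(parse_match_key("+NAME+ADDRESS-GENDER"))
# {'matched': ['NAME', 'ADDRESS'], 'conflicting': ['GENDER']}
```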

Selecting a match key drills into individual entities:

data_source_summary entity detail

The entity detail view shows all records resolved to this entity, including the specific attribute values that drove the match.

cross_source_summary

The cross_source_summary command shows matches between records from different data sources, revealing connections that span data silos.

cross_source_summary table

Column | Description
From Data Source | The originating data source in the cross-source pair
To Data Source | The target data source in the cross-source pair
Matched Records | Number of records that matched across these two data sources
Matched Entities | Number of entities containing cross-source matched records
Ambiguous Matches | Cross-source matches where entity membership is uncertain
Possible Matches | Cross-source entities sharing important attributes but with disagreements
Possible Relationships | Cross-source entities related through lesser attributes

Cross-source matches drive compliance screening, fraud detection, and risk assessment workflows. For example, CUSTOMERS-to-WATCHLIST matches represent customer records that resolved to the same entity as a known risk entry.

Cross-source matches depend on having multiple data sources loaded. If only one data source was loaded, the cross_source_summary report will be empty. The truth set demo includes three data sources specifically to demonstrate cross-source analysis.
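Conceptually, a cross-source match is an entity whose records come from more than one data source. The sketch below tallies matched entities per data source pair from hypothetical in-memory data (entity ID mapped to the data source of each record); it does not read the snapshot file, whose structure is not described here. Counting each pair in both directions is an assumption chosen to mirror the From/To columns, and REFERENCE is a placeholder source name in the sample data.

```python
from collections import Counter
from itertools import permutations

# Hypothetical sample data: entity ID -> data source of each record in the entity.
entities = {
    1: ["CUSTOMERS", "CUSTOMERS", "WATCHLIST"],
    2: ["CUSTOMERS", "REFERENCE"],
    3: ["CUSTOMERS"],  # single-source entity, so no cross-source match
    4: ["CUSTOMERS", "WATCHLIST", "REFERENCE"],
}


def cross_source_entity_counts(entities: dict[int, list[str]]) -> Counter:
    """Count entities per (from, to) data source pair.

    Each ordered pair of distinct sources present in an entity is counted once,
    so CUSTOMERS->WATCHLIST and WATCHLIST->CUSTOMERS both appear (an assumption
    chosen to mirror the From/To columns).
    """
    counts = Counter()
    for sources in entities.values():
        distinct = sorted(set(sources))
        for pair in permutations(distinct, 2):
            counts[pair] += 1
    return counts


for (src, dst), n in sorted(cross_source_entity_counts(entities).items()):
    print(f"{src} -> {dst}: {n} matched entities")
```

Entity 3 contributes nothing, which is the single-source case described above.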

Selecting a data source pair drills into its match level breakdown:

cross_source_summary match level

The CUSTOMERS-to-WATCHLIST pair shows the count of matches at each confidence level. High-confidence Matches indicate records that Senzing is confident belong to the same real-world entity across these two data sources.

Selecting a match level drills into the match keys:

cross_source_summary match keys

The match keys show which attributes drove each cross-source match and what types of identifying information connect records across data sources.

Selecting a match key drills into individual entities:

cross_source_summary entity detail

This entity resolved records from both CUSTOMERS and WATCHLIST, meaning the same real-world person appeared in both data sources.

entity_source_summary

The entity_source_summary command groups entities by which combination of data sources contributed records.

entity_source_summary table

Column | Description
Data Sources | The combination of data sources that contributed records to entities in this group
Entities | Number of entities composed of records from exactly this combination of sources
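The grouping in this report keys each entity on the exact set of data sources that contributed records, regardless of how many records each source contributed. A minimal sketch on hypothetical sample data; the function name entity_source_combinations is illustrative, not part of sz_snapshot:

```python
from collections import Counter

# Hypothetical sample data: entity ID -> data source of each record in the entity.
entities = {
    1: ["CUSTOMERS", "CUSTOMERS", "WATCHLIST"],
    2: ["CUSTOMERS"],
    3: ["WATCHLIST", "CUSTOMERS"],
    4: ["CUSTOMERS", "CUSTOMERS"],
}


def entity_source_combinations(entities: dict[int, list[str]]) -> Counter:
    """Count entities by the exact set of data sources that contributed records."""
    # A sorted tuple of distinct sources makes the combination order-independent.
    return Counter(tuple(sorted(set(sources))) for sources in entities.values())


for combo, n in sorted(entity_source_combinations(entities).items()):
    print(" + ".join(combo), n)
```

With this sample, two entities fall under CUSTOMERS alone and two under CUSTOMERS + WATCHLIST: entity 4's two CUSTOMERS records still form a single-source combination.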

Selecting a row drills into the entities for that source combination, showing a paginated entity list with full entity detail:

entity_source_summary drilldown

entity_size_breakdown

The entity_size_breakdown command shows the distribution of how many records make up each entity.

entity_size_breakdown table

Column | Description
Size Group | Number of records per entity in this group
Entity Count | Number of entities with this many records
Review Count | Number of entities flagged for review due to feature anomalies
Review Features | Feature types that triggered the review flag (e.g., GENDER, DOB, ADDRESS)

The Entity Count column shows how many entities exist at each size. Entities in Size Group 1 are “singletons”: records that did not match any other record in the system. Large entities (those with many records) warrant review because they may represent:

  • Legitimate matches: A long-time customer with records across multiple systems and name/address changes over the years.
  • Over-resolution: Records that Senzing matched but that belong to different people. Use the how command in sz_explorer to review how records joined an entity step by step.
  • Data quality issues: Duplicate submissions, test records, or data entry errors inflating entity size.
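The Size Group arithmetic amounts to a frequency count of records per entity. The sketch below, using hypothetical entity IDs and sizes, shows the grouping and one way to shortlist large entities to examine with the how command. The threshold of 10 is an arbitrary illustrative value, not a Senzing default, and both function names are hypothetical.

```python
from collections import Counter

# Hypothetical sample data: entity ID -> number of records resolved to it.
record_counts = {101: 1, 102: 1, 103: 2, 104: 2, 105: 2, 106: 4, 107: 12}


def size_breakdown(record_counts: dict[int, int]) -> Counter:
    """Map each size group (records per entity) to the number of entities of that size."""
    return Counter(record_counts.values())


def large_entities(record_counts: dict[int, int], min_size: int) -> list[int]:
    """Return IDs of entities at or above a review threshold (threshold is your choice)."""
    return sorted(eid for eid, size in record_counts.items() if size >= min_size)


breakdown = size_breakdown(record_counts)
print("singletons:", breakdown[1])
for size in sorted(breakdown):
    print(f"size {size}: {breakdown[size]} entities")
print("review candidates:", large_entities(record_counts, min_size=10))
```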

The Review Count and Review Features columns flag entities that contain more values of a specific attribute than expected. For example, in the truth set snapshot, 5 entities in Size Group 2 have conflicting GENDER values, 1 entity in Size Group 4 has conflicting DOB values, and 1 entity in Size Group 5 has conflicting ADDRESS values.

Entities flagged in the Review Features column typically indicate one of three situations:

  • Data quality problems: Typos, misspellings, or bad data causing attribute conflicts within a legitimately resolved entity.
  • Intentional obfuscation: Someone altered identifying information to avoid detection, resulting in conflicting attributes.
  • Overmatching: Records that belong to separate people were incorrectly resolved into the same entity.
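One way to picture the review flag is a per-entity count of distinct values for each feature type, compared against an expected maximum. The sketch below is an illustration built on assumptions: the EXPECTED_MAX limits, the case-insensitive comparison, the function name review_features, and the sample entity are all hypothetical and are not sz_snapshot's actual review rules.

```python
# Hypothetical expected maximum of distinct values per feature type.
# These limits are illustrative assumptions, not sz_snapshot's actual rules.
EXPECTED_MAX = {"GENDER": 1, "DOB": 1, "ADDRESS": 3}


def review_features(entity_features: dict[str, list[str]]) -> list[str]:
    """Return feature types whose distinct-value count exceeds the expected maximum."""
    flagged = []
    for ftype, values in entity_features.items():
        limit = EXPECTED_MAX.get(ftype)
        # Normalize case and surrounding whitespace so trivial variants are not counted twice.
        distinct = {v.strip().upper() for v in values}
        if limit is not None and len(distinct) > limit:
            flagged.append(ftype)
    return sorted(flagged)


# Hypothetical two-record entity whose records disagree on GENDER.
entity = {
    "NAME": ["Robert Smith", "Bob Smith"],
    "GENDER": ["M", "F"],
    "DOB": ["1980-02-12", "1980-02-12"],
}
print(review_features(entity))  # ['GENDER']
```

A flag like this only says the values conflict; deciding whether the cause is data quality, obfuscation, or overmatching still requires reviewing the entity.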

Selecting a row drills into the entities of that size, showing a paginated entity list with full entity detail:

entity_size_breakdown drilldown

From the entity detail view, press H to run how and see the step-by-step resolution path:

entity_size_breakdown how

The how decision tree shows how each record entered the entity, including the MATCH_KEY and principle that fired at each step. This is useful for large or flagged entities to see which attributes drove the resolution. For a full walkthrough of the how command and its different views, see Using how.

principles_used

The principles_used report shows how many entity relationships were established at each match level. Senzing performs Principle Based Entity Resolution, in which each match level represents a different level of confidence.

principles_used table

Column | Description
Match Level | The confidence level at which entities were related
Count | Number of entity relationships at this match level

Selecting a match level drills into the specific principles, then into match keys, and finally into individual entity detail:

principles_used drilldown

Match Level | What It Means
Matches | Records resolved to the same entity with high confidence.
Possible matches | Records that share enough in common to warrant review but did not resolve.
Possibly related | Records that appear to be related (such as family members) but are distinct entities.
Ambiguous matches | Records that could plausibly belong to the same entity but the evidence is not definitive.
Disclosed relations | Known relationships declared in the input data (for example, a company and its subsidiaries).
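The principles_used drilldown, from match level to principle to match key, is essentially a nested tally. The sketch below builds that nesting from hypothetical (match level, principle, match key) tuples. The principle codes are placeholders rather than real Senzing principle names, and the function name drilldown is illustrative.

```python
from collections import defaultdict

# Hypothetical resolution results: (match level, principle, match key).
# Principle codes here are placeholders, not real Senzing principle names.
results = [
    ("Matches", "PRINCIPLE_A", "+NAME+DOB"),
    ("Matches", "PRINCIPLE_A", "+NAME+DOB"),
    ("Matches", "PRINCIPLE_B", "+NAME+ADDRESS"),
    ("Possible matches", "PRINCIPLE_C", "+NAME+ADDRESS-DOB"),
    ("Possibly related", "PRINCIPLE_D", "+ADDRESS"),
]


def drilldown(results):
    """Nest counts as match level -> principle -> match key -> count."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for level, principle, match_key in results:
        tree[level][principle][match_key] += 1
    return tree


tree = drilldown(results)
for level, principles in tree.items():
    # Level total is the sum over all principles and match keys beneath it.
    total = sum(sum(keys.values()) for keys in principles.values())
    print(f"{level}: {total}")
    for principle, keys in principles.items():
        for match_key, n in keys.items():
            print(f"  {principle} {match_key}: {n}")
```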

Using snapshot analysis effectively

Snapshot reports are most valuable when compared over time or against known baselines:

  • Initial assessment: Run a snapshot after loading data to establish a baseline.
  • After changes: After updating data mapping or adjusting entity resolution configuration with Senzing Support, take a new snapshot to measure the impact.
  • Ongoing monitoring: Periodic snapshots track data quality trends as new records are added and existing records are updated or deleted.
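Comparing a new snapshot against a baseline can be as simple as subtracting one set of counts from the other. The sketch assumes you have already copied the per-data-source record and entity counts from each snapshot report into dictionaries. Extracting them from the snapshot JSON file is not shown, the numbers are hypothetical, and compare_snapshots is an illustrative name.

```python
# Hypothetical per-data-source summary values taken from two snapshots.
baseline = {
    "CUSTOMERS": {"records": 1000, "entities": 800},
    "WATCHLIST": {"records": 200, "entities": 190},
}
current = {
    "CUSTOMERS": {"records": 1100, "entities": 850},
    "WATCHLIST": {"records": 200, "entities": 185},
}


def compare_snapshots(before, after):
    """Return per-source changes in each metric (after minus before).

    Sources or metrics missing from one snapshot are treated as 0.
    """
    changes = {}
    for source in sorted(set(before) | set(after)):
        old, new = before.get(source, {}), after.get(source, {})
        metrics = set(old) | set(new)
        changes[source] = {m: new.get(m, 0) - old.get(m, 0) for m in sorted(metrics)}
    return changes


for source, delta in compare_snapshots(baseline, current).items():
    print(source, delta)
```

In this sample, WATCHLIST kept the same record count but lost five entities, the kind of shift worth investigating after a configuration change.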

Next steps

If you have any questions, contact Senzing Support. Support is 100% FREE!