EDA Tools
Senzing includes three Exploratory Data Analysis (EDA) command-line tools for understanding entity resolution results:
| Tool | Purpose | Learn More |
|---|---|---|
| sz_explorer | Interactive CLI for ad-hoc entity search, retrieval, explainability, and export | Entity Exploration |
| sz_snapshot | Takes a snapshot of entity resolution results and generates summary reports | Snapshot Analysis, Running sz_snapshot |
| sz_audit | Compares snapshot results against truth set keys to measure precision and recall | Auditing, Running sz_audit |
These tools are included in the senzingsdk-tools package, which is also installed as part of senzingsdk-poc.
Truth set data
The truth set contains 159 records across three data sources:
| Data Source | Description | Records |
|---|---|---|
| CUSTOMERS | Primary subjects of interest, such as customers, employees, or vendors. Includes duplicates, name variations, and address changes. | 120 |
| REFERENCE | External data about people (demographics, past addresses, contact methods) or companies (firmographics, corporate structure, ownership) that enriches entity profiles. | 22 |
| WATCHLIST | Entities to screen against, such as known fraud actors or sanctioned parties. | 17 |
EDA tools in action
The following examples use the truth set demo data to illustrate the kinds of questions EDA tools can answer.
Deduplication
The data_source_summary report shows that 86 of the 120 "DATA_SOURCE": "CUSTOMERS" records matched at least one other record, and that the 120 records compressed into 71 entities. The report shows how many records resolved to the same entity within each data source.
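To illustrate the counts behind this kind of report, the following Python sketch computes a per-source summary from a list of (data source, record ID, entity ID) rows. The row format, the sample rows, and the function name summarize_by_source are hypothetical and chosen for this example only; this is not sz_snapshot's implementation or output format.

```python
from collections import defaultdict

# Hypothetical (data_source, record_id, entity_id) rows; illustrative only,
# not taken from the truth set or from sz_snapshot output.
ROWS = [
    ("CUSTOMERS", "1001", 1),
    ("CUSTOMERS", "1002", 1),
    ("CUSTOMERS", "1003", 1),
    ("CUSTOMERS", "1004", 2),
    ("CUSTOMERS", "1005", 3),
    ("CUSTOMERS", "1006", 3),
    ("WATCHLIST", "2001", 3),
]


def summarize_by_source(rows):
    """Per data source: record count, entity count, and the number of records
    that share their entity with another record from the same source."""
    # source -> entity_id -> number of that source's records in the entity
    counts = defaultdict(lambda: defaultdict(int))
    for source, _record_id, entity_id in rows:
        counts[source][entity_id] += 1

    summary = {}
    for source, per_entity in counts.items():
        summary[source] = {
            "records": sum(per_entity.values()),
            "entities": len(per_entity),
            # Only entities holding 2+ records of this source count as matches.
            "matched_records": sum(n for n in per_entity.values() if n > 1),
        }
    return summary


print(summarize_by_source(ROWS))
```

In this sample, the six CUSTOMERS records resolve to three entities, and five of them share an entity with another CUSTOMERS record. The WATCHLIST record in entity 3 does not count as a within-source match.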

Cross-source screening
The cross_source_summary report shows that 11 "DATA_SOURCE": "CUSTOMERS" records matched "DATA_SOURCE": "WATCHLIST" records across 6 entities, identifying entities that appear in both data sources.
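The idea behind cross-source screening can be sketched in Python: an entity is a cross-source match when it contains records from both data sources. The row format, the sample rows, and the function name shared_entities are hypothetical and used for illustration only; they do not reflect how sz_snapshot computes cross_source_summary.

```python
from collections import defaultdict

# Hypothetical (data_source, record_id, entity_id) rows; illustrative only.
ROWS = [
    ("CUSTOMERS", "1001", 1),
    ("CUSTOMERS", "1002", 1),
    ("WATCHLIST", "2001", 1),
    ("CUSTOMERS", "1003", 2),
    ("CUSTOMERS", "1004", 3),
    ("WATCHLIST", "2002", 3),
    ("WATCHLIST", "2003", 4),
]


def shared_entities(rows, source_a, source_b):
    """Return the entities containing records from both sources, plus the
    number of source_a records that fall inside those entities."""
    sources_by_entity = defaultdict(set)
    for source, _record_id, entity_id in rows:
        sources_by_entity[entity_id].add(source)

    # An entity qualifies only if both sources are present in it.
    shared = {
        entity_id
        for entity_id, sources in sources_by_entity.items()
        if {source_a, source_b} <= sources
    }
    a_records = sum(
        1 for source, _record_id, entity_id in rows
        if source == source_a and entity_id in shared
    )
    return shared, a_records


entities, customer_records = shared_entities(ROWS, "CUSTOMERS", "WATCHLIST")
print(sorted(entities), customer_records)
```

Here entities 1 and 3 contain both CUSTOMERS and WATCHLIST records, and three CUSTOMERS records fall inside them. Entity 2 (customers only) and entity 4 (watchlist only) are excluded.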

Ambiguous match investigation
The why command in sz_explorer shows the scoring details behind ambiguous matches, where a record could plausibly belong to more than one entity.

Accuracy measurement
sz_audit compares entity resolution results against a truth set and reports precision, recall, and F1 scores to measure accuracy.
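As a rough illustration of how these metrics relate, the following Python sketch scores resolved entities against truth keys by comparing record pairs. Pairwise counting is one common way to evaluate entity resolution. It is assumed here for illustration and may differ from how sz_audit counts matches. The dictionaries and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical inputs: record_id -> truth key (expected entity) and
# record_id -> resolved entity ID. Values are illustrative only.
TRUTH = {"r1": "A", "r2": "A", "r3": "A", "r4": "B", "r5": "B"}
RESOLVED = {"r1": 10, "r2": 10, "r3": 11, "r4": 11, "r5": 12}


def same_entity_pairs(assignment):
    """Return every unordered record pair that shares an entity label."""
    groups = {}
    for record_id, label in assignment.items():
        groups.setdefault(label, []).append(record_id)
    pairs = set()
    for members in groups.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs


def pairwise_scores(truth, resolved):
    """Pairwise precision, recall, and F1 of resolved entities vs. truth keys."""
    expected = same_entity_pairs(truth)
    predicted = same_entity_pairs(resolved)
    true_pos = len(expected & predicted)
    # Convention for this sketch: with no pairs to judge, score 1.0.
    precision = true_pos / len(predicted) if predicted else 1.0
    recall = true_pos / len(expected) if expected else 1.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


print(pairwise_scores(TRUTH, RESOLVED))
```

In this sample, one of the two predicted pairs is correct (precision 0.5), one of the four expected pairs is found (recall 0.25), and F1 is their harmonic mean, about 0.33. Incorrectly merging r3 with r4 lowers precision. Splitting entity A and leaving r5 alone lowers recall.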
