Auditing
sz_audit compares entity resolution results against a truth set to measure accuracy. It calculates precision, recall, and F1 scores that indicate how well Senzing is performing on the data.
ENTITY_ID values in your database will most likely differ from those shown here, as they depend on load order. Use the ENTITY_ID values returned by your commands in subsequent steps. If you are using the truth set, DATA_SOURCE and RECORD_ID values will be the same.

Viewing audit results in sz_explorer
To view the audit results interactively, load the audit file when starting sz_explorer:
```
sz_explorer -a truthset_audit.json
```
Or load it after sz_explorer is already running:
```
load truthset_audit.json
```
This unlocks the audit_summary command.
```
audit_summary
```
The audit_summary command displays a statistics table at the top, followed by review categories for mismatches:

Statistics table
The top table has three column groups read left to right, and three rows read top to bottom.
The column groups are:
- Statistic / Entities / Pairs: Compares entity and pair counts between the two inputs: the `-p` prior input (the truth set key file, or an older snapshot) and the `-n` newer input (the snapshot from Senzing).
- Statistic / Pairs: Breaks down where the two inputs agree and disagree on record pairs.
- Statistic / Accuracy: Shows the accuracy metrics as decimals (where 1.0 = 100%). See Understanding the accuracy metrics for what these mean.
Each row connects across all three groups:
- Prior Count: The prior input has 84 entities and 110 record pairs. Of those pairs, 106 are Same Positives (both inputs agree they belong together). Precision is 0.98148.
- Newer Count: The newer input has 85 entities and 108 record pairs. Of those, 2 are New Positives (pairs Senzing created that the prior input did not expect). Recall is 0.96364.
- Common Count: 78 entities and 106 record pairs are common to both inputs. 4 pairs are New Negatives (pairs the prior input expected but Senzing did not create). F1 Score is 0.97248.
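The pair counts in the three rows fit together arithmetically: the prior input's pairs are the agreed pairs plus the ones Senzing missed, and the newer input's pairs are the agreed pairs plus the extra ones Senzing created. A minimal sketch using the counts from this example:

```python
# Pair counts from the example audit_summary table above.
same_positives = 106  # pairs both inputs agree belong together
new_positives = 2     # pairs only the newer input (Senzing) created
new_negatives = 4     # pairs only the prior input (truth set) expected

# Prior Count pairs = agreed pairs + pairs Senzing did not create.
prior_pairs = same_positives + new_negatives   # 110
# Newer Count pairs = agreed pairs + extra pairs Senzing created.
newer_pairs = same_positives + new_positives   # 108

print(prior_pairs, newer_pairs)  # 110 108
```

If your own counts do not satisfy these identities, the two inputs were probably not compared over the same set of records.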
Review categories
The review categories MERGE and SPLIT represent the two types of discrepancies between the truth set and Senzing’s results:
- MERGE: Senzing resolved records into the same entity that the truth set kept separate.
- SPLIT: The truth set expected records to resolve together, but Senzing kept them as separate entities.
These are not necessarily errors. Selecting a review category shows the MATCH_KEY values responsible, and selecting a MATCH_KEY shows the specific records involved for evaluating each case. Use the why command in sz_explorer to see the full scoring details for any discrepancy that warrants investigation.
Discrepancy walkthrough
The four discrepancies in this audit illustrate how entity resolution decisions work in practice. Discrepancies are not necessarily errors. They are places where Senzing’s automated decisions differ from the truth set’s expectations, and each one is an opportunity to understand the data better.
The alternate key was designed with two philosophical differences from Senzing’s defaults that explain these specific discrepancies: it uses more aggressive name matching (prioritizing recall over precision) and does not use employer as a matching feature. This context explains why the MERGE cases involve employer-based matches and the SPLIT cases involve name variants.
New Positives (MERGE cases)

Both MERGE cases share the same +NAME+EMPLOYER match key:

- "DATA_SOURCE": "REFERENCE", "RECORD_ID": "2081" + "DATA_SOURCE": "WATCHLIST", "RECORD_ID": "2082" merged into "ENTITY_ID": 100001. The REFERENCE record has the name Howard Hughes with employer Universal Exports Worldwide, while the WATCHLIST record has Hughes, Howie with employer Universal Exports. Senzing recognized Howie as a nickname for Howard and matched on both NAME and EMPLOYER. The truth set expected these to be separate entities, but Senzing grouped them because they share both a name and an employer.
- "DATA_SOURCE": "REFERENCE", "RECORD_ID": "2091" + "DATA_SOURCE": "WATCHLIST", "RECORD_ID": "2092" merged into "ENTITY_ID": 100008. The REFERENCE record has the name Margaret Charney with employer Universal Exports Worldwide, while the WATCHLIST record has Charney, Peggie with employer Universal Exports. Senzing recognized Peggie as a nickname for Margaret and matched on both NAME and EMPLOYER. The truth set expected them to be separate entities, but the shared name and employer combination was enough for Senzing to merge them.
Investigate these cases with the why command to see the scoring that led to each merge. Two people sharing both a name and an employer is often a genuine match, but it can also be coincidental, especially with a common or large employer like Universal Exports.
New Negatives (SPLIT cases)

The two SPLIT cases produced four new negative pairs total. A new negative pair is created for every pair of records that the truth set expected to resolve together but Senzing resolved into separate entities:
- SPLIT case 1 (+NAME+DOB): "DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1025", "DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1026", and "DATA_SOURCE": "WATCHLIST", "RECORD_ID": "1027" were expected by the truth set to be one entity. The three records have the names Darla Anderson, Darlene Anderson, and Darletta Anderson, all sharing the date of birth 1/7/80. Senzing resolved them into three separate entities: record 1025 in "ENTITY_ID": 17, record 1026 in "ENTITY_ID": 19, and record 1027 in "ENTITY_ID": 200001. Although the records share a last name and date of birth, the first name variants (Darla, Darlene, Darletta) are distinct enough that Senzing's scoring determined the overall evidence was not strong enough to confirm they are the same person. Because three records that should be one entity ended up in three separate entities, this produces three new negative pairs: 1025 and 1026, 1025 and 1027, and 1026 and 1027.
- SPLIT case 2 (+NAME+DOB-GENERATION): "DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1089" and "DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1090" were expected by the truth set to be one entity. The two records have the names Morris I Klein and Morris II Klein, both sharing the date of birth 4/12/82. Senzing resolved record 1089 into "ENTITY_ID": 75 and record 1090 into "ENTITY_ID": 78. The -GENERATION suffix in the match key indicates Senzing detected a generational name difference (I vs II) that prevented the merge. Because two records that should be one entity ended up in two separate entities, this produces one new negative pair.
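The pair counts follow directly from combinatorics: n records that should be one entity but end up fully split produce n choose 2 new negative pairs. A small sketch, using the records from the two SPLIT cases above:

```python
from itertools import combinations

# SPLIT case 1: three records the truth set expected in one entity,
# each resolved into its own entity by Senzing.
case1 = ["CUSTOMERS:1025", "CUSTOMERS:1026", "WATCHLIST:1027"]
# SPLIT case 2: two records the truth set expected in one entity.
case2 = ["CUSTOMERS:1089", "CUSTOMERS:1090"]

# Each unordered pair of separated records is one new negative pair.
pairs1 = list(combinations(case1, 2))  # 3 pairs (3 choose 2)
pairs2 = list(combinations(case2, 2))  # 1 pair  (2 choose 2)

print(len(pairs1) + len(pairs2))  # 4 new negative pairs total
```

This is why two SPLIT cases produce four new negative pairs in the statistics table: the three-record split alone accounts for three of them.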
The SPLIT cases highlight an important tradeoff: Senzing does not merge records when the evidence is ambiguous. In the generational name case, keeping Morris I Klein and Morris II Klein as separate entities is often the correct decision, even when a truth set groups them. Use the how command to see the step-by-step resolution path and understand exactly where the scoring fell short of the merge threshold.
Understanding the accuracy metrics
The Accuracy column in the audit_summary statistics table reports three metrics as decimals (where 1.0 = 100%):
Precision
Precision measures how many of the matches the newer input made were correct.
Formula: Same Positives / Newer Count pairs
In the truth set example: 106 / 108 = 0.98148. Low precision indicates false positives (over-resolution): records are being grouped together that should remain separate.
Recall
Recall measures how many of the expected matches the newer input actually found.
Formula: Same Positives / Prior Count pairs
In the truth set example: 106 / 110 = 0.96364. Low recall indicates false negatives (under-resolution): records that belong to the same entity are not being connected.
F1 score
The F1 score is the harmonic mean of precision and recall, providing a single number that balances both concerns.
Formula: 2 * (precision * recall) / (precision + recall)
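Plugging the example counts into these three formulas reproduces the values in the statistics table. A minimal sketch, independent of any Senzing API:

```python
# Counts from the truth set example above.
same_positives = 106  # pairs both inputs agree on
prior_pairs = 110     # pairs expected by the prior input (truth set)
newer_pairs = 108     # pairs created by the newer input (Senzing)

precision = same_positives / newer_pairs              # correct share of made matches
recall = same_positives / prior_pairs                 # found share of expected matches
f1 = 2 * (precision * recall) / (precision + recall)  # harmonic mean of the two

print(round(precision, 5), round(recall, 5), round(f1, 5))
# 0.98148 0.96364 0.97248
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs: a run cannot hide poor recall behind high precision, or vice versa.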
Interpreting results
The audit report breaks down results by data source pair, showing accuracy separately for within-source matches (e.g., CUSTOMERS to CUSTOMERS) and cross-source matches (e.g., CUSTOMERS to WATCHLIST).
Score interpretation
| Score Range | Interpretation |
|---|---|
| 0.98 - 1.0 | Excellent. Entity resolution is highly accurate for this data source pair. |
| 0.95 - 0.97 | Very good. A few edge cases may need investigation. |
| 0.90 - 0.94 | Good, but review the mismatches. Contact Senzing Support for help investigating. |
| Below 0.90 | Investigate. Data quality issues may be present. Contact Senzing Support for help. |
Ambiguous matches
Some entity resolution decisions are genuinely ambiguous. A record might plausibly belong to more than one entity, and the “correct” answer depends on context that Senzing cannot determine from the data alone.
The audit report flags these cases separately. Ambiguous matches are not counted as errors because the truth set itself recognizes them as borderline. When ambiguous matches appear:
- Use sz_explorer to examine the entities involved.
- The why command shows the scoring details that made the match ambiguous.
- The how command shows the step-by-step resolution path for evaluating whether the grouping is correct.
Ambiguous matches highlight areas where additional data or business rules could improve resolution confidence.
Auditing in practice
An audit quantifies the algorithmic differences between two entity resolution approaches and surfaces the specific records responsible. The audit report identifies cases where one approach was too aggressive (merging records that should stay separate) or too conservative (keeping apart records that belong to the same entity), producing concrete examples to evaluate rather than abstract accuracy claims.
Senzing is tunable. Its matching rules, thresholds, and feature usage can all be adjusted to align with organizational requirements. If the audit reveals cases where a different matching philosophy is preferred, contact Senzing Support to discuss tuning options.
Comparing snapshots over time
Beyond truth set auditing, sz_audit can compare two snapshots taken at different points in time. This is useful in the following situations:
Against a truth set
When comparing multiple runs against a truth set, the best run is generally the one with the highest F1 score, even if another run has higher recall but lower precision. If recall matters more than precision for your use case, the run with the highest recall may be preferable, as long as its lower precision is acceptable.
Between two engines
One engine’s new positives are the other’s new negatives. The scores indicate how close the runs are to each other. Browse the entities that were split or merged to evaluate which results are more accurate. For instance, if the splits in the second run are correct, they represent false positives in the first run.
After configuration changes
If the goal of the change was 10% more matches, precision should land in the 0.90s; if it is in the 0.80s, the change produced closer to 20% more matches. Recall should remain at 1.0 if no prior good matches were lost; if recall dropped, prior good matches were broken apart.
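The arithmetic behind those rules of thumb can be sketched with hypothetical counts (assuming 100 good pairs in the prior snapshot and no lost matches):

```python
# Hypothetical counts for a before/after configuration comparison.
prior_pairs = 100      # pairs in the prior snapshot
same_positives = 100   # all prior good matches were kept
newer_pairs = 110      # the change added 10% more matches

precision = same_positives / newer_pairs  # ~0.909: "in the 0.90s"
recall = same_positives / prior_pairs     # 1.0: no good matches lost

# Had the change produced 20% more matches instead:
precision_20pct = same_positives / 120    # ~0.833: "in the 0.80s"

print(round(precision, 3), recall, round(precision_20pct, 3))
```

In this comparison the "prior" input is just the older snapshot, so precision measures how many of the newer snapshot's pairs the older one already had, and recall measures how many of the older pairs survived.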
To compare two snapshots, use the newer snapshot as the -n input and the older snapshot as the -p input:
```
sz_audit -n new_snapshot.csv -p old_snapshot.csv -o comparison_audit
```
Next steps
If you have any questions, contact Senzing Support. Support is 100% FREE!