Auditing
sz_audit compares entity resolution results against a truth set to measure accuracy. It calculates precision, recall, and F1 scores that indicate how well Senzing is performing on the data.
ENTITY_ID values in your database will most likely differ from those shown here, as they depend on load order. Use the ENTITY_ID values returned by your commands in subsequent steps. If you are using the truth set, DATA_SOURCE and RECORD_ID values will be the same.

Viewing audit results in sz_explorer
To view the audit results interactively, load the audit file when starting sz_explorer:
```
sz_explorer -a truthset_audit.json
```
Or load it after sz_explorer is already running:
```
load truthset_audit.json
```
This unlocks the audit_summary command.
```
audit_summary
```
The audit_summary command displays a statistics table at the top, followed by review categories for mismatches:

Statistics table
The top table has three column groups read left to right, and three rows read top to bottom.
The column groups are:
- Statistic / Entities / Pairs: Compares entity and pair counts between the two inputs: the `-p` prior input (the truth set key file, or an older snapshot) and the `-n` newer input (the snapshot from Senzing).
- Statistic / Pairs: Breaks down where the two inputs agree and disagree on record pairs.
- Statistic / Accuracy: Shows the accuracy metrics as decimals (where 1.0 = 100%). See Understanding the accuracy metrics for what these mean.
Each row connects across all three groups:
- Prior Count: The prior input has 84 entities and 110 record pairs. Of those pairs, 106 are Same Positives (both inputs agree they belong together). Precision is 0.98148.
- Newer Count: The newer input has 85 entities and 108 record pairs. Of those, 2 are New Positives (pairs Senzing created that the prior input did not expect). Recall is 0.96364.
- Common Count: 78 entities and 106 record pairs are common to both inputs. 4 pairs are New Negatives (pairs the prior input expected but Senzing did not create). F1 Score is 0.97248.
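The pair counts in the three rows fit together arithmetically: the prior input's pairs are the agreed pairs plus the ones Senzing missed, and the newer input's pairs are the agreed pairs plus the extra ones Senzing created. A minimal sketch using the counts from this example:

```python
# Pair counts from the example audit_summary table above.
same_positives = 106  # pairs both inputs agree belong together
new_positives = 2     # pairs only the newer input (Senzing) created
new_negatives = 4     # pairs only the prior input (truth set) expected

# Prior Count pairs = agreed pairs + pairs Senzing did not create.
prior_pairs = same_positives + new_negatives   # 110
# Newer Count pairs = agreed pairs + extra pairs Senzing created.
newer_pairs = same_positives + new_positives   # 108

print(prior_pairs, newer_pairs)  # 110 108
```

If your own counts do not satisfy these identities, the two inputs were probably not compared over the same set of records.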
Review categories
The review categories MERGE and SPLIT represent the two types of discrepancies between the truth set and Senzing’s results:
- MERGE: Senzing resolved records into the same entity that the truth set kept separate.
- SPLIT: The truth set expected records to resolve together, but Senzing kept them as separate entities.
These are not necessarily errors. Selecting a review category shows the MATCH_KEY values responsible, and selecting a MATCH_KEY shows the specific records involved for evaluating each case. Use the why command in sz_explorer to see the full scoring details for any discrepancy that warrants investigation.
Discrepancy walkthrough
The four discrepancies in this audit illustrate how entity resolution decisions work in practice. Discrepancies are not necessarily errors. They are places where Senzing’s automated decisions differ from the truth set’s expectations, and each one is an opportunity to understand the data better.
The alternate key was designed with two philosophical differences from Senzing’s defaults that explain these specific discrepancies: it uses more aggressive name matching (prioritizing recall over precision) and does not use employer as a matching feature. This context explains why the MERGE cases involve employer-based matches and the SPLIT cases involve name variants.
New Positives (MERGE cases)

Both MERGE cases share the same +NAME+EMPLOYER match key:

- "DATA_SOURCE": "REFERENCE", "RECORD_ID": "2081" + "DATA_SOURCE": "WATCHLIST", "RECORD_ID": "2082" merged into "ENTITY_ID": 100001. The REFERENCE record has the name Howard Hughes with employer Universal Exports Worldwide, while the WATCHLIST record has Hughes, Howie with employer Universal Exports. Senzing recognized Howie as a nickname for Howard and matched on both NAME and EMPLOYER. The truth set expected these to be separate entities, but Senzing grouped them because they share both a name and an employer.
- "DATA_SOURCE": "REFERENCE", "RECORD_ID": "2091" + "DATA_SOURCE": "WATCHLIST", "RECORD_ID": "2092" merged into "ENTITY_ID": 100008. The REFERENCE record has the name Margaret Charney with employer Universal Exports Worldwide, while the WATCHLIST record has Charney, Peggie with employer Universal Exports. Senzing recognized Peggie as a nickname for Margaret and matched on both NAME and EMPLOYER. The truth set expected them to be separate entities, but the shared name and employer combination was enough for Senzing to merge them.
Investigate these cases with the why command to see the scoring that led to each merge. Two people sharing both a name and an employer is often a genuine match, but it can also be coincidental, especially with a common or large employer like Universal Exports.
New Negatives (SPLIT cases)

The two SPLIT cases produced four new negative pairs total. A new negative pair is created for every pair of records that the truth set expected to resolve together but Senzing resolved into separate entities:
- SPLIT case 1 (+NAME+DOB): "DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1025", "DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1026", and "DATA_SOURCE": "WATCHLIST", "RECORD_ID": "1027" were expected by the truth set to be one entity. The three records have the names Darla Anderson, Darlene Anderson, and Darletta Anderson, all sharing the date of birth 1/7/80. Senzing resolved them into three separate entities: record 1025 in "ENTITY_ID": 17, record 1026 in "ENTITY_ID": 19, and record 1027 in "ENTITY_ID": 200001. Although the records share a last name and date of birth, the first name variants (Darla, Darlene, Darletta) are distinct enough that Senzing's scoring determined the overall evidence was not strong enough to confirm they are the same person. Because three records that should be one entity ended up in three separate entities, this produces three new negative pairs: 1025 and 1026, 1025 and 1027, and 1026 and 1027.
- SPLIT case 2 (+NAME+DOB-GENERATION): "DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1089" and "DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1090" were expected by the truth set to be one entity. The two records have the names Morris I Klein and Morris II Klein, both sharing the date of birth 4/12/82. Senzing resolved record 1089 into "ENTITY_ID": 75 and record 1090 into "ENTITY_ID": 78. The -GENERATION suffix in the match key indicates Senzing detected a generational name difference (I vs II) that prevented the merge. Because two records that should be one entity ended up in two separate entities, this produces one new negative pair.
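The pair counts follow directly from combinatorics: n records that should be one entity but end up fully split produce n choose 2 new negative pairs. A small sketch, using the records from the two SPLIT cases above:

```python
from itertools import combinations

# SPLIT case 1: three records the truth set expected in one entity,
# each resolved into its own entity by Senzing.
case1 = ["CUSTOMERS:1025", "CUSTOMERS:1026", "WATCHLIST:1027"]
# SPLIT case 2: two records the truth set expected in one entity.
case2 = ["CUSTOMERS:1089", "CUSTOMERS:1090"]

# Each unordered pair of separated records is one new negative pair.
pairs1 = list(combinations(case1, 2))  # 3 pairs (3 choose 2)
pairs2 = list(combinations(case2, 2))  # 1 pair  (2 choose 2)

print(len(pairs1) + len(pairs2))  # 4 new negative pairs total
```

This is why two SPLIT cases produce four new negative pairs in the statistics table: the three-record split alone accounts for three of them.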
The SPLIT cases highlight an important tradeoff: Senzing does not merge records when the evidence is ambiguous. In the generational name case, keeping Morris I Klein and Morris II Klein as separate entities is often the correct decision, even when a truth set groups them. Use the how command to see the step-by-step resolution path and understand exactly where the scoring fell short of the merge threshold.
Understanding the accuracy metrics
The Accuracy column in the audit_summary statistics table reports three metrics as decimals (where 1.0 = 100%):
Precision
Precision measures how many of the matches the newer input made were correct.
Formula: Same Positives / Newer Count pairs
In the truth set example: 106 / 108 = 0.98148. Low precision indicates false positives (over-resolution): records are being grouped together that should remain separate.
Recall
Recall measures how many of the expected matches the newer input actually found.
Formula: Same Positives / Prior Count pairs
In the truth set example: 106 / 110 = 0.96364. Low recall indicates false negatives (under-resolution): records that belong to the same entity are not being connected.
F1 score
The F1 score is the harmonic mean of precision and recall, providing a single number that balances both concerns.
Formula: 2 * (precision * recall) / (precision + recall)
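Plugging the example counts into these three formulas reproduces the values in the statistics table. A minimal sketch, independent of any Senzing API:

```python
# Counts from the truth set example above.
same_positives = 106  # pairs both inputs agree on
prior_pairs = 110     # pairs expected by the prior input (truth set)
newer_pairs = 108     # pairs created by the newer input (Senzing)

precision = same_positives / newer_pairs              # correct share of made matches
recall = same_positives / prior_pairs                 # found share of expected matches
f1 = 2 * (precision * recall) / (precision + recall)  # harmonic mean of the two

print(round(precision, 5), round(recall, 5), round(f1, 5))
# 0.98148 0.96364 0.97248
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs: a run cannot hide poor recall behind high precision, or vice versa.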
Interpreting results
The audit report breaks down results by data source pair, showing accuracy separately for within-source matches (e.g., CUSTOMERS to CUSTOMERS) and cross-source matches (e.g., CUSTOMERS to WATCHLIST).
Score interpretation
| Score Range | Interpretation |
|---|---|
| 0.98 - 1.0 | Excellent. Entity resolution is highly accurate for this data source pair. |
| 0.95 - 0.97 | Very good. A few edge cases may need investigation. |
| 0.90 - 0.94 | Good, but review the mismatches. Contact Senzing Support for help investigating. |
| Below 0.90 | Investigate. Data quality issues may be present. Contact Senzing Support for help. |
Ambiguous matches
Some entity resolution decisions are genuinely ambiguous. A record might plausibly belong to more than one entity, and the “correct” answer depends on context that Senzing cannot determine from the data alone.
The audit report flags these cases separately. Ambiguous matches are not counted as errors because the truth set itself recognizes them as borderline. When ambiguous matches appear:
- Use sz_explorer to examine the entities involved.
- The why command shows the scoring details that made the match ambiguous.
- The how command shows the step-by-step resolution path for evaluating whether the grouping is correct.
Ambiguous matches highlight areas where additional data or business rules could improve resolution confidence.
Auditing in practice
An audit quantifies the algorithmic differences between two entity resolution approaches and surfaces the specific records responsible. The audit report identifies cases where one approach was too aggressive (merging records that should stay separate) or too conservative (keeping apart records that belong to the same entity), producing concrete examples to evaluate rather than abstract accuracy claims.
Senzing is tunable. Its matching rules, thresholds, and feature usage can all be adjusted to align with organizational requirements. If the audit reveals cases where a different matching philosophy is preferred, contact Senzing Support to discuss tuning options.
Comparing snapshots over time
Beyond truth set auditing, sz_audit can compare two snapshots taken at different points in time. This is useful in the following situations:
Against a truth set
When comparing multiple runs against a truth set, the best run is generally the one with the highest F1 score, even if another run has higher recall but lower precision. If recall matters more than precision for your use case, the run with the highest recall may be preferable, as long as its lower precision is acceptable.
Between two engines
One engine’s new positives are the other’s new negatives. The scores indicate how close the runs are to each other. Browse the entities that were split or merged to evaluate which results are more accurate. For instance, if the splits in the second run are correct, they represent false positives in the first run.
After configuration changes
If the goal of the change was 10% more matches, precision should land in the 0.90s; if it is in the 0.80s, the change produced closer to 20% more matches. Recall should remain at 1.0 if no prior good matches were lost; if recall dropped, prior good matches were broken apart.
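The arithmetic behind those rules of thumb can be sketched with hypothetical counts (assuming 100 good pairs in the prior snapshot and no lost matches):

```python
# Hypothetical counts for a before/after configuration comparison.
prior_pairs = 100      # pairs in the prior snapshot
same_positives = 100   # all prior good matches were kept
newer_pairs = 110      # the change added 10% more matches

precision = same_positives / newer_pairs  # ~0.909: "in the 0.90s"
recall = same_positives / prior_pairs     # 1.0: no good matches lost

# Had the change produced 20% more matches instead:
precision_20pct = same_positives / 120    # ~0.833: "in the 0.80s"

print(round(precision, 3), recall, round(precision_20pct, 3))
```

In this comparison the "prior" input is just the older snapshot, so precision measures how many of the newer snapshot's pairs the older one already had, and recall measures how many of the older pairs survived.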
To compare two snapshots, use the newer snapshot as the -n input and the older snapshot as the -p input:
```
sz_audit -n new_snapshot.csv -p old_snapshot.csv -o comparison_audit
```
Next steps
If you have any questions, contact Senzing Support. Support is 100% FREE!