Most entity resolution systems don’t handle ambiguous records properly. This tricky and subtle condition creates false positives that are difficult to find.
In entity resolution, we use the term “ambiguous” to mean “multiple good answers.”
The great American boxer George Foreman named all five of his boys George. Imagine having to perform entity resolution on a record containing only his name, home address, and home phone — nothing else. In a typical household, a record containing a name, home address and phone would likely be unique to a single person. In the case of George Foreman, such a record could be any one of six people.
Look at this simple example:
Most entity resolution algorithms will arbitrarily resolve Record 3 into either Record 1 (the senior, born in 1970) or Record 2 (the junior, born in 1990). For example, imagine this outcome, in which Record 3 has been resolved into Record 1 (the senior):
Even upon human inspection this match looks good, doesn’t it? That’s the tricky thing about ambiguous records like Record 3: they can create invisible false positives. They are invisible in that you can’t see the false positive until you become aware of Record 2 (the junior).
The existence of Record 2 (the junior) means Record 3 could possibly be Record 1 (the senior) or Record 2 (the junior).
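To make the condition concrete, here is a minimal Python sketch of one way a matcher could report an ambiguous record instead of picking a winner arbitrarily. It is an illustration only, not a description of how Senzing or any other engine works. The field values, the `match_score` and `resolve` helpers, and the threshold of three agreeing features are all assumptions made up for this example; only the birth years and the choice of fields come from the example above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

FIELDS = ("name", "address", "phone", "date_of_birth")


@dataclass(frozen=True)
class Record:
    record_id: int
    name: Optional[str] = None
    address: Optional[str] = None
    phone: Optional[str] = None
    date_of_birth: Optional[str] = None  # None when the record omits it


def match_score(a: Record, b: Record) -> int:
    """Count features present on both records that agree.

    Any feature present on both records that disagrees is treated as a
    conflict and forces a score of 0 (a simplifying assumption).
    """
    score = 0
    for field in FIELDS:
        left, right = getattr(a, field), getattr(b, field)
        if left is None or right is None:
            continue  # a missing feature neither helps nor hurts
        if left != right:
            return 0
        score += 1
    return score


def resolve(incoming: Record, known: List[Record],
            threshold: int = 3) -> Tuple[str, List[int]]:
    """Return ("match" | "ambiguous" | "no_match", candidate record ids)."""
    scored = [(match_score(incoming, k), k.record_id) for k in known]
    best = max((s for s, _ in scored), default=0)
    if best < threshold:
        return ("no_match", [])
    top = [rid for s, rid in scored if s == best]
    # More than one equally good answer: report it, do not pick one.
    return ("match" if len(top) == 1 else "ambiguous", top)


# Hypothetical values; only the birth years and field choice follow the article.
senior = Record(1, "George Foreman", "123 Main St", "555-0100", "1970-01-01")
junior = Record(2, "George Foreman", "123 Main St", "555-0100", "1990-01-01")
record_3 = Record(3, "George Foreman", "123 Main St", "555-0100")

print(resolve(record_3, [senior]))          # ('match', [1])
print(resolve(record_3, [senior, junior]))  # ('ambiguous', [1, 2])
```

With only the senior on file, Record 3 looks like a clean match. Once the junior is known, the same record has two equally good answers, which is the situation the sketch flags rather than resolving to either one.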
Handling ambiguous records properly is very important, especially in systems that can impinge on someone’s freedom or opportunity, e.g., a government watch list or a background check system. Imagine if Record 3 represented derogatory information, e.g., “terrorist” or “criminal record.” Arbitrarily matching this derogatory data to the junior or the senior would result in a 50/50 chance of adversely impacting the wrong person.
If you want to see how your entity resolution engine handles this ambiguous condition compared to Senzing, check out these three records and more in our Synthetic Truth Set.
For a more technical article on this topic, click here.