Entity-Centric Learning vs. Record Matching Methods in Entity Resolution Systems
28, 2020
Most entity resolution algorithms rely on record matching, a method in which each record is compared with other records for similarity. Record matching does not learn, which ultimately results in missed matches.
More advanced entity resolution uses entity-centric learning – a method that treats resolved records as a single holistic entity. Entity-centric learning gets smarter over time, improves accuracy, and can detect non-obvious relationships that humans can easily miss.
Record Matching
Can you match the record on the left to any of the records on the right?
Nope. Record 1 shares only a name with Sue Jones, Record 2 only an address, and Record 3 only a phone number. No single record contains enough information to make a match.
As a result, record matching systems will incorrectly create a new entity for Sue Jones (Record 4):
Record matching fails to realize that Sue Jones is Sue Smith, so a new entity is created in error.
Record matching works well when every record is rich with features, has reasonable data quality, and no one is trying to evade detection. But the moment some records don’t have many features, the data quality is poor, or the data contains intentionally fabricated lies, record matching misses good matches (aka creates false negatives).
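To make the failure concrete, here is a minimal Python sketch of record-by-record matching. The feature values are hypothetical stand-ins for the records in the example, and the rule that two records match when at least two feature types agree (`MIN_SHARED`) is an assumption for illustration only; production entity resolution scores features far more carefully.

```python
# Hypothetical feature values for the Sue Smith / Sue Jones example.
# The two-feature threshold is an illustrative assumption, not a real product rule.
MIN_SHARED = 2

RECORDS = {
    "R1": {"name": "SUE JONES", "dob": "1980-05-12", "email": "sue@example.com"},
    "R2": {"name": "SUE SMITH", "dob": "1980-05-12", "email": "sue@example.com",
           "address": "123 MAIN ST"},
    "R3": {"name": "SUE SMITH", "dob": "1980-05-12", "phone": "555-0100"},
}
NEW_RECORD = {"name": "SUE JONES", "address": "123 MAIN ST", "phone": "555-0100"}  # Record 4


def shared_features(a: dict[str, str], b: dict[str, str]) -> int:
    """Count feature types present in both records with equal values."""
    return sum(1 for key in a.keys() & b.keys() if a[key] == b[key])


def record_match(new: dict[str, str], records: dict[str, dict[str, str]]) -> list[str]:
    """Compare the new record to each existing record on its own."""
    return [rid for rid, rec in records.items()
            if shared_features(new, rec) >= MIN_SHARED]


# Per-record scores for Record 4, then the (empty) list of matches.
print({rid: shared_features(NEW_RECORD, rec) for rid, rec in RECORDS.items()})
print(record_match(NEW_RECORD, RECORDS) or "no match -> new entity created")
```

With these values, Records 1 and 2 match on date of birth and email, and Records 2 and 3 match on name and date of birth, so those three resolve together. Record 4 agrees with each of them on only one feature, so no pairwise comparison clears the threshold and it becomes a new entity.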
Entity-Centric Learning
When entity resolution uses the entity-centric learning method, Record 4 is compared to “everything we know about Sue” (all the features across all the records) – resulting in a more accurate outcome:
Entity-centric learning is not fooled -- Sue Jones is Sue Smith.
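A companion sketch of the entity-centric comparison, using the same hypothetical values and the assumed two-feature threshold: the features of Records 1 to 3 are pooled into one entity, and Record 4 is scored against that pool instead of against each record in turn. The helper names (`build_entity`, `shared_with_entity`) are illustrative, not an actual product API.

```python
from collections import defaultdict

MIN_SHARED = 2  # assumed threshold: at least two feature types must agree

# Hypothetical records assumed to be already resolved into one entity ("Sue Smith").
SUE_SMITH_RECORDS = [
    {"name": "SUE JONES", "dob": "1980-05-12", "email": "sue@example.com"},
    {"name": "SUE SMITH", "dob": "1980-05-12", "email": "sue@example.com",
     "address": "123 MAIN ST"},
    {"name": "SUE SMITH", "dob": "1980-05-12", "phone": "555-0100"},
]
NEW_RECORD = {"name": "SUE JONES", "address": "123 MAIN ST", "phone": "555-0100"}  # Record 4


def build_entity(records: list[dict[str, str]]) -> dict[str, set[str]]:
    """Pool every value seen for each feature type across the entity's records."""
    entity = defaultdict(set)
    for rec in records:
        for key, value in rec.items():
            entity[key].add(value)
    return dict(entity)


def shared_with_entity(record: dict[str, str], entity: dict[str, set[str]]) -> int:
    """Count feature types where the record's value appears anywhere in the entity."""
    return sum(1 for key, value in record.items() if value in entity.get(key, set()))


sue = build_entity(SUE_SMITH_RECORDS)
score = shared_with_entity(NEW_RECORD, sue)
print(score, "match" if score >= MIN_SHARED else "no match")
```

Against the pooled entity, Record 4 agrees on name, address, and phone number, so it scores 3 and resolves to Sue Smith.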
Entity-centric learning is especially important because dozens of attributes, if not a hundred, can be used to describe an entity (from name and address to Twitter and Instagram handles). These attributes are scattered and disconnected across countless data sources. Entity-centric learning builds holistic entity views that retain all information known about an entity. Over time, and as more data arrives, entity-centric learning improves the accuracy of entity resolution.
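The same idea applied to records arriving over time: a hypothetical toy resolver that folds each record's features into its entity, so later records are compared against everything learned so far. Setting `entity_centric=False` switches it to record-by-record matching for contrast. The data, threshold, and first-match assignment are simplifying assumptions; a real system must also handle cases this sketch ignores, such as a new record that bridges two existing entities.

```python
# Hypothetical toy resolver; values and MIN_SHARED are illustrative assumptions.
MIN_SHARED = 2

STREAM = [
    {"name": "SUE JONES", "dob": "1980-05-12", "email": "sue@example.com"},  # R1
    {"name": "SUE SMITH", "dob": "1980-05-12", "email": "sue@example.com",
     "address": "123 MAIN ST"},                                              # R2
    {"name": "SUE SMITH", "dob": "1980-05-12", "phone": "555-0100"},         # R3
    {"name": "SUE JONES", "address": "123 MAIN ST", "phone": "555-0100"},    # R4
]


def score(record: dict[str, str], pool: dict[str, set[str]]) -> int:
    """Count feature types where the record's value is already in the pool."""
    return sum(1 for key, value in record.items() if value in pool.get(key, set()))


def as_pool(record: dict[str, str]) -> dict[str, set[str]]:
    """Wrap one record's values in sets so it can be scored like an entity."""
    return {key: {value} for key, value in record.items()}


def resolve(stream: list[dict[str, str]], entity_centric: bool) -> list[list[dict[str, str]]]:
    entities: list[list[dict[str, str]]] = []  # records belonging to each entity
    pools: list[dict[str, set[str]]] = []      # pooled features per entity
    for record in stream:
        target = None
        for i, records in enumerate(entities):
            if entity_centric:
                # Compare against everything known about the entity so far.
                hit = score(record, pools[i]) >= MIN_SHARED
            else:
                # Record matching: some single record must match on its own.
                hit = any(score(record, as_pool(r)) >= MIN_SHARED for r in records)
            if hit:
                target = i
                break
        if target is None:
            entities.append([record])
            pools.append({})
            target = len(entities) - 1
        else:
            entities[target].append(record)
        # Fold the record's features into the entity's pool
        # (the pool is only consulted when entity_centric is True).
        for key, value in record.items():
            pools[target].setdefault(key, set()).add(value)
    return entities


print("record matching entities:", len(resolve(STREAM, entity_centric=False)))
print("entity-centric entities:", len(resolve(STREAM, entity_centric=True)))
```

On this stream, record matching ends with two entities because Record 4 is split off, while the entity-centric mode ends with a single entity holding all four records.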
Furthermore, when it comes to catching fraudsters or other bad actors, entity-centric learning is mandatory. This is because bad actors never use the same name, date of birth, address and/or passport number every time they submit a credit card application or apply for a loan. Only the idiots would do that! Clever bad actors do their best to prevent organizations from figuring out who is who. For more about this, read: Channel Separation: The Primary Tradecraft of Clever Bad Guys.
[If you would like to get your hands on synthetic data which includes entity-centric learning examples, download the Senzing Synthetic Truth Set.]