Entity Resolution & Machine Learning - Why You Need Both

By Senzing, published October 26, 2022

Machine learning is learning through experience. When machine learning models learn from data of poor quality or with many incorrect matches, the quality of the models is also poor. Watch this video to learn more from Senzing Founder and CEO Jeff Jonas about how entity resolution enhances machine learning outcomes.

A great way to improve your machine learning models is to entity resolve your data prior to training your system. That way you’re learning from more accurate data. The Senzing team has already used vast amounts of data to improve its entity resolution algorithms and is constantly learning as new data is received, so your system doesn’t have to.

We’ve taken particular care to be experts in the nuances of name matching and address parsing. Why would you want your machine to have to learn that Dick, Dickey, Richie, and Ricardo are in the same name family, or that Elizabeth and Beth are, or all the spellings of Muhammad?

By performing entity resolution in advance, your machine learning models will be better on day one. Why wouldn’t you want that?

Senzing® entity resolution delivers accurate results out-of-the-box and uses machine learning to get smarter over time, in a way you’re not going to experience with any other entity resolution technology. Try it for free. Download our free Desktop Eval Tool and see for yourself how Senzing entity resolution improves the quality of your data.

Video Transcript

Timestamps
0:00 Intro
0:10 Machine Learning
0:19 Senzing Name Matching
0:54 Senzing Address Parsing
1:16 Real-Time Machine Learning
1:32 Real-Time Machine Learning Entity Resolution Example
2:07 Correcting the Past Through Entity-Centric Learning
2:37 Local Learning
2:55 Deploying Entity Resolution

Entity resolution and machine learning. Let’s just talk about that briefly.

0:10 Machine Learning

Now, when I say machine learning, I mean learning through experience about the data that’s happening, the data that’s arriving, or the data that one has in its reference set.

0:19 Senzing Name Matching

But first let me say this. Why would you want your machine to have to learn that Dick, Dickey, Richie, and Ricardo are in the same name family, or Elizabeth and Beth, or all the spellings of Muhammad? Well, that would be wasteful.

Some methods of entity resolution require you to teach things like that. You’re not going to want to do that. So, one of the things we do at Senzing is we come out-of-the-box with an already statistically-learned name library comparison function that’s culturally aware. So, it does out-of-the-box great name matching.

0:54 Senzing Address Parsing

Similarly, you could try to teach a machine, with probably too little data, how to parse addresses from around the world. Or why don’t you use our enhanced libpostal open source address parser, which has already been machine learned off of hundreds of millions of records before you even get the system?

1:16 Real-Time Machine Learning

So, even if you have one million records, that’s not enough information to machine learn your way to great entity resolution. Out-of-the-box, Senzing comes with some pre-built knowledge that’s baked in, and then on top of that, it’s learning in real-time.

1:32 Real-Time Machine Learning Entity Resolution Example

What kind of learning in real-time? An example would be passport numbers. Let’s say passport numbers are pretty distinct. But what if a passport number 54321 starts appearing and a lot of people have the same number because some lazy operator somewhere is typing it in?

Senzing learns that in real-time and then does two things. One, it goes, hey, 20 people have the same passport number. Passport numbers are pretty good, not this one. Let’s stop using this passport number as an important match value going forward. That’s part one.

2:07 Correcting the Past Through Entity-Centric Learning

And part two, it says, wow, now that I know that that passport number is no good, maybe I’ll look in the past and see if I’ve made any decisions incorrectly now that I know this. And so Senzing is correcting the past based on statistics. It’s learning going forward in time.
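The two parts described here can be illustrated with a small Python sketch. This is a toy model, not Senzing's implementation or API: the PassportIndex class, its method names, the threshold of 20, and the use of distinct names as a stand-in for distinct people are all assumptions made for illustration.

```python
from collections import defaultdict


class PassportIndex:
    """Toy tracker that stops trusting a passport number once too many people share it."""

    def __init__(self, threshold=20):
        # Hypothetical cutoff: distinct names sharing one number before it is distrusted.
        self.threshold = threshold
        self.names = defaultdict(set)    # passport number -> distinct lowercase names
        self.records = defaultdict(set)  # passport number -> record ids carrying it
        self.generic = set()             # passport numbers no longer used as match evidence

    def add(self, record_id, name, passport):
        """Register a record and return the record ids whose past matches need review.

        The returned set is empty unless this record is the one that pushes
        the passport number over the threshold.
        """
        self.names[passport].add(name.lower())
        self.records[passport].add(record_id)
        newly_generic = (
            passport not in self.generic
            and len(self.names[passport]) >= self.threshold
        )
        if newly_generic:
            # Part one: stop using this number as match evidence going forward.
            self.generic.add(passport)
            # Part two: hand back every earlier record that relied on it.
            return set(self.records[passport])
        return set()

    def is_match_evidence(self, passport):
        """True while the passport number is still considered distinctive."""
        return passport not in self.generic
```

In this sketch, the caller would re-run its matching logic on the returned record ids, which is the "look in the past" step. A real system would need a far more careful notion of "distinct people" than a lowercase name.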

The second thing that we do in real-time learning, is local learning. We might learn that somebody has a nickname Rick, you know. Maybe their name’s Ken but they’ve been going by Rick.

2:37 Local Learning

The moment it learns that they also go by the name Rick, it goes wow, well now that I know that, you know, this person also has a nickname Rick, would I have made any decisions differently? You might say, oh yeah, these other two records are actually part of the same entity, and those are a couple of examples of real-time learning.
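A minimal Python sketch of this kind of local learning follows. It is an illustration only, not how Senzing works internally: the Entity class, the matches and learn_alias functions, and the rule "same name plus same birth date" are hypothetical simplifications.

```python
class Entity:
    """Toy resolved entity that remembers every name it is known by."""

    def __init__(self, name, birth_date):
        self.names = {name.lower()}
        self.birth_date = birth_date
        self.record_ids = []


def matches(entity, record):
    """Hypothetical match rule: a known name plus the same birth date."""
    return (
        record["name"].lower() in entity.names
        and record["birth_date"] == entity.birth_date
    )


def learn_alias(entity, alias, unresolved):
    """Teach the entity a new name, then revisit records that did not match before.

    Returns the records that now match; their ids are attached to the entity.
    """
    entity.names.add(alias.lower())
    newly_matched = [r for r in unresolved if matches(entity, r)]
    for record in newly_matched:
        entity.record_ids.append(record["id"])
    return newly_matched


# Usage: Ken is learned to also go by Rick, so an earlier "Rick" record now joins him.
ken = Entity("Ken", "1980-01-01")
pending = [
    {"id": 2, "name": "Rick", "birth_date": "1980-01-01"},
    {"id": 3, "name": "Rick", "birth_date": "1975-05-05"},
]
resolved = learn_alias(ken, "Rick", pending)
```

The point of the sketch is that the new nickname applies only to this one entity, and that learning it triggers a second look at earlier decisions rather than affecting only future records.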

2:55 Deploying Entity Resolution

When you go to deploy entity resolution, you’re going to really want to make sure it’s self-correcting, self-tuning, self-learning. Otherwise, you’re going to be struggling with whether you have enough data and how many years you want to spend training your system.

Senzing is going to help you deliver accurate results out-of-the-box and only get better over time in a way that you’re not going to experience with any other entity resolution technology.
