Why Machine Learning Needs Entity Resolution
Are you training a new machine learning model? Do you want to ensure your ML model is as accurate as possible? Watch this video to learn how entity resolution improves the quality of your ML training data.
Machine learning models are only as good as the data they were trained on. If you haven’t used entity resolution on your machine learning training data set, chances are you’re training your models on data filled with false positives and false negatives.
Get better quality data and data labeling automation with Senzing® entity resolution.
When you resolve data about entities before using it to train your ML models, you’ll get much more accurate results. And entity resolution with Senzing is easy. Get started exploring Senzing on your own data or ours. In about 15 minutes, you’ll see the matches, possible matches, and relationships it can find.
Video Transcript
Timestamps
0:00 Introduction
0:10 False Positives and False Negatives in Machine Learning Data Sets
0:36 Use Entity Resolution on Your Training Data for Better Machine Learning Analytic Models
0:51 Entity Resolution for Data Labeling Automation in Machine Learning Models
1:05 Entity Resolution for Post-Marketing Surveillance Machine Learning Models
1:43 Entity Resolution for Data Append and Better Machine Learning Models
I’d like to take a minute and talk about why machine learning needs entity resolution.
0:10 False Positives and False Negatives in Machine Learning Data Sets
For starters, in machine learning, you take a set of data and use it for training. Imagine taking a set of data where [your machine learning model] thinks it’s three different people when it’s really one. Or, it takes two people – a junior and a senior – and thinks they’re one person. Imagine those false positives and false negatives being used as training data. Yeah, that’s right. You heard me right.
What would it mean if you had training data that was wrong? That means your models are wrong.
0:36 Use Entity Resolution on Your Training Data for Better Machine Learning Analytic Models
When you use entity resolution on your training data it means you’re better understanding who is who and who’s related to whom. Then you’re taking that entity-resolved data and passing it in as your training data. Now, duh, you’re going to get better analytic models.
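To make the idea concrete, here is a minimal sketch of resolving records into entities before building training rows. It is a toy illustration, not how Senzing works: the records, the field names, the nickname table, and the exact-key matching rule are all assumptions made up for this example. Real entity resolution uses far richer matching than a normalized name plus date of birth.

```python
from collections import defaultdict

# Hypothetical toy records; names and fields are illustrative only.
RECORDS = [
    {"id": 1, "name": "Robert Smith", "dob": "1980-02-03", "purchases": 2},
    {"id": 2, "name": "Bob Smith", "dob": "1980-02-03", "purchases": 1},
    {"id": 3, "name": "SMITH, ROBERT", "dob": "1980-02-03", "purchases": 4},
    {"id": 4, "name": "John Doe Jr", "dob": "1990-05-01", "purchases": 3},
    {"id": 5, "name": "John Doe Sr", "dob": "1960-07-12", "purchases": 5},
]

# Assumed lookup tables for this sketch only.
NICKNAMES = {"bob": "robert"}
SUFFIXES = {"jr", "sr"}


def match_key(record):
    """Build a crude matching key: (first name, last name, suffix, date of birth)."""
    name = record["name"].lower()
    if "," in name:
        # Turn "last, first" into "first last".
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    tokens = [token.strip(".") for token in name.split()]
    suffix = next((t for t in tokens if t in SUFFIXES), "")
    tokens = [t for t in tokens if t not in SUFFIXES]
    first = NICKNAMES.get(tokens[0], tokens[0])
    return (first, tokens[-1], suffix, record["dob"])


def resolve(records):
    """Group records that share a matching key into one entity."""
    entities = defaultdict(list)
    for record in records:
        entities[match_key(record)].append(record)
    return dict(entities)


def training_rows(entities):
    """Emit one training row per resolved entity, not one per raw record."""
    return [
        {
            "entity": key,
            "record_ids": [r["id"] for r in group],
            "purchases": sum(r["purchases"] for r in group),
        }
        for key, group in entities.items()
    ]


rows = training_rows(resolve(RECORDS))
for row in rows:
    print(row)
```

In this sketch the three Robert Smith records collapse into one training row, while John Doe Jr and John Doe Sr stay separate, so the model sees three people instead of five.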
0:51 Entity Resolution for Data Labeling Automation in Machine Learning Models
There’s another use for entity resolution in machine learning as well, and it’s auto-labeling. Yep, you heard me right: auto-labeling. Labeling can be a real pain. How many people do you need to go in and go, “That is true, that’s not true. This is this, this is that?”
1:05 Entity Resolution for Post-Marketing Surveillance Machine Learning Models
[Let’s say] you’re using entity resolution and you’re taking Data Set A and you’re combining it with Data Set B – maybe Data Set A is pharmaceuticals and maybe Data Set B is people who have passed away. When you combine those two data sets, you end up with labeled data and you can find out which people taking which pharmaceuticals have passed away. With that kind of data passed into machine learning models, you can see maybe the negative effects or harmful effects of [pharmaceutical] drugs in what’s called post-marketing surveillance. [This helps] recognize, after a pharmaceutical has been released, if it’s causing adverse effects that weren’t realized.
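The auto-labeling step described here can be sketched in a few lines. This example assumes an entity resolution step has already run over both data sets and given each person a shared `entity_id`. The IDs, drug names, and field names are hypothetical and exist only for illustration.

```python
# Data Set A (hypothetical): who is taking which pharmaceutical.
prescriptions = [
    {"entity_id": "E1", "drug": "drug_x"},
    {"entity_id": "E2", "drug": "drug_x"},
    {"entity_id": "E3", "drug": "drug_y"},
]

# Data Set B (hypothetical): resolved entity IDs of people who have passed away.
deceased_entity_ids = {"E2"}


def label_rows(rows, deceased_ids):
    """Attach a 'deceased' label to each row by matching on the resolved entity ID."""
    return [{**row, "deceased": row["entity_id"] in deceased_ids} for row in rows]


labeled = label_rows(prescriptions, deceased_entity_ids)
for row in labeled:
    print(row)
```

The label comes from the match itself, so nobody has to tag each row by hand. The quality of these labels depends entirely on how well the two data sets were resolved to the same entities.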
1:43 Entity Resolution for Data Append and Better Machine Learning Models
So, that’s entity resolution combining data, which some would call a “database append,” and that allows you to have better downstream models. So, if you’re doing machine learning and it has anything to do with people, organizations or other entities, entity resolution is going to help you have a better result.