Introducing CORDs:
Collections of Relatable Data
If you have ever worked on a project that involves entity resolution, you already know it is hard work. Inconsistent data structures, messy names and addresses, transpositions, typos, and intentional errors (deceit) make it difficult to understand who is who in your data.
Despite the availability of easy, accurate and affordable AI software for entity resolution like Senzing, there are challenges that slow down entity resolution projects.
The first and biggest blocker is typically getting real (not synthetic) data in hand to begin evaluating entity resolution. Most data owners aren’t in a rush to share their precious production data without sufficient “process” (e.g., architecture reviews, security reviews, and executive sign-off).
What If...
What if there were an easy way to get free, real data containing levels of diversity and messiness like your own data? What if this data were already filtered into bite-size, yet statistically relevant, subsets? What if such subsets were already in a format ready for entity resolution processing?
If so, you would be able to start producing entity resolution results in under an hour. Even better, the evaluation would be more relevant given the huge difference in test accuracy and value that real data makes over synthetic data.
Introducing CORDs
A CORD is a Collection Of Relatable Data.
“Relatable” in that the datasets contain overlapping attributes like names and addresses across a common domain like a city or state. A Senzing CORD contains two or more free datasets containing real data. These datasets are historical snapshots pre-formatted in Senzing-ready JSON.
Datasets in a CORD are sourced either from openly available data or via a Senzing collaboration with a data provider contributing CORD snapshots.
For example, a Las Vegas CORD might be as simple as:
A CORD can include other valuable attributes that don’t contribute to relatability, e.g., the PPP loan amount and the number of labor violations.
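To make the distinction between relatable attributes and other attributes concrete, here is a minimal Python sketch. It assumes a snapshot stores one JSON record per line. Every field name and value in it (DATA_SOURCE, RECORD_ID, NAME_ORG, ADDR_FULL, PPP_LOAN_AMOUNT) is an illustrative assumption, not copied from an actual CORD, and `split_record` is a hypothetical helper.

```python
import json

# Hypothetical record: every key and value here is an illustrative assumption,
# not copied from a real CORD snapshot.
SAMPLE_LINE = json.dumps({
    "DATA_SOURCE": "PPP_LOANS",
    "RECORD_ID": "1001",
    "NAME_ORG": "Example Diner LLC",
    "ADDR_FULL": "123 Example St, Las Vegas, NV 89101",
    "PPP_LOAN_AMOUNT": 25000,
})

# Attributes assumed to identify the record and the dataset it came from.
IDENTITY_KEYS = {"DATA_SOURCE", "RECORD_ID"}
# Attributes assumed to drive relatability (names and addresses).
RELATABLE_KEYS = {"NAME_ORG", "NAME_FULL", "ADDR_FULL"}


def split_record(line):
    """Split one JSON record into identity, relatable, and other attributes."""
    record = json.loads(line)
    identity = {k: v for k, v in record.items() if k in IDENTITY_KEYS}
    relatable = {k: v for k, v in record.items() if k in RELATABLE_KEYS}
    other = {
        k: v for k, v in record.items()
        if k not in IDENTITY_KEYS and k not in RELATABLE_KEYS
    }
    return identity, relatable, other


identity, relatable, other = split_record(SAMPLE_LINE)
print("identity: ", identity)
print("relatable:", relatable)
print("other:    ", other)
```

In this sketch the loan amount lands in the "other" group: it travels with the record and adds value once entities are resolved, but it plays no part in deciding who is who.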
Check out Senzing’s available CORDs. If you need a different CORD, just ask. We have other CORDs available and can make custom CORDs quickly with our CORD factory.
CORDs Are Powerful. Here’s Why…
A CORD allows you to quickly evaluate entity resolution accuracy and the benefits of connected data. CORDs also help illuminate how entity resolution accuracy improves as data diversity increases, e.g., as alternate names and spelling variations are learned.
While achieving accurate matching is technically the hardest part, there are several reasons traditional entity resolution projects take so long, namely:
1. Getting representative data in hand to evaluate
2. Cleaning up this data for entity resolution
3. Preparing the data for entity resolution
4. Provisioning the hardware for the evaluation
5. Configuring the entity resolution algorithm for this specific data
6. Running the entity resolution algorithm
7. Analyzing the results
8. Re-configuring, re-processing, and re-analyzing in pursuit of acceptable results
Considering all this anticipated effort, it is no surprise that such projects get pushed to next year.
While Senzing vastly accelerates step 5 onward, CORDs accelerate all the prerequisites (steps 1-4).
Marvel…
First, pick a CORD, any CORD. If you just need to evaluate plain vanilla US data, great. Download the Las Vegas CORD. Need to evaluate global data? Download the London CORD. Need to test non-Roman scripts? No problem. Download the Moscow CORD.
Because a CORD contains pre-filtered and pre-formatted data, often totaling under 1M records, very basic hardware (even a laptop) can be used for the entity resolution and its accuracy analysis. While the simple Senzing Desktop Eval Tool is suited for such small record sets, the Senzing SDK is proven at data volumes beyond your wildest dreams. [Pro Tip: Test accuracy first. Once that’s vetted, then do your scalability testing, as a separate motion.]
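Before loading a snapshot on a laptop, it can help to confirm how many records each dataset contributes. The following is a minimal sketch, not Senzing tooling. It assumes a snapshot is a JSON Lines file (one record per line) and that each record carries a DATA_SOURCE key. The sample records and the `count_by_source` helper are hypothetical.

```python
import io
import json
from collections import Counter


def count_by_source(lines):
    """Count records per DATA_SOURCE in an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        counts[record.get("DATA_SOURCE", "UNKNOWN")] += 1
    return counts


# Stand-in for an open snapshot file; these records are made up.
snapshot = io.StringIO(
    '{"DATA_SOURCE": "PPP_LOANS", "RECORD_ID": "1"}\n'
    '{"DATA_SOURCE": "PPP_LOANS", "RECORD_ID": "2"}\n'
    '\n'
    '{"DATA_SOURCE": "LABOR_VIOLATIONS", "RECORD_ID": "A"}\n'
)

counts = count_by_source(snapshot)
total = sum(counts.values())
for source, n in sorted(counts.items()):
    print(f"{source}: {n}")
print(f"total: {total}")
```

With a real file, the `io.StringIO` stand-in would be replaced by an open file handle, since the function only needs an iterable of lines and never holds the whole snapshot in memory.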
Indeed, with a CORD, steps 1-4 are effortless. CORDs allow you to jump straight to running entity resolution.
Whether you use a Senzing-inside solution from a Senzing partner or the Senzing Desktop Eval Tool, the data in a CORD essentially eliminates steps 1-4. Out of the box, Senzing will entity resolve diverse, cross-culture, cross-language, inconsistent, and messy data. Senzing routinely outperforms humans and only gets smarter, in real time, as more data is resolved.
Drag a CORD snapshot onto the palette and press LOAD for 1-click entity resolution
Using the Desktop Evaluation Tool, you will be able to explore matches, possible matches, and possible relationships.
Matches in the Moscow CORD. Can you see the name and address similarity?
With Senzing AI for entity resolution, most use cases will have no need for any configuration changes, rendering steps 5-8 effortless. No more need for highly specialized work related to configuring, training, and tuning entity resolution.
IMAGINE: While teams using other approaches are still discussing objectives in PowerPoint, detailing the project plan in Excel, and staffing experts to toil over training and tuning … your Senzing SDK deployment will have a billion records online, supporting real-time workflows in pre-production.
Try A Senzing CORD
Check out our available CORDs. If you would like to chat first, drop us a note here. If anything seems hard along the way, email us at support@senzing.com. It’s fab, fast, and free.
Closing
After decades of innovation, Senzing’s real-time AI engine makes entity resolution easy, accurate, and cost-effective for data engineers, data scientists, and developers.
With Senzing CORDs, you can quickly access representative real-world data, accelerating your entity resolution evaluation while unlocking the added value of combining additional data sources.
The combination of the Senzing SDK and Senzing CORDs will accelerate your entity resolution project, delivering better outcomes across key areas like fraud detection, risk management, compliance, investigations, customer 360, and marketing.
Whether you’re working in banking, insurance, healthcare, or government, Senzing CORDs will make life easier.
Please reach out if you would like to collaborate.
Other Related Materials
Entity Resolution: Insights and Implications for AI Applications (By Ben Lorica of Gradient Flow)
Entity Resolution Step-by-Step (Video by Jeff Jonas)