8 Challenges for Enterprises Building Entity Resolution
By Brian Macy, published June 22, 2023
Building high-quality entity resolution capabilities can take years and demand a team with an array of skills spanning statistics, linguistics and performance engineering. Deep knowledge of entity resolution and its numerous edge cases is also crucial. Few organizations possess all this expertise in-house, or the foresight to build in the flexibility needed to adapt to evolving requirements.
As a result, many organizations choose to add commercial entity resolution capabilities to their applications or services rather than build their own, just as they rely on third-party mobile communications or payment processing capabilities.
If you do decide to build your own entity resolution technology, be prepared to address several key development challenges. Before you make the decision to build, be sure you clearly understand what’s required to deliver the key capabilities needed for modern entity resolution. Read on to learn more.
The 8 Key Capabilities of Modern Entity Resolution
1. Support for global languages and scripts – If you operate globally, or may expand globally in the future, your entity resolution solution needs to support global languages and scripts (alphabetic, syllabic, ideographic, etc.) and be able to compare names across scripts. Even if you use only a single script today and don’t envision any global expansion, the ability to perform entity resolution over culturally diverse data is essential for accuracy.
Developing an entity resolution solution that recognizes cultural differences requires sophisticated name and cultural domain awareness as well as high-quality name matching. Building all this in house is no simple task, and licensing the name-matching components instead is expensive.
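To give a sense of where the work starts, here is a minimal Python sketch of name normalization and comparison using only the standard library. The helper names `normalize_name` and `name_similarity` are hypothetical. The sketch only handles case, diacritics, whitespace and token order within one script; it does not transliterate or compare across scripts, which is the hard part described above.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Casefold, strip diacritics and collapse whitespace (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop combining marks such as the acute accent left behind by NFKD.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] that ignores the order of name tokens."""
    tokens_a = sorted(normalize_name(a).split())
    tokens_b = sorted(normalize_name(b).split())
    return SequenceMatcher(None, " ".join(tokens_a), " ".join(tokens_b)).ratio()


if __name__ == "__main__":
    # Family-name-first and given-name-first orderings compare as equal here.
    print(name_similarity("García José", "Jose Garcia"))
```

Even this small example encodes cultural assumptions (for instance, that token order is not significant), which is why real name matching needs far more domain awareness than string similarity alone.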
2. Quick and easy data onboarding – It’s essential to build flexible, intelligent data ingestion capabilities that make it easy to onboard new data sources quickly. This means designing a system that supports automatic parsing of addresses and date formats while requiring minimal or no training, tuning or data preparation. This is especially critical for enterprises with many data sources, such as those creating an enterprise-wide entity resolution service.
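As a rough illustration of what automatic date parsing involves, the Python sketch below tries a list of candidate formats in turn. The format list and the `parse_date` helper are assumptions made for this example, not part of any particular product, and a production system would need to cover far more formats and locales.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical list of formats seen across data sources, tried in order.
# Ambiguous values such as 03/04/2023 resolve to whichever format comes first,
# so the order is itself a per-source decision.
CANDIDATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m/%d/%Y", "%B %d, %Y")


def parse_date(raw: str, formats=CANDIDATE_FORMATS) -> Optional[date]:
    """Return the first successful parse, or None if no format fits."""
    text = raw.strip()
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None


if __name__ == "__main__":
    for value in ("2023-06-22", "22/06/2023", "June 22, 2023", "not a date"):
        print(value, "->", parse_date(value))
```

Returning `None` rather than raising lets an ingestion pipeline flag unparseable values for review without rejecting the whole record.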
3. Highly accurate data matching – Achieving high levels of accuracy starts with domain awareness that spans global naming and addressing conventions across countries, geographies, languages and scripts. To reach higher levels of accuracy, your solution should also support record-to-entity matching and entity-centric learning capabilities, instead of simple record-to-record matching. For the highest level of accuracy, build an adaptable system that gets smarter over time and supports sequence neutrality, so you get the same results regardless of the order in which records are loaded or updated. All these capabilities can be extremely difficult to build and require lengthy development cycles.
Be aware that it may seem easy to achieve high levels of data matching accuracy when you’re testing on a small number of records, but results can change dramatically as data volumes and varieties scale. Therefore, when developing your system, we recommend that you plan, build and test for large numbers of records and a wide variety of data sources.
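Sequence neutrality, mentioned under item 3, is something you can test for directly. The Python sketch below is a deliberately simplified stand-in for a real matcher: it assumes records match whenever they share an exact feature value and groups them with a union-find structure. The `resolve` function and the sample records are hypothetical; the point is the check at the end, which loads the same records in every possible order and confirms the resulting entities are identical.

```python
from itertools import permutations


def resolve(records):
    """Group record ids that share any feature value; returns a set of frozensets."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    first_seen = {}  # feature value -> first record id that carried it
    for rec_id, features in records:
        find(rec_id)
        for feature in features:
            union(rec_id, first_seen.setdefault(feature, rec_id))

    clusters = {}
    for rec_id, _ in records:
        clusters.setdefault(find(rec_id), set()).add(rec_id)
    return {frozenset(members) for members in clusters.values()}


RECORDS = [
    ("r1", {"email:ann@example.com"}),
    ("r2", {"phone:555-0100"}),
    ("r3", {"email:ann@example.com", "phone:555-0100"}),  # bridges r1 and r2
    ("r4", {"email:bob@example.com"}),
]

if __name__ == "__main__":
    baseline = resolve(RECORDS)
    assert all(resolve(list(order)) == baseline for order in permutations(RECORDS))
    print(baseline)
```

Exact-value linking is order-independent by construction. Once matching depends on scores, thresholds or what an entity has already learned, keeping this property becomes much harder, which is why a test like this belongs in the suite from the start.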
4. Ease of use – When building an entity resolution system, it’s critical to pay attention to the ease of use for the entire process, from installation and data source integration to system training and tuning. It’s also important to focus on the ease of ongoing management and maintenance. The system should be designed in such a way that non-specialized staff can effectively manage it with relative ease. This includes maintaining the system, adding new data sources, updating existing sources or running new types of analyses.
5. Real-time capabilities – While most enterprises use only batch-based data today, many recognize they will want real-time capabilities in the future. Real-time systems work fine with batch data, but batch-based systems are nearly impossible to convert to real time. So, even if you don’t need those capabilities right now, you should consider building a real-time system to ensure your solution is future-proof.
What is needed for true real-time entity resolution? True real-time entity resolution goes beyond just responding to queries in real time (from batch data) to continuously ingesting, resolving, querying and self-correcting streaming data as it is received. The objective is to build a system that provides insights fast enough for a business to act on them.
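The difference from batch processing can be sketched in a few lines of Python. The `StreamingResolver` class below is a hypothetical, heavily simplified example: it assumes records match on exact shared feature values, resolves each record the moment it arrives, can be queried between arrivals, and corrects earlier decisions by merging entities when a new record connects them.

```python
class StreamingResolver:
    """Toy incremental resolver: exact shared features link records (assumption)."""

    def __init__(self):
        self._next_id = 1
        self._entity_of_record = {}   # record id -> entity id
        self._entity_of_feature = {}  # feature value -> entity id
        self._records = {}            # entity id -> set of record ids
        self._features = {}           # entity id -> set of feature values

    def ingest(self, rec_id, features):
        """Resolve one record on arrival and return its entity id."""
        matched = sorted({self._entity_of_feature[f] for f in features
                          if f in self._entity_of_feature})
        if matched:
            target = matched[0]
        else:
            target = self._next_id
            self._next_id += 1
            self._records[target] = set()
            self._features[target] = set()
        # Self-correction: fold every other matched entity into the target.
        for other in matched[1:]:
            for member in self._records.pop(other):
                self._entity_of_record[member] = target
                self._records[target].add(member)
            for feature in self._features.pop(other):
                self._entity_of_feature[feature] = target
                self._features[target].add(feature)
        self._records[target].add(rec_id)
        self._entity_of_record[rec_id] = target
        for feature in features:
            self._entity_of_feature[feature] = target
            self._features[target].add(feature)
        return target

    def members(self, rec_id):
        """Query the current entity of a record at any point in the stream."""
        return set(self._records[self._entity_of_record[rec_id]])


if __name__ == "__main__":
    resolver = StreamingResolver()
    resolver.ingest("r1", {"email:ann@example.com"})
    resolver.ingest("r2", {"phone:555-0100"})
    print(resolver.members("r1"))  # r1 is still on its own here
    resolver.ingest("r3", {"email:ann@example.com", "phone:555-0100"})
    print(resolver.members("r1"))  # r3 has pulled r1 and r2 together
```

This sketch only ever merges. A real system also has to split entities when new evidence contradicts an earlier match, and has to do all of this concurrently and at scale.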
6. Explainable to users and auditors – It’s important that your entity resolution results are easily explainable. Business users, data scientists and auditors will all want to know why records matched or didn’t, as well as the details of how decisions were made. Explainable results help you and others better understand and trust how your system works. If you don’t trust your entity resolution system’s results, it’s hard to feel confident making important decisions based on them.
Explainable results are also important for compliance and audits. Regulators and other compliance auditors may require you to explain why specific matches were made, or not made, and exactly how the records involved came together.
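One way to make results explainable is to have every comparison return its reasoning along with its decision. The Python sketch below assumes a simple additive scoring scheme; the feature weights, the threshold and the `explain_match` function are hypothetical values chosen for illustration, not a recommended model.

```python
from dataclasses import dataclass, field

# Hypothetical weights and threshold, chosen only for this example.
WEIGHTS = {"email": 3, "phone": 2, "name": 1}
THRESHOLD = 3


@dataclass
class MatchExplanation:
    matched: bool
    score: int
    reasons: list = field(default_factory=list)


def explain_match(a: dict, b: dict) -> MatchExplanation:
    """Compare two records and record why each feature did or did not count."""
    score = 0
    reasons = []
    for key, weight in WEIGHTS.items():
        value_a, value_b = a.get(key), b.get(key)
        if value_a is None or value_b is None:
            reasons.append(f"{key}: missing on at least one record, ignored")
        elif value_a == value_b:
            score += weight
            reasons.append(f"{key}: equal ({value_a!r}), +{weight}")
        else:
            reasons.append(f"{key}: differs ({value_a!r} vs {value_b!r}), +0")
    return MatchExplanation(score >= THRESHOLD, score, reasons)


if __name__ == "__main__":
    left = {"name": "Ann Lee", "email": "ann@example.com"}
    right = {"name": "Ann Lee", "email": "ann@example.com", "phone": "555-0100"}
    result = explain_match(left, right)
    print(result.matched, result.score)
    for reason in result.reasons:
        print(" ", reason)
```

Storing the explanation alongside each decision, rather than recomputing it later, is what lets you answer an auditor’s question about a match made months ago.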
7. Scalable to support new features and your largest future size – It’s important to understand the number of data sources and the types of performance-intensive capabilities your entity resolution system will require now and in the future, because scalability is difficult to retrofit.
There are a number of reasons why your system may need to scale over time, including: supporting increasing data volumes and data complexity, adding new advanced features or improving your system’s accuracy level. To prevent your entity resolution system from hitting a wall in a few years, make sure it is scalable and flexible enough to easily meet all your future size and performance requirements.
8. Flexible deployment options – Even if you plan to deploy your system only on premises or only in the cloud today, you may want to run in multiple environments in the future. Therefore, you should build deployment flexibility into your entity resolution technology so you can easily support any future deployment needs, whether cloud, on premises or as a service.
It's Smarter to Buy than to Build
No doubt you have some very talented developers and engineers on your enterprise team. However, entity resolution probably isn’t their core competency. If you choose to build your own entity resolution technology, you’ll need a team of entity resolution, statistics, linguistics and performance engineering experts. And be sure you have the time (years) and resources to build an entity resolution system that has the functionality and accuracy of commercial offerings.
Many organizations have looked at building entity resolution and then decided buying it was a much better option. Just like most organizations don’t build their own mobile communications or payment processing capabilities, it usually doesn’t make business sense to build entity resolution in house.
Add Senzing Entity Resolution in Weeks!
You can add Senzing entity resolution to your application or service in a few weeks. The Senzing API makes it so fast, easy and affordable to add world-class entity resolution capabilities that it no longer makes sense to build it yourself.
Whether your software runs on premises, in the cloud, or as a service, Senzing® entity resolution is your best option for superior performance, faster time-to-market and lowest total cost of ownership.