7 Challenges for ISVs Building Entity Resolution
By Brian Macy, published December 15, 2022
Independent software vendors (ISVs) often struggle with data quality issues when their products and services ingest and process data about people and organizations. Many ISVs want to add new entity resolution capabilities to their offerings to improve entity matching results, eliminate manual processes, enable real-time processing, or support new deployment models.
Although these are worthy goals, building high-quality capabilities can take years. To be successful, a team requires an array of skills, including statistics, linguistics, and performance engineering, plus deep knowledge of entity resolution and its numerous edge cases. The team must also be able to address data quality issues. Most organizations don’t have all this expertise in-house, or the foresight to build in the flexibility required to adapt to unclear future requirements. Furthermore, if you build your own system, you will incur initial development costs as well as ongoing costs to improve and evolve the technology.
The 7 Key Capabilities of Modern Entity Resolution
No doubt you have some very talented developers and engineers on your team. However, entity resolution probably isn’t their core competency. To create your own entity resolution system, you’ll need to consider seven key capabilities, each of which presents its own development challenges. Read on to learn more.
1. Quick and easy data onboarding – You’ll need to build flexible, intelligent data ingestion capabilities that make it easy for all users (including both your team and your end-user customers) to onboard new quality data sources quickly with minimal or no data preparation.
The challenge is to build an entity resolution system that makes adding new quality data sources easy and intuitive for users. This means a system that will require minimal tuning, training, or data preparation, including support for automatic parsing of names and addresses.
2. Highly accurate data matching algorithms – Accuracy is at the core of any effective entity resolution solution. Accuracy rates are often much lower than expected, and getting to competitive levels of accuracy is a huge challenge. Name and address parsing and comparison are such complex tasks that you’ll probably want to use third-party products instead of building them yourself, so you’ll need to factor in licensing costs and integration time.
Achieving high levels of accuracy requires advanced capabilities, starting with domain awareness that spans global naming and addressing conventions across countries, geographies, languages, and scripts.
To reach even greater levels of accuracy, your solution should use entity-centric matching and learning capabilities that enable it to adapt and get smarter over time without reloading data. The most accurate systems will also need to support sequence neutrality – meaning that regardless of the order data is loaded, every new record is used to reevaluate and improve prior decisions – which is an extremely difficult development challenge.
High levels of data matching accuracy may appear easy to achieve when you’re testing on a small number of records. But results can change dramatically as data volumes and varieties grow, so plan, build, and test for larger numbers and types of records when developing your own entity resolution system.
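To make the idea of sequence neutrality more concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how any production system (Senzing included) works: the `ToyResolver` class, the sample records, and the rule "records that share any exact feature value are the same entity" are all hypothetical. With that deliberately simple rule, a late-arriving record can link two entities that were previously kept apart, and the final grouping is the same for every load order.

```python
from itertools import permutations


class ToyResolver:
    """Toy resolver: records sharing any exact feature value form one entity."""

    def __init__(self):
        # Invariant: feature sets of distinct entities never overlap.
        self.entities = []  # each entity: {"ids": set, "features": set}

    def add(self, record_id, features):
        """Ingest one record and re-evaluate every earlier decision it touches."""
        features = set(features)
        merged = {"ids": {record_id}, "features": set(features)}
        untouched = []
        for entity in self.entities:
            if entity["features"] & features:
                # The new record links this entity to the others it overlaps.
                merged["ids"] |= entity["ids"]
                merged["features"] |= entity["features"]
            else:
                untouched.append(entity)
        self.entities = untouched + [merged]

    def partition(self):
        """Return the current grouping in an order-independent form."""
        return frozenset(frozenset(e["ids"]) for e in self.entities)


# Hypothetical sample data: r1 and r2 share nothing until r3 arrives.
RECORDS = {
    "r1": {"name:jon smith", "phone:555-0100"},
    "r2": {"name:jonathan smith", "email:js@example.com"},
    "r3": {"phone:555-0100", "email:js@example.com"},
    "r4": {"name:ann lee"},
}


def resolve(order):
    resolver = ToyResolver()
    for record_id in order:
        resolver.add(record_id, RECORDS[record_id])
    return resolver.partition()


# Collect the distinct groupings produced by all 24 possible load orders.
outcomes = {resolve(order) for order in permutations(RECORDS)}
print(len(outcomes))
```

The toy stays order-independent only because exact feature sharing is transitive. Real matching is fuzzy and weighted, so keeping results independent of load order takes far more work, which is the development challenge described above.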
3. Flexible deployment options – Even if you only deploy your product on premises or in the cloud today, you may want to run in other environments in the future. It’s more challenging to build deployment flexibility into your entity resolution capabilities, but you should consider architecting for multiple environments (cloud, on premises, and SaaS) to ensure your investment is future-proofed.
4. Explainable to users and auditors – Entity resolution solutions must be able to clearly explain why decisions were made. Explainability enables you to answer inquiries quickly and clearly from end users, auditors, senior management, regulators, or attorneys who want to know exactly why records matched or didn’t match.
These capabilities help provide a better understanding of, and greater confidence in, your entity resolution results. It can be easy to provide explainable results when systems are simple, but for more advanced systems, or those that use machine learning, it can be very complicated.
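As a rough illustration of what an explainable decision can look like in the simple case, the Python sketch below returns the reasons behind a match along with the verdict. Everything in it is an assumption made for the example: the attribute names, the weights, the threshold, and the `explain_match` helper are hypothetical and do not come from any real product.

```python
from dataclasses import dataclass, field


@dataclass
class MatchExplanation:
    matched: bool
    score: int
    reasons: list = field(default_factory=list)


# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {"name": 2, "date_of_birth": 3, "phone": 3}
THRESHOLD = 5


def explain_match(a, b):
    """Compare two records attribute by attribute and log every contribution."""
    score = 0
    reasons = []
    for attr, weight in WEIGHTS.items():
        left, right = a.get(attr), b.get(attr)
        if left is None or right is None:
            reasons.append(f"{attr}: missing on at least one record, ignored")
        elif left.strip().casefold() == right.strip().casefold():
            score += weight
            reasons.append(f"{attr}: values agree (+{weight})")
        else:
            score -= weight
            reasons.append(f"{attr}: values disagree (-{weight})")
    return MatchExplanation(score >= THRESHOLD, score, reasons)


record_a = {"name": "Ann Lee", "date_of_birth": "1980-02-03", "phone": "555-0100"}
record_b = {"name": "ann lee ", "date_of_birth": "1980-02-03"}

result = explain_match(record_a, record_b)
print(result.matched, result.score)
for reason in result.reasons:
    print(" -", reason)
```

A rule-based scorer like this can list its reasons almost for free. Once matching relies on learned models or on decisions that are revised as new records arrive, producing an equally clear audit trail becomes much harder.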
5. Scalable to support your largest future customers and new features – Don’t make the mistake of scoping your entity resolution capabilities for the business you have now because you may need to support much larger customers or more data sources in the future. Even if your customers or their data sets don’t get larger, you may need to add new advanced features or increase your system’s accuracy levels, both of which can require significantly more scale.
Scalability is difficult to retrofit, so you must ensure the entity resolution capabilities you build today can handle your size and performance requirements well into the future.
6. Real-time capabilities – While many ISVs use only batch-based data today, they recognize they will have to support real-time capabilities in the future. Real-time systems work fine with batch data, but batch-based systems are nearly impossible to convert to real time. Therefore, any new entity resolution development should be natively real time, even if those capabilities aren’t needed today.
What is needed for real-time entity resolution? True real-time entity resolution systems must operate fast enough for businesses to act while results are still relevant. Along with responding to queries in real time, the system must ingest and process data at transaction speeds as soon as data is received, and self-correct as new observations are made.
7. Support for global languages and scripts – If you operate globally, or may expand globally in the future, your entity resolution solution needs to support global languages and scripts (alphabetic, syllabic, ideographic, etc.), as well as be able to compare across scripts. This capability may be last on our list, but it’s no simple task.
Even if you use only a single script today and don’t envision any global expansion, being able to perform entity resolution over culturally diverse data is essential for accuracy. Developing an entity resolution solution that recognizes cultural differences still requires sophisticated name and cultural domain awareness as well as high-quality name matching.
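To give a sense of where this work starts, the Python sketch below builds a crude comparison key for names using the standard `unicodedata` module. It is only a first step under loud assumptions: the `comparison_key` helper is hypothetical, the Cyrillic-to-Latin table is deliberately tiny and incomplete, and stripping accents is lossy for many languages. Genuine cross-script matching needs real transliteration and cultural name knowledge far beyond this.

```python
import unicodedata

# Hypothetical, deliberately tiny transliteration table (illustration only).
CYRILLIC_TO_LATIN = str.maketrans({"и": "i", "в": "v", "а": "a", "н": "n", "о": "o"})


def comparison_key(name):
    """Reduce a name to a rough key so simple variants compare as equal."""
    # NFKD splits accented letters into base letter + combining mark and
    # maps compatibility forms (such as full-width Latin) to plain letters.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks (accents) left behind by the decomposition.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold, then apply the toy transliteration table.
    folded = stripped.casefold().translate(CYRILLIC_TO_LATIN)
    # Collapse runs of whitespace.
    return " ".join(folded.split())


print(comparison_key("José  Núñez"))
print(comparison_key("Иван") == comparison_key("Ivan"))
```

Even this small example hints at the difficulty: each script and culture needs its own rules for transliteration, name order, nicknames, and honorifics, and an exact key like this one cannot capture spelling variation at all.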
It's Smarter to Embed than to Build
Would you build your own mobile communications or payment processing capabilities today? No. As with these common software add-ons, you can now add principle-based, pre-trained, and tuned entity resolution to your product with just a few lines of code.
Embedding the Senzing entity resolution API is so fast, easy, and affordable that it doesn’t make sense to build entity resolution yourself. In just a few sprints, you can add the most advanced entity resolution to your product, whether your software runs on premises, in the cloud, or as a service. Senzing entity resolution is your best option, offering superior performance, faster time-to-market, and better economics.