What Are Entity Resolved Knowledge Graphs?
Entity Resolved Knowledge Graphs: How They Work & Why They Matter
This introduction to the topic "What Are Entity Resolved Knowledge Graphs" explains how the process of entity resolution can be applied to graph data to create a synergistic result called an Entity Resolved Knowledge Graph. This approach amplifies the power and utility of traditional graph technology.
By applying entity resolution to knowledge graphs, organizations are able to make better decisions from their enterprise data, become more competitive, reduce fraud, and serve their customers better.
The process of constructing a knowledge graph usually involves combining data from multiple sources. The value of having data organized in a graph is that it emphasizes relationships, i.e., it links the data. However, when the nodes in the graph represent duplicate entities or links (shared edges) are missed, the result is not "knowledge"; it is just disconnected facts.
These duplicate nodes and missed relationships are difficult to remedy because data is messy: for example, business names or addresses are often spelled differently, abbreviated, or just "one letter" off. Harder yet, some data has been intentionally obfuscated, for example through the tradecraft of a clever fraudster.
Entity resolution fixes the duplicates for "real-world entities" and provides linked data to build more effective graphs. Without entity resolution, the analytics and machine learning derived from a graph risk being inaccurate and misleading.
Combining entity resolution with knowledge graph practices produces Entity Resolved Knowledge Graphs, and these allow connections which would otherwise be hidden to become available for graph analytics, visualizations, and applications in AI.
What Is A Knowledge Graph? (KG)
A knowledge graph (KG) provides a flexible, structured representation of connected data, enabling advanced search, analytics, visualization, reasoning, and other capabilities that are difficult to obtain otherwise. In other words, knowledge graphs help us understand relationships that contextualize and make sense of the data coming from a connected world. Recent uses of retrieval augmented generation (RAG) to make AI applications more robust have turned knowledge graphs into a hot topic.
What Is Entity Resolution? (ER)
Entity resolution (ER) provides advanced data matching and relationship discovery to identify and link data records that refer to the same real-world entities.
For example, your business may have duplicate entities (e.g., customers, suppliers) in a dataset that have been overlooked (e.g., due to variations in name or address). These redundancies can create data quality problems: for example, you may not know exactly how many customers you have.
But the problem gets worse when you're trying to link entities across disparate data sources. This is required if you're trying to get an enterprise-wide 360-degree view of your customers.
Whether you're trying to get accurate customer counts or link records across datasets, if you'd like to leverage your data in downstream analytics, decisioning systems, model training, or other AI applications, making sense of millions of different entities becomes very important. And that is exactly what entity resolution helps accomplish.
Problems With Non-Resolved Graphs
One of the truisms about data science is that 80% of the work gets spent on cleaning up data. When you're working in graph data science, this problem becomes even more acute because unless you can link your data correctly, it's not really a graph.
Within any single source of data, there tend to be duplicate records. For example, one person's name could be spelled "Elizabeth Reston" and "Liz Reston" across different records. Simple approaches for handling data would miss that these are the same person. Loading the data into a graph would split the information for this person across multiple nodes, making any downstream uses incorrect.
This issue becomes even harder to handle as you add more data sources. For example, if there's no trusted "key" shared between two data sources (such as an account number, email address, or other unique identifier), then connections will get missed. Again, the messy data makes for less "knowledge" in a graph, and makes any downstream uses incorrect.
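The split-node problem described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the record layout, the source names, the third customer, and the rule of clustering on a normalized email address. Real entity resolution weighs many features and handles far messier data; this only shows why exact-string loading creates duplicate nodes and how clustering on a shared feature collapses them.

```python
from collections import defaultdict

# Hypothetical records from two sources; there is no shared key between them.
RECORDS = [
    {"source": "crm", "id": "1001", "name": "Elizabeth Reston", "email": "liz.reston@example.com"},
    {"source": "billing", "id": "B-77", "name": "Liz Reston", "email": "Liz.Reston@Example.com"},
    {"source": "crm", "id": "1002", "name": "Mark Oduya", "email": "mark@example.com"},
]

def naive_nodes(records):
    """Create one node per distinct name string, as a simple loader might."""
    return {r["name"] for r in records}

def resolve(records):
    """Cluster records that share a normalized email (a deliberately simple rule)."""
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}
    for i, r in enumerate(records):
        key = r["email"].strip().lower()
        if key in first_seen:
            parent[find(i)] = find(first_seen[key])
        else:
            first_seen[key] = i

    # Each cluster keeps the (source, record id) pairs that were merged.
    clusters = defaultdict(list)
    for i, r in enumerate(records):
        clusters[find(i)].append((r["source"], r["id"]))
    return list(clusters.values())
```

With these sample records, `naive_nodes(RECORDS)` yields three nodes because the two name spellings differ, while `resolve(RECORDS)` yields two entities, one of which combines the CRM and billing records for the same person.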
Introducing Entity Resolved Knowledge Graphs (ERKG)
Developing knowledge requires a process of accumulating context. For example, knowledge graphs used in production in industry tend to combine diverse datasets into one consistent graph. It's like connecting puzzle pieces in a jigsaw puzzle: the randomly scattered puzzle pieces that spill out of a box won't show the full picture for the solved puzzle. Similarly, as we integrate multiple datasets into a knowledge graph, the connections create the "knowledge" within the graph.
By using an Entity Resolved Knowledge Graph, we fix problems in graph data which would otherwise lead to false results. Examples of popular graph analytics include:
- Pathing: one of the most common uses of a knowledge graph is to determine how to get from one node to another node. For example, is this suspect connected in any way to the missing person? Or how does our international distributor connect with this known counterfeiting operation? In an unresolved graph, the paths that answer such questions can go undetected.
- Nearest neighbors: another popular use of knowledge graphs is to find similar elements. Suppose you have a graph which includes enterprise sales data, and you've identified the nodes representing your top accounts. Running a nearest neighbor algorithm can help find other potential customers with similar characteristics, which could become valuable sales leads; in other words, finding the proverbial "needle in a haystack" within your data. Nearest neighbor searches are also used to power AI applications, for example where knowledge graphs augment an AI chat session, providing results that link to your data and help mitigate "hallucinations" in the AI results. Without entity resolution, the duplicates within your KG scramble the nearest neighbor results.
- Centrality scores: to find the "most connected" nodes in a graph, we use a family of graph algorithms for measuring centrality, i.e., the degree of connectedness for nodes. These scores are used to rank the nodes within a graph based on network flows: for example, who are the major "connectors" in a social graph, or where was Patient Zero in a graph about the spread of an epidemic. Probably the most famous example of using centrality scores is the PageRank algorithm used by Google Search. Unresolved graphs falsify centrality scores.
- Node/Link prediction: graph machine learning techniques are often used to help improve the knowledge in a graph, by suggesting where some nodes or links might be missing in the source data. For example, node prediction used in an ecommerce graph can help recommend products to a customer, based on their prior history. In law enforcement, link prediction helps inform investigators where some vital missing link could help solve a case. However, without entity resolution, a graph built with duplicate records will cause these kinds of AI models to be trained poorly.
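To make the effect on pathing and centrality concrete, here is a small sketch in plain Python. The edge list, the node names, and the `same_as` mapping (standing in for the output of an entity resolution step) are all hypothetical. Degree is used as the simplest centrality measure; it is not PageRank.

```python
from collections import defaultdict, deque

def build_graph(edges, same_as=None):
    """Build an undirected adjacency map.

    same_as maps duplicate node names to a canonical name (hypothetical ER output).
    """
    same_as = same_as or {}
    graph = defaultdict(set)
    for a, b in edges:
        a, b = same_as.get(a, a), same_as.get(b, b)
        graph[a].add(b)
        graph[b].add(a)
    return graph

def has_path(graph, start, goal):
    """Breadth-first search for any path between two nodes."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def degree(graph, node):
    """Degree centrality in its rawest form: the number of neighbors."""
    return len(graph.get(node, ()))

# Hypothetical edges; the same person appears under two spellings.
EDGES = [
    ("Suspect", "Liz Reston"),
    ("Elizabeth Reston", "Missing Person"),
    ("Elizabeth Reston", "Employer"),
]
unresolved = build_graph(EDGES)
resolved = build_graph(EDGES, same_as={"Liz Reston": "Elizabeth Reston"})
```

In the unresolved graph, `has_path(unresolved, "Suspect", "Missing Person")` is `False` because the connecting person is split across two nodes. After the duplicate is merged, the same query on `resolved` returns `True`, and the merged node's degree rises from 2 to 3.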
Approaches & Best Practices to ER a KG
One of the best practices for working with knowledge graphs is to think in terms of "multiple levels of detail" when handling the data. KG experts talk about a data graph at the lowest level. This level provides lots of detail about the provenance of each data element: where did it come from, what version was used, has it been verified, and so on. You will want resolved entities to maintain full attribution, whereby every record knows which source system and record key it represents. Full attribution is essential to support audits as well as feedback-loop corrections.
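One way to picture full attribution is a resolved entity that never discards the identity of its contributing records. The sketch below is illustrative only: the class names `RecordRef` and `ResolvedEntity`, their fields, and the sample keys are assumptions, not any product's data model.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class RecordRef:
    source: str      # name of the source system
    record_key: str  # key of the record within that source

@dataclass
class ResolvedEntity:
    entity_id: str
    records: list = field(default_factory=list)

    def add(self, source, record_key):
        """Attach a source record, ignoring exact repeats."""
        ref = RecordRef(source, record_key)
        if ref not in self.records:
            self.records.append(ref)

    def remove(self, source, record_key):
        """Detach a record, e.g. after a feedback-loop correction.

        Returns True if the record was attached to this entity.
        """
        ref = RecordRef(source, record_key)
        if ref in self.records:
            self.records.remove(ref)
            return True
        return False

    def sources(self):
        """List the source systems contributing to this entity."""
        return sorted({r.source for r in self.records})
```

Because every contributing record keeps its source and key, an auditor can trace an entity back to its inputs, and a wrongly merged record can be detached without rebuilding the whole entity.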
The entity resolution process can be done either in the data layer on the way to the graph, or on data that was first staged in a graph. Hybrid architectures work as well.
Either entity resolution architecture eliminates problems of disconnected elements in the graph. Of course, many organizations try to build custom analytics into their graph applications to approximate entity resolution. Beware that handling duplicates in the later stages of a graph construction workflow becomes much more expensive computationally, is difficult to perform correctly, and is not fit for purpose.
Another best practice is to ensure one can explain why duplicate input records have been resolved, including which features contributed to these decisions.
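As a rough sketch of what such an explanation might carry, the function below compares two records feature by feature and reports which features agreed. The feature names, the normalization, and the "at least two features agree" rule are hypothetical choices for illustration; production entity resolution uses far richer scoring.

```python
def explain_match(a, b, features=("name", "email", "phone")):
    """Compare two records feature by feature and report which ones agree."""
    def norm(value):
        # Lowercase and drop whitespace; empty or missing values become None.
        return "".join(str(value).lower().split()) if value else None

    matched, differed = [], []
    for f in features:
        va, vb = norm(a.get(f)), norm(b.get(f))
        if va is None or vb is None:
            continue  # missing values neither support nor contradict a match
        (matched if va == vb else differed).append(f)

    # Hypothetical rule: resolve when at least two features agree.
    return {"resolved": len(matched) >= 2, "matched": matched, "differed": differed}
```

For two records named "Elizabeth Reston" and "Liz Reston" that share an email address and phone number, the result would mark the pair as resolved, list `email` and `phone` as the contributing features, and list `name` as the feature that differed.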
Use Cases for Entity Resolved Knowledge Graphs
- Fraud detection
- Risk scoring
- Supply chain visibility
- Investigation support
- Customer service, upsell, cross-sell
- Marketing
References & Additional Resources
How Should You Evaluate Entity Resolution Solutions?
Download the Entity Resolution Buyer's Guide to learn all about how to evaluate and select an entity resolution solution. This guide will give you the knowledge and tools to make an informed decision on which type of entity resolution solution is right for your needs today and into the future.
Senzing Smarter Entity Resolution
Senzing is the first purpose-built real-time AI for entity resolution. Senzing software makes it easy and affordable to add advanced entity resolution capabilities to your enterprise systems and commercial applications. The Senzing API provides highly accurate data matching and relationship detection to improve analytics and decision-making, without requiring entity resolution experts.
- Minimal data preparation is required with Senzing entity resolution.
- No tuning, training or entity resolution experts are needed.
- The Senzing API runs on premises, in the cloud or hybrid.
- No data flows to Senzing, Inc.
How Do You Get Started with Smarter Entity Resolution?
If you'd like to know more about Senzing entity resolution:
Consult with an Expert: Schedule a call with a Senzing entity resolution expert to discuss your requirements.
Try it Yourself: There are three easy ways to take Senzing entity resolution for a test drive: a simple desktop evaluation tool (for Windows or Mac) and QuickStarts for Linux and Docker. You can install the software, load data and evaluate results in as little as 15 minutes.