Investigative Analysis · Three Variations
Combining Data for Investigative Analysis
One recipe, three ways to cook it: from open public data to your own POC to a production-grade deployment
* your mileage may vary
About this recipe
Investigative processes benefit from combining diverse data sets. The added context gives alerting functions and analysts what they need to make higher-quality decisions. In this recipe, we demonstrate combining three sources of data — a transactional, a derogatory, and a reference data source — and building a custom interactive web interface to explore the results.
- A coding LLM: Claude Code, Cursor, Kiro, or VS Code with an AI extension
- Senzing MCP server: connected to your LLM. Gives the LLM access to Senzing’s documentation, anti-patterns, datasets for testing, and guides for building dashboards and reports
- Senzing SDK: installed locally, automatically via the MCP. No data flows to Senzing, Inc.
- PPP Loans for Las Vegas: 3,488 Paycheck Protection Program recipients, Senzing-mapped and ready to load. The MCP provides the download URL; your LLM fetches it automatically.
- US Dept. of Labor Violations for Las Vegas: 1,554 employer compliance actions and citations. Same pattern: the MCP locates it, and your LLM downloads and loads it.
- National Provider Index (NPI) for Las Vegas: 71,060 records from the CMS registry of licensed healthcare providers, individual and organizational. MCP-assisted download, no manual wrangling.
- A Senzing evaluation license (250k records free): ask your LLM to request one via the Senzing MCP. It will arrive by email almost immediately. For larger evaluation licenses, contact [email protected].
Have Claude Code, Cursor, Kiro, or VS Code with an AI assistant open on your machine. This recipe requires an LLM that can write and run code locally, not a web chat window.
In your LLM’s settings, add the Senzing MCP server. In Claude Code it’s under Settings → MCP Servers:
Confirm it’s working: ask your LLM “What Senzing tools do you have available?” — you should see a list including get_sample_data, mapping_workflow, and others.
Ask your LLM: “Request a Senzing evaluation license for me via the MCP.” The MCP’s submit_feedback tool will ask for your work email and send a 250k-record license to your inbox almost immediately. Save the senzing.lic file somewhere on your machine — you’ll attach it to the first prompt below. For larger volumes, contact [email protected].
Paste this into your LLM. Attach your senzing.lic file to the same message.
Important: Use the Senzing MCP for this task.
Goal: Stand up Senzing, load two pre-mapped datasets, and produce an entity match report. Size the setup for ~10M records.
Hard rules:
- Use a production-grade loader, not a demo/single-threaded process.
- Before recommending any loader pattern, check the Senzing MCP’s anti-patterns docs.
Preferences:
- Print progress every few seconds as records load.
Steps:
1. Deploy Senzing using the attached eval license file.
2. Load these two pre-mapped Senzing-ready snapshots: PPP loan data and Department of Labor compliance actions.
3. When both are fully loaded, generate a basic summary match report.
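For orientation, the kind of loader the hard rules ask for can be sketched as follows. This is a minimal sketch under stated assumptions, not Senzing's loader: `add_record` is a hypothetical stand-in for whatever SDK call your LLM wires up after consulting the MCP, the input is assumed to be one JSON record per line, and the worker count and in-flight limit are illustrative. The MCP's anti-patterns guidance takes precedence over anything shown here.

```python
import json
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def load_jsonl(lines, add_record, workers=8, report_every=2.0):
    """Load JSON-lines records concurrently and print periodic progress.

    `add_record` is a hypothetical stand-in: any callable taking
    (data_source, record_id, record_json). Swap in the real SDK call.
    Returns a (loaded, failed) tuple.
    """
    loaded = failed = 0
    last_report = time.monotonic()
    pending = set()

    def drain(done):
        # Count successes and failures without stopping the load.
        nonlocal loaded, failed
        for future in done:
            try:
                future.result()
                loaded += 1
            except Exception as err:
                failed += 1
                print(f"record failed: {err}")

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            pending.add(
                pool.submit(
                    add_record,
                    record["DATA_SOURCE"],
                    str(record["RECORD_ID"]),
                    line,
                )
            )
            # Bound the in-flight work so a large file is not held in memory.
            if len(pending) >= workers * 2:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                drain(done)
            if time.monotonic() - last_report >= report_every:
                print(f"progress: {loaded} loaded, {failed} failed")
                last_report = time.monotonic()
        drain(wait(pending).done)  # finish whatever is still running
    return loaded, failed
```

Passing an open file handle as `lines` streams the snapshot instead of reading it all at once, and the returned counts can feed the summary report in step 3.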
Example questions worth asking before moving on:
Important: Use the Senzing MCP for this task.
Goal: Build an interactive web UX to review and explore the entity matching results from the previous step.
Hard rules:
- Keep using the Senzing MCP for everything, paying special attention to the reporting guide for graph, dashboard, and why-match patterns.
Features:
- Search (using the Senzing interface) across resolved entities by name, address, or other attributes.
- Network graph with labels to visualize the identity graph.
- “Why match?” with feature scores for any selected entity pair.
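To make the network graph feature concrete, here is a minimal sketch of the data a graph view typically needs: labeled nodes and de-duplicated edges. The input shape (`entity_id`, `name`, `related`) is a hypothetical simplification invented for illustration; the real structure comes from whatever the Senzing MCP's reporting guide has your LLM retrieve.

```python
def build_graph(entities):
    """Turn resolved entities into labeled nodes and de-duplicated edges.

    `entities` is a hypothetical shape: a list of dicts with keys
    "entity_id", "name", and "related" (a list of related entity IDs).
    """
    nodes = [{"id": e["entity_id"], "label": e["name"]} for e in entities]
    known = {e["entity_id"] for e in entities}
    edges = set()
    for e in entities:
        for other in e.get("related", []):
            # Ignore self-links and links to entities outside this result set.
            if other in known and other != e["entity_id"]:
                # Store each undirected edge once, smallest ID first.
                edges.add(tuple(sorted((e["entity_id"], other))))
    return {
        "nodes": nodes,
        "edges": [{"source": a, "target": b} for a, b in sorted(edges)],
    }
```

The returned dictionary serializes directly to JSON, which is the form most browser graph libraries accept.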
Try asking for improvements, for example:
Important: Use the Senzing MCP for this task.
Goal: Add NPI data to the existing setup and update the user interface to reflect the new source.
Steps:
1. Find and add the Senzing-ready NPI (National Provider Index) snapshot to the identity graph using the same production-grade loader as before.
2. Update the reporting/visualization user interface as needed to include NPI as a source.
Example ways to explore the identity graph:
- A coding LLM: Claude Code, Cursor, Kiro, or VS Code with an AI extension
- Senzing MCP server: connected to your LLM. Gives the LLM access to Senzing’s documentation, anti-patterns, datasets for testing, and guides for building dashboards and reports
- Senzing SDK: installed locally, automatically via the MCP. No data flows to Senzing, Inc.
- Bring Your Own Data (BYOD): A good starting point is to bring a customer file, a known fraudster or watchlist extract, and any third-party reference data you already have access to, e.g., OpenData.org, Dun & Bradstreet, Equifax.
- Optionally, free data: for example, opensanctions.org, opendata.org, or other data sources, many of which can be found in the Senzing CORD library (Collection of Relatable Data).
- A Senzing evaluation license (250k records free): ask your LLM to request one via the Senzing MCP. It will arrive by email almost immediately. For larger evaluation licenses, contact [email protected].
- Export a vertical slice from each of your data sources (e.g., CSV) and place the files somewhere your LLM can access on the local filesystem. Note the file path.
- A vertical slice applies the same selection criteria across multiple data sources so the sample properly represents the resolutions and relationships you’d see at full scale. The most common forms are geographic (e.g., all records for a city, state, or postal code) or alpha range (e.g., last names starting with “A*” or “Ly*”).
- Apply the same slice criteria to each of your source files. The consistency is what makes the POC results meaningful.
- Include attributes that inform entity resolution: names, addresses, phone numbers, emails, IDs, dates of birth, and similar identifying fields. The more of these present, the stronger the resolution.
- It doesn’t need to be clean. Inconsistent names, messy addresses, conflicting dates of birth, historical values, and missing fields are what Senzing specializes in.
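A geographic vertical slice as described above can be sketched in a few lines. This is a minimal sketch, assuming each source is a CSV with a header row; the function name, the column names, and the exact-match rule are illustrative assumptions, and real exports may need a looser match (postal code prefixes, state abbreviations, and so on).

```python
import csv


def slice_csv(src_path, dst_path, column, wanted):
    """Copy rows whose `column` value equals `wanted`, ignoring case and
    surrounding whitespace. Returns the number of rows kept."""
    target = wanted.strip().lower()
    kept = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if (row.get(column) or "").strip().lower() == target:
                writer.writerow(row)  # messy rows are kept as-is
                kept += 1
    return kept
```

Calling `slice_csv` once per source with the same `wanted` value (the same city, for instance) keeps the selection criteria consistent across files. The `column` argument is per call because each source is likely to name its city field differently.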
Drop this prompt into your LLM, replacing [path/to/your/files] in Step 1 with the path to your files:
Important: Use the Senzing MCP for this task.
Goal: Map my data sources to Senzing JSON, ready for ingestion.
Hard rules:
- Use the Senzing MCP’s mapping workflow, not general training.
- Show me the mapping before applying it so I can review it.
Steps:
1. Inspect the files in this directory: [path/to/your/files].
2. Propose a field mapping to Senzing features.
3. Once I approve, produce the Senzing-ready output files.
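To give a sense of what the mapping step produces, here is a minimal sketch that turns one source row into a Senzing-style JSON record. The source column names, the `CUSTOMERS` data source code in the usage note, and the helper itself are hypothetical. The target attribute names (`NAME_FULL`, `ADDR_FULL`, and so on) reflect my understanding of Senzing's entity specification; the MCP's mapping workflow is the authority on the right attributes and structure for your data.

```python
import json

# Hypothetical source-column -> Senzing-attribute mapping for one file.
FIELD_MAP = {
    "full_name": "NAME_FULL",
    "street_address": "ADDR_FULL",
    "phone": "PHONE_NUMBER",
    "email": "EMAIL_ADDRESS",
    "dob": "DATE_OF_BIRTH",
}


def to_senzing_json(row, data_source, id_column="customer_id"):
    """Map one source row (a dict) to a JSON string ready for loading."""
    record = {"DATA_SOURCE": data_source, "RECORD_ID": str(row[id_column])}
    for column, attribute in FIELD_MAP.items():
        value = (row.get(column) or "").strip()
        if value:  # omit empty fields rather than sending blanks
            record[attribute] = value
    return json.dumps(record)
```

For example, `to_senzing_json(row, "CUSTOMERS")` applied to each row of a `csv.DictReader` yields one JSON line per record, which is the shape a loader can stream.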
Important: Use the Senzing MCP for this task.
Goal: Load my mapped files into Senzing and produce a summary match report.
Hard rules:
- Use a production-grade loader, not a demo/single-threaded process.
- Before recommending any loader pattern, check the Senzing MCP’s anti-patterns docs.
Preferences:
- Print progress every few seconds as records load.
Steps:
1. Load the files we just mapped into Senzing.
2. When fully loaded, give me summary matching statistics.
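The summary statistics requested in step 2 can be illustrated with a small sketch. It assumes you already have, from an export or report your LLM generates, a list of `(data_source, record_id, entity_id)` tuples; that input shape and the metric names are assumptions chosen for illustration, not a Senzing report format.

```python
from collections import defaultdict


def summarize(resolved):
    """Compute basic match statistics.

    `resolved` is a hypothetical iterable of
    (data_source, record_id, entity_id) tuples.
    """
    sources_by_entity = defaultdict(list)
    for data_source, _record_id, entity_id in resolved:
        sources_by_entity[entity_id].append(data_source)
    groups = list(sources_by_entity.values())
    return {
        "records": sum(len(g) for g in groups),
        "entities": len(groups),
        # Entities where two or more records resolved together.
        "multi_record_entities": sum(1 for g in groups if len(g) > 1),
        # Entities whose records come from more than one data source.
        "cross_source_entities": sum(1 for g in groups if len(set(g)) > 1),
    }
```

The cross-source count is usually the most interesting number in an investigative setting, since it shows where combining sources added context that no single source had.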
Example questions worth asking before moving on:
Important: Use the Senzing MCP for this task.
Goal: Build an interactive web UX to review and explore the entity matching results from the previous step.
Hard rules:
- Keep using the Senzing MCP for everything, paying special attention to the reporting guide for graph, dashboard, and why-match patterns.
Features:
- Search (using the Senzing interface) across resolved entities by name, address, or other attributes.
- Network graph with labels to visualize the identity graph.
- “Why match?” with feature scores for any selected entity pair.
Try asking for improvements, for example:
What’s next
Now that you have completed your POC and seen the results, it’s time to start planning your production deployment. Proceed to the Production Deployment recipe.
A production deployment is an engineering and organizational project. The considerations below are a starting point for scoping, not a complete specification. Every deployment is shaped by data volume, team structure, regulatory context, and the specific questions the system needs to answer.
The Senzing MCP is a useful reference throughout this process — tell your LLM your goals, sizing, and environment, and then ask it for architecture guidance, project planning tips, even a test plan.
The gap between a POC and a production deployment is mostly engineering and organizational, not technological. The resolution engine you used in the POC is the same one that runs at enterprise scale; the surrounding infrastructure is what grows.
A reasonable path forward:
What’s next
You have a working POC and a clear picture of what production would involve. The next step is a conversation with the Senzing team about architecture, licensing, and timeline.
Combining diverse data has historically been hard — regardless of the datasets involved. For investigative analysis, Senzing’s agentic entity resolution makes it a breeze.
Senzing resolves diverse data without you writing a single matching rule. The identity graph it builds can be used not only for investigative analysis but also for other workflows, including alerting and reporting. For a full range of what’s possible, read this comprehensive guide on Identity Intelligence.
Diverse data sources — internal and external — visualized together in a single link chart. Relationships and connections that were previously invisible become immediately apparent.
Combining more data sources means higher-quality alerts with richer context. Adding a new internal or external source to the identity graph is straightforward — no custom integration work required.
No ETL pipelines to hand-build. Accurate matching out of the box without model training or tuning. The Senzing MCP guides the LLM, so the result takes a fraction of the engineering effort.
Three prompts take you from zero to a working pipeline: ingest, visualize, extend. The same pattern that runs this POC scales to production. Use agentic entity resolution to add data source N+1 with 100x less effort.
Part of the Senzing Cookbook — practical recipes for putting entity resolution to work, one dish at a time.