Senzing v4 SDK Docker Quickstart Guide

Get started quickly with Senzing v4 using the example Docker images.

Depending on the speed of your internet connection, this may take only a few minutes.

Info

Senzing lets you ingest and evaluate up to 500 source records for free. If you require additional records for an evaluation, or any assistance when following this guide, please contact Senzing Support. Support is 100% FREE!

Prerequisites

  • A Docker environment for running linux/amd64 containers.
    Note

    To run the docker commands without sudo, your user must be a member of the docker group.

    • If your user is not a member of the docker group and you use sudo to run docker commands, add --preserve-env after sudo so that environment variables such as SENZING_ENGINE_CONFIGURATION_JSON are passed through to docker:
    sudo --preserve-env docker run ...
    
  • Access to Docker Hub (hub.docker.com) to pull images.
  • A PostgreSQL database that is set up and tuned, plus the username, password, IP address, port, and database name needed to connect to it.
  • Senzing supports air-gapped deployments, but this quickstart won’t cover pulling the images for offline use; it assumes anyone running an air-gapped deployment is already familiar with that process.

Getting it Done!

Setting up - SENZING_ENGINE_CONFIGURATION_JSON

The Senzing engine configuration is set via the SENZING_ENGINE_CONFIGURATION_JSON environment variable. This is further described here.

To set the environment variable, run the following in your environment, replacing the CONNECTION details with those of your database:

export SENZING_ENGINE_CONFIGURATION_JSON='{"PIPELINE":{"CONFIGPATH":"/etc/opt/senzing","RESOURCEPATH":"/opt/senzing/er/resources","SUPPORTPATH":"/opt/senzing/data"},"SQL":{"CONNECTION":"postgresql://username:password@127.0.0.1:5432:database"}}'

Formatted for readability:

export SENZING_ENGINE_CONFIGURATION_JSON='{
  "PIPELINE": {
    "CONFIGPATH": "/etc/opt/senzing",
    "RESOURCEPATH": "/opt/senzing/er/resources",
    "SUPPORTPATH": "/opt/senzing/data"
  },
  "SQL": {
    "CONNECTION": "postgresql://username:password@127.0.0.1:5432:database"
  }
}'

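If you prefer to keep the connection details in separate shell variables, the following is a minimal sketch that assembles the same single-line configuration with printf. The PG_* variable names are hypothetical placeholders introduced here, not Senzing settings; the connection string keeps the username:password@host:port:database layout used above.

```shell
# Hypothetical placeholder variables: substitute your own PostgreSQL details.
PG_USER="username"
PG_PASSWORD="password"
PG_HOST="127.0.0.1"
PG_PORT="5432"
PG_DATABASE="database"

# Build the same single-line configuration shown above from the individual pieces.
# Values are inserted as-is: anything containing double quotes or backslashes
# would need JSON escaping first.
SENZING_ENGINE_CONFIGURATION_JSON=$(printf '{"PIPELINE":{"CONFIGPATH":"/etc/opt/senzing","RESOURCEPATH":"/opt/senzing/er/resources","SUPPORTPATH":"/opt/senzing/data"},"SQL":{"CONNECTION":"postgresql://%s:%s@%s:%s:%s"}}' \
  "$PG_USER" "$PG_PASSWORD" "$PG_HOST" "$PG_PORT" "$PG_DATABASE")
export SENZING_ENGINE_CONFIGURATION_JSON

# Print the result so you can check it before starting any containers.
printf '%s\n' "$SENZING_ENGINE_CONFIGURATION_JSON"
```

Printing the value first makes typos in the host, port, or database name easy to spot before any container tries to connect.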
Tip

If PostgreSQL is installed on the same host as Docker and listens on localhost (127.0.0.1), you must use one of the following:

  • the host network (--network=host) together with localhost (127.0.0.1), for example: docker run --network=host --rm --env SENZING_ENGINE_CONFIGURATION_JSON ...
  • the external IP address of your host, NOT 127.0.0.1. From the perspective of the Docker container, 127.0.0.1 is the container itself.
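If you take the second option, one way to swap the loopback address for your host's IP in a configuration you have already built is a small sed-based helper. This is a sketch: replace_loopback is a hypothetical function name, and it assumes the connection string contains @127.0.0.1: exactly as in the example configuration.

```shell
# Print the configuration passed as $1 with the loopback host in the
# connection string ("@127.0.0.1:") replaced by the address passed as $2.
replace_loopback() {
  # The dots are escaped so sed matches them literally, not as "any character".
  printf '%s\n' "$1" | sed "s/@127\.0\.0\.1:/@$2:/"
}
```

For example, with your host's address in place of the documentation placeholder 192.0.2.10:

export SENZING_ENGINE_CONFIGURATION_JSON="$(replace_loopback "$SENZING_ENGINE_CONFIGURATION_JSON" 192.0.2.10)"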

Initialize the Senzing repository database

It takes only a few seconds to initialize the Senzing repository database using Docker. Once SENZING_ENGINE_CONFIGURATION_JSON is set, the setup is a single Docker command. Initialization first adds the Senzing schema and then the Senzing Entity Resolution Configuration to the database.

Info

This initialization step only needs to be run once!

docker run --rm --env SENZING_ENGINE_CONFIGURATION_JSON senzing/init-database --install-senzing-er-configuration
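Because --env SENZING_ENGINE_CONFIGURATION_JSON copies the variable from your current shell, and Docker does not set it inside the container when it is not set locally, a quick check before running the initialization can save a confusing failure. The require_engine_config function below is a hypothetical helper for this guide, not part of the Senzing tooling.

```shell
# Fail early with a clear message if the configuration variable is missing or
# empty, rather than starting a container that has no database settings.
require_engine_config() {
  if [ -z "${SENZING_ENGINE_CONFIGURATION_JSON:-}" ]; then
    echo "SENZING_ENGINE_CONFIGURATION_JSON is not set; export it first." >&2
    return 1
  fi
}
```

For example: require_engine_config && docker run --rm --env SENZING_ENGINE_CONFIGURATION_JSON senzing/init-database --install-senzing-er-configuration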

DONE! Yes, REALLY!

At this point, the Senzing database is initialized. You can run any of the Senzing v4 Docker images, as well as any custom Docker images you build on senzingsdk-runtime, by following this same pattern of passing SENZING_ENGINE_CONFIGURATION_JSON into the container. Yes, REALLY.

To use sz_explorer, sz_configtool, or sz_command.py, run the senzingsdk-tools Docker image and execute those commands from within it:

docker run --rm -it -e SENZING_ENGINE_CONFIGURATION_JSON senzing/senzingsdk-tools

Loading the Truth Set Data

To get started with some data, load the Senzing example truth set as follows.

Understanding the Truth Set Files

The truth set demo includes three main types of files, each serving a distinct purpose in entity resolution:

  • Customers: Represent your subjects of interest, in this case customers. They could just as easily be employees for insider threat detection, vendors for supply chain management, or other tracked entities. These records form the core dataset you aim to analyze and resolve.
  • Watchlist: Contains entities you want to avoid due to potential risks. Examples include past fraudsters, known terrorists, money launderers, or entities on mandated exclusion lists (e.g., sanctions lists like OFAC). By integrating watchlist data, Senzing helps you identify high-risk entities by matching them against your subject records. This enables risk assessment by flagging connections to undesirable entities, helping you mitigate threats like fraud, regulatory non-compliance, or reputational damage.
  • Reference List: Includes supplemental data purchased or acquired about individuals (e.g., demographics, past addresses, contact methods) or companies (e.g., firmographics, corporate structure, executives, ownership). This data enriches your understanding of your subjects by providing additional context, such as historical addresses to track entity movement or corporate hierarchies to identify ultimate beneficial owners. This deeper insight improves entity resolution accuracy and supports use cases like customer profiling or due diligence.

Download the files

wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/customers.jsonl
wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/reference.jsonl
wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/watchlist.jsonl
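The three downloads can also be scripted. This sketch reuses the same base URL as the wget commands above; truthset_url and download_truthset are hypothetical helper names.

```shell
# Base URL of the demo truth set, taken from the wget commands above.
TRUTHSET_BASE="https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo"

# Print the download URL for one truth set file (customers, reference, or watchlist).
truthset_url() {
  printf '%s/%s.jsonl\n' "$TRUTHSET_BASE" "$1"
}

# Download all three files into the current directory, stopping at the first failure.
download_truthset() {
  for name in customers reference watchlist; do
    wget "$(truthset_url "$name")" || return 1
  done
}
```

Run download_truthset from the directory you plan to mount into the loader containers later.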

Add the data source

docker run --rm -it -e SENZING_ENGINE_CONFIGURATION_JSON senzing/senzingsdk-tools
sz_configtool
Type help or ? for help
addDataSource CUSTOMERS
Data source successfully added!
addDataSource REFERENCE
Data source successfully added!
addDataSource WATCHLIST
Data source successfully added!
save
WARNING: This will immediately update the current configuration in the Senzing repository with the current configuration!

Are you certain you wish to proceed and save changes? (y/n)
y
Configuration changes saved
quit
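The data source codes you add in sz_configtool need to match the DATA_SOURCE values in the records you load. If you are unsure which codes a file uses, a rough text-based sketch like the one below lists them. It is not a JSON parser: it assumes one record per line, as in the .jsonl truth set files, and list_data_sources is a hypothetical helper name.

```shell
# List the distinct DATA_SOURCE codes found in one or more JSONL files.
# Matches "DATA_SOURCE": "CODE" with or without spaces around the colon;
# records split across several lines are not handled.
list_data_sources() {
  sed -n 's/.*"DATA_SOURCE" *: *"\([^"]*\)".*/\1/p' "$@" | sort -u
}
```

For example: list_data_sources customers.jsonl reference.jsonl watchlist.jsonl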

Load the files

Keep in mind that the file path is from the perspective of the Docker container. This example requires the directory containing the files (${PWD} in this case) to be mapped to /data inside the container.

docker run -it --rm -u $UID -v "${PWD}:/data" --env SENZING_ENGINE_CONFIGURATION_JSON senzing/sz_file_loader -f /data/customers.jsonl
docker run -it --rm -u $UID -v "${PWD}:/data" --env SENZING_ENGINE_CONFIGURATION_JSON senzing/sz_file_loader -f /data/reference.jsonl
docker run -it --rm -u $UID -v "${PWD}:/data" --env SENZING_ENGINE_CONFIGURATION_JSON senzing/sz_file_loader -f /data/watchlist.jsonl
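The three loader commands differ only in the file name, so they can be wrapped in a loop. The sketch below checks that each file is present in ${PWD} (which is mounted at /data) before starting its container, and uses $(id -u) in place of $UID so it also works in shells that do not set UID. load_truthset and the DOCKER_CMD override are hypothetical, introduced only for this example.

```shell
# Hypothetical wrapper around the three sz_file_loader commands above.
# DOCKER_CMD is an override invented for this sketch: leave it unset to run
# docker, or set DOCKER_CMD=echo to print the commands without running them.
load_truthset() {
  for name in customers reference watchlist; do
    # ${PWD} is mounted at /data, so the file must exist in the current directory.
    if [ ! -f "${PWD}/${name}.jsonl" ]; then
      echo "Missing ${PWD}/${name}.jsonl" >&2
      return 1
    fi
    "${DOCKER_CMD:-docker}" run -it --rm -u "$(id -u)" -v "${PWD}:/data" \
      --env SENZING_ENGINE_CONFIGURATION_JSON \
      senzing/sz_file_loader -f "/data/${name}.jsonl" || return 1
  done
}
```

Loading in the same order as above (customers, reference, watchlist) and stopping at the first failure keeps a missing or mistyped file from being silently skipped.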

Explore the results

See EDA Tools: Basic Exploration

docker run --rm -it -e SENZING_ENGINE_CONFIGURATION_JSON senzing/senzingsdk-tools
sz_explorer

  ____|  __ \     \
  __|    |   |   _ \   Senzing
  |      |   |  ___ \  Exploratory Data Analysis
 _____| ____/ _/    _\


Type help or ? to list commands.

(szeda) get CUSTOMERS 1070

Entity summary for entity 98: Jie Wang
┼───────────┼────────────────────────────────────────┼─────────────────┼
│ Sources   │ Features                               │ Additional Data │
┼───────────┼────────────────────────────────────────┼─────────────────┼
│ CUSTOMERS │ NAME: Jie Wang (PRIMARY)               │ AMOUNT: 100     │
│ 1069      │ NAME: 王杰 (NATIVE)                    │ AMOUNT: 200     │
│ 1070      │ DOB: 9/14/93                           │ DATE: 1/26/18   │
│           │ GENDER: Male                           │ DATE: 1/27/18   │
│           │ GENDER: M                              │ STATUS: Active  │
│           │ ADDRESS: 12 Constitution Street (HOME) │                 │
│           │ NATIONAL_ID: 832721 Hong Kong          │                 │
│           │ NATIONAL_ID: 832721                    │                 │
│           │ RECORD_TYPE: PERSON                    │                 │
┼───────────┼────────────────────────────────────────┼─────────────────┼
│ REFERENCE │ NAME: Wang Jie (PRIMARY)               │ CATEGORY: Owner │
│ 2013      │ DOB: 1993-09-14                        │ STATUS: Current │
│           │ RECORD_TYPE: PERSON                    │                 │
│           │ REL_POINTER: 2011 (OWNS 60%)           │                 │
┼───────────┼────────────────────────────────────────┼─────────────────┼
└── Disclosed relation (1)
    └── --> OWNS 60% (1)
        └── 182: Hajah Mamunah Jln Pisang CUSTOMERS (1) | REFERENCE (1) +REL_POINTER(OWNS 60%:)

Mapping Your Own Data

At this point you are ready to map and load your own data. Mapping is the process of converting your source data into a structure that Senzing understands and can load.

Info

To learn more about mapping, including the dictionary of terms and samples to help prepare your own data sources for loading and entity resolution, review the Senzing Entity Specification.

Consider this example: in your data, a person's full name is stored in a database table column named fullname. In Senzing, a full name is represented by the attribute NAME_FULL. Similarly, if your column for the first address line is named addressline1, in Senzing it is represented by ADDR_LINE1.
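To make the fullname and addressline1 example concrete, here is a naive sketch that converts a hypothetical CSV export with the columns id, fullname, and addressline1 into one Senzing JSON record per line. The file layout and the map_customers_csv name are assumptions; it does no CSV quoting or JSON escaping, so treat it as an illustration of the renaming rather than a production mapper.

```shell
# Map a simple CSV export (hypothetical columns: id,fullname,addressline1) to
# Senzing JSON lines, renaming fullname to NAME_FULL and addressline1 to ADDR_LINE1.
# Assumes a header row and no commas, quotes, or backslashes inside field values.
map_customers_csv() {
  awk -F',' 'NR > 1 {
    printf "{\"DATA_SOURCE\": \"CUSTOMERS\", \"RECORD_ID\": \"%s\", \"RECORD_TYPE\": \"PERSON\", \"NAME_FULL\": \"%s\", \"ADDR_LINE1\": \"%s\"}\n", $1, $2, $3
  }' "$1"
}
```

For example: map_customers_csv customers.csv > customers_mapped.jsonl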

Your task in mapping is to determine which attributes in your data source(s) are appropriate for use in entity resolution, extract those attributes, and construct the structure describing them to send to Senzing. The following is an example of a Senzing-mapped JSON record from a data source.

{
  "DATA_SOURCE": "CUSTOMERS",
  "RECORD_ID": "1001",
  "RECORD_TYPE": "PERSON",
  "PRIMARY_NAME_LAST": "Smith",
  "PRIMARY_NAME_FIRST": "Robert",
  "DATE_OF_BIRTH": "12/11/1978",
  "ADDR_TYPE": "MAILING",
  "ADDR_LINE1": "123 Main Street, Las Vegas NV 89132",
  "PHONE_TYPE": "HOME",
  "PHONE_NUMBER": "702-919-1300",
  "EMAIL_ADDRESS": "bsmith@work.com"
}
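The record above is pretty-printed for readability. The files loaded with sz_file_loader in this guide use the JSON Lines layout, one complete record per line, as in the .jsonl truth set files. A rough pre-load sanity check can flag lines that do not mention DATA_SOURCE or RECORD_ID; both keys appear in the example above, and this sketch simply treats them as required. check_required_keys is a hypothetical helper, and it is a text check, not JSON validation.

```shell
# Report lines in JSONL files that lack a DATA_SOURCE or RECORD_ID key and
# exit non-zero if any were found. A record split across several lines (for
# example, pretty-printed JSON) will show up as flagged lines.
check_required_keys() {
  awk '
    # Flag any line that does not mention both keys.
    !/"DATA_SOURCE"/ || !/"RECORD_ID"/ { print FILENAME ": line " FNR; bad = 1 }
    # Exit status 1 if anything was flagged, 0 otherwise.
    END { exit bad }
  ' "$@"
}
```

For example: check_required_keys customers_mapped.jsonl && echo "Looks ready to load"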

Start developing

Members of our team have created GitHub projects that show more of what you can do quickly:

Info

If you have any questions, contact Senzing Support. Support is 100% FREE!