Senzing v4 Linux Quickstart Guide
This article covers installing the Senzing SDK on Linux, loading data and performing entity resolution, analyzing and exploring the results, and preparing and loading your own data into Senzing.
Senzing provides 500 source records for ingestion and evaluation for free. If you require additional records for an evaluation, or any assistance when following this guide, please contact Senzing Support. Support is 100% FREE!
The installation steps add the Senzing software repository to your Linux distribution; these steps only need to be completed once. During installation you will be asked to accept the End User License Agreement (EULA). On Red Hat based distributions you will also be prompted to accept the Senzing public key.
To help you get started quickly, an embedded SQLite database is configured when you create a Senzing project. SQLite is convenient for evaluation; a production system would use an enterprise-level RDBMS such as Postgres. For additional information see Technical - Database.
Installing Senzing - Debian Based Distributions
Add APT repository
Add and enable the Senzing APT repository to the currently configured list managed by apt. This step only needs to be completed once.
sudo apt install apt-transport-https
wget https://senzing-production-apt.s3.amazonaws.com/senzingrepo_2.0.1-1_all.deb
sudo apt install ./senzingrepo_2.0.1-1_all.deb
sudo apt update
Install package
The latest version of Senzing can now be installed. As part of the installation you will be asked to accept the End User License Agreement (EULA).
sudo apt update
sudo apt install senzingsdk-poc
Installing Senzing - Red Hat Based Distributions
Add YUM repository
Add and enable the Senzing YUM repository to the currently configured list managed by yum. This step only needs to be completed once.
sudo yum install https://senzing-production-yum.s3.amazonaws.com/senzingrepo-2.0.1-1.noarch.rpm
Install package
The latest version of Senzing can now be installed. As part of the installation you will be asked to accept the End User License Agreement (EULA).
sudo yum install senzingsdk-poc
The first time Senzing is installed on a system you will also be prompted to accept the Senzing public key. Accepting the prompt imports the public key, which is used to verify that future installations come from Senzing.
Retrieving key from https://senzing-production-yum.s3.amazonaws.com/senzing-production.key
Importing GPG key 0xD99E309D:
Userid : "Senzing, Inc. <buildmgr@senzing.com>"
Fingerprint: e38c a28c f7ab 06d5 120b bda7 4f67 bf4d d99e 309d
From : https://senzing-production-yum.s3.amazonaws.com/senzing-production.key
Is this ok [y/N]: y
Create a Senzing Project
To begin using Senzing, first create a project. This deploys an instance of Senzing into a specified path. The project folder must not already exist; it will be created by the /opt/senzing/er/bin/sz_create_project utility.
/opt/senzing/er/bin/sz_create_project <senzing_project_path>
Creating and using projects provides independent and isolated instances of Senzing. Projects can be upgraded from prior Senzing versions.
For example, the following command creates the Senzing project in the current user's home directory, in a new directory named senzing:
/opt/senzing/er/bin/sz_create_project ~/senzing
Configure Environment
To use your new project, environment variables must be set to indicate where the project's resources are located. The setupEnv script is specific to each project and must be sourced whenever you work with that project, for example each time you log in and start a new shell session. To set up the environment, change to your project directory and source the setupEnv file.
cd <senzing_project_path>
source setupEnv
<senzing_project_path> refers to the path specified with the /opt/senzing/er/bin/sz_create_project command when creating a project.
Updating Database with Senzing ER Configuration
A Senzing instance is configured with a Senzing Entity Resolution (ER) configuration, which is stored as a JSON document. On a fresh installation this configuration needs to be registered in the Senzing database. This step only needs to be performed once for a new project. From the root of your project directory, run the following commands and enter y when prompted:
cd <senzing_project_path>
source setupEnv
./bin/sz_setup_config
Loading the Truth Set Data
To get started with some data, load the Senzing example truth set by following the steps in this section.
Understanding the Truth Set Files
The truth set demo includes three main types of files, each serving a distinct purpose in entity resolution:
- Customers: Represent your subjects of interest, in this case customers. They could just as easily be employees for insider threat detection, vendors for supply chain management, or other tracked entities. These records form the core dataset you aim to analyze and resolve.
- Watchlist: Contains entities you want to avoid due to potential risks. Examples include past fraudsters, known terrorists, money launderers, or entities on mandated exclusion lists (e.g., sanctions lists like OFAC). By integrating watchlist data, Senzing helps you identify high-risk entities by matching them against your subject records. This enables risk assessment by flagging connections to undesirable entities, helping you mitigate threats like fraud, regulatory non-compliance, or reputational damage.
- Reference List: Includes supplemental data purchased or acquired about individuals (e.g., demographics, past addresses, contact methods) or companies (e.g., firmographics, corporate structure, executives, ownership). This data enriches your understanding of your subjects by providing additional context, such as historical addresses to track entity movement or corporate hierarchies to identify ultimate beneficial owners. This deeper insight improves entity resolution accuracy and supports use cases like customer profiling or due diligence.
Download the files
wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/customers.jsonl
wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/reference.jsonl
wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/watchlist.jsonl
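Before adding data sources and loading, it can help to see which data source codes the downloaded files contain and how many records they hold in total, given the 500-record evaluation allowance mentioned at the start of this guide. The following is a minimal Python sketch and not part of the Senzing tooling. It assumes each line of a file is one JSON object carrying a DATA_SOURCE key, as in the mapped JSON example later in this guide; the function name count_by_data_source is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def count_by_data_source(paths):
    """Count records per DATA_SOURCE across JSON Lines files.

    Assumption: every non-blank line is a JSON object with a DATA_SOURCE key.
    Records without that key are counted under "<missing>".
    """
    counts = Counter()
    for path in paths:
        with Path(path).open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                line = line.strip()
                if not line:
                    continue  # ignore blank lines
                try:
                    record = json.loads(line)
                except json.JSONDecodeError as err:
                    raise ValueError(f"{path}:{line_number}: invalid JSON ({err})") from err
                counts[record.get("DATA_SOURCE", "<missing>")] += 1
    return counts


if __name__ == "__main__":
    files = ["customers.jsonl", "reference.jsonl", "watchlist.jsonl"]
    existing = [name for name in files if Path(name).exists()]
    totals = count_by_data_source(existing)
    for source, count in sorted(totals.items()):
        print(f"{source}: {count}")
    print(f"total: {sum(totals.values())}")
```

Run from the directory holding the downloaded files, this prints one line per data source code followed by the overall record count. The codes it reports are the ones that need to exist in the configuration before loading.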
Add the data sources
source setupEnv
sz_configtool
Type help or ? for help
addDataSource CUSTOMERS
Data source successfully added!
addDataSource REFERENCE
Data source successfully added!
addDataSource WATCHLIST
Data source successfully added!
save
WARNING: This will immediately update the current configuration in the Senzing repository with the current configuration!
Are you certain you wish to proceed and save changes? (y/n)
y
Configuration changes saved
quit
Load the files
source setupEnv
sz_file_loader -f customers.jsonl
sz_file_loader -f reference.jsonl
sz_file_loader -f watchlist.jsonl
Explore the results
See EDA Tools: Basic Exploration
source setupEnv
sz_explorer
____| __ \ \
__| | | _ \ Senzing
| | | ___ \ Exploratory Data Analysis
_____| ____/ _/ _\
Type help or ? to list commands.
(szeda) get CUSTOMERS 1070
Entity summary for entity 98: Jie Wang
┼───────────┼────────────────────────────────────────┼─────────────────┼
│ Sources │ Features │ Additional Data │
┼───────────┼────────────────────────────────────────┼─────────────────┼
│ CUSTOMERS │ NAME: Jie Wang (PRIMARY) │ AMOUNT: 100 │
│ 1069 │ NAME: 王杰 (NATIVE) │ AMOUNT: 200 │
│ 1070 │ DOB: 9/14/93 │ DATE: 1/26/18 │
│ │ GENDER: Male │ DATE: 1/27/18 │
│ │ GENDER: M │ STATUS: Active │
│ │ ADDRESS: 12 Constitution Street (HOME) │ │
│ │ NATIONAL_ID: 832721 Hong Kong │ │
│ │ NATIONAL_ID: 832721 │ │
│ │ RECORD_TYPE: PERSON │ │
┼───────────┼────────────────────────────────────────┼─────────────────┼
│ REFERENCE │ NAME: Wang Jie (PRIMARY) │ CATEGORY: Owner │
│ 2013 │ DOB: 1993-09-14 │ STATUS: Current │
│ │ RECORD_TYPE: PERSON │ │
│ │ REL_POINTER: 2011 (OWNS 60%) │ │
┼───────────┼────────────────────────────────────────┼─────────────────┼
└── Disclosed relation (1)
└── --> OWNS 60% (1)
└── 182: Hajah Mamunah Jln Pisang CUSTOMERS (1) | REFERENCE (1) +REL_POINTER(OWNS 60%:)
Mapping Your Own Data
At this point you are ready to map and load your own data. Mapping is the process of converting your source data into a structure that Senzing understands and can load.
To learn more about mapping, including the dictionary of terms and samples that help you prepare your own data sources for loading and entity resolution, review the Senzing Entity Specification.
Consider these examples. In your data, an attribute describing a person's full name is stored in a database table under the column name fullname. In Senzing, a full name is represented by the term NAME_FULL. Similarly, for address line 1, your database column is named addressline1; in Senzing this is represented by the term ADDR_LINE1.
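The renaming described above can be sketched in a few lines of Python. This is an illustrative sketch only, not a Senzing API: the names COLUMN_MAP and map_row, the id column, and the sample row are hypothetical. Only the two column-to-term pairs (fullname to NAME_FULL, addressline1 to ADDR_LINE1) come from the text.

```python
# Hypothetical source-column -> Senzing term mapping. Only these two pairs
# are taken from the guide; extend it for your own columns.
COLUMN_MAP = {
    "fullname": "NAME_FULL",
    "addressline1": "ADDR_LINE1",
}


def map_row(row, data_source, record_id_column):
    """Build a Senzing-style dict from one source row (a plain dict)."""
    mapped = {
        "DATA_SOURCE": data_source,
        "RECORD_ID": str(row[record_id_column]),
    }
    for column, term in COLUMN_MAP.items():
        value = row.get(column)
        if value not in (None, ""):  # leave out empty attributes
            mapped[term] = value
    return mapped


# Sample row with made-up values; "notes" is not mapped and is dropped.
row = {"id": 1001, "fullname": "Robert Smith", "addressline1": "123 Main Street", "notes": ""}
print(map_row(row, "CUSTOMERS", "id"))
```

Keeping the mapping in a single dictionary makes it easy to review which source columns are sent to Senzing and which are left out.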
Your task in mapping is to determine which attributes in your data source(s) are appropriate for use in entity resolution, extract those attributes and construct the structure describing those attributes to send to Senzing. The following is an example of a Senzing mapped JSON structure for an entry from a data source.
{
  "DATA_SOURCE": "CUSTOMERS",
  "RECORD_ID": "1001",
  "RECORD_TYPE": "PERSON",
  "PRIMARY_NAME_LAST": "Smith",
  "PRIMARY_NAME_FIRST": "Robert",
  "DATE_OF_BIRTH": "12/11/1978",
  "ADDR_TYPE": "MAILING",
  "ADDR_LINE1": "123 Main Street, Las Vegas NV 89132",
  "PHONE_TYPE": "HOME",
  "PHONE_NUMBER": "702-919-1300",
  "EMAIL_ADDRESS": "bsmith@work.com"
}
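The truth set files loaded earlier use the .jsonl extension, which holds one JSON object per line. A minimal Python sketch for producing a file in that shape from mapped records might look like the following. The function name to_jsonl and the output file name my_customers.jsonl are hypothetical, and requiring both DATA_SOURCE and RECORD_ID is an assumption of this sketch based on the example above, not a statement of what Senzing enforces.

```python
import json

# Keys this sketch insists on; an assumption, not a Senzing rule.
REQUIRED_KEYS = ("DATA_SOURCE", "RECORD_ID")


def to_jsonl(records):
    """Serialize dicts as JSON Lines: one JSON object per line."""
    lines = []
    for record in records:
        for key in REQUIRED_KEYS:
            if key not in record:
                raise ValueError(f"record is missing {key}: {record!r}")
        # ensure_ascii=False keeps non-ASCII values (such as native-script names) readable
        lines.append(json.dumps(record, ensure_ascii=False))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    record = {
        "DATA_SOURCE": "CUSTOMERS",
        "RECORD_ID": "1001",
        "RECORD_TYPE": "PERSON",
        "PRIMARY_NAME_LAST": "Smith",
        "PRIMARY_NAME_FIRST": "Robert",
        "EMAIL_ADDRESS": "bsmith@work.com",
    }
    with open("my_customers.jsonl", "w", encoding="utf-8") as handle:
        handle.write(to_jsonl([record]))
```

The resulting file could then be passed to sz_file_loader with the -f option, in the same way as the truth set files, once its data source code has been added with sz_configtool.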
Start Developing
Members of our team have created GitHub projects that show more of what you can do quickly:
- Senzing in 3 Python Calls
- Python: Task-based code snippets, Streamlined RabbitMQ consumer, SQS consumer, scalable search, and redo record processing examples
If you have any questions, contact Senzing Support. Support is 100% FREE!