Senzing v4 Linux Quickstart Guide

This article covers installing the Senzing SDK on Linux, loading data and performing entity resolution, analyzing and exploring the entity resolution outcomes, and preparing and loading your own data into Senzing.

Tip

Senzing provides 250 source records for ingestion and evaluation for free. If you require additional records for an evaluation, or any assistance when following this guide, please contact Senzing Support. Support is 100% FREE!

The installation steps add the Senzing software repository to your Linux distribution; these steps only need to be completed once. During installation you will be asked to accept the End User License Agreement (EULA). On Red Hat based distributions you will also be prompted to accept the Senzing public key.

Info

To help you get started quickly, an embedded SQLite database is configured when a Senzing project is created. SQLite is convenient for evaluation; production systems would use an enterprise-level RDBMS such as Postgres. For additional information see Technical - Database.

Installing Senzing - Debian Based Distributions

Add APT repository

Add and enable the Senzing beta APT repository in the list of repositories managed by apt. This step only needs to be completed once.

sudo apt install apt-transport-https

wget https://senzing-beta-apt.s3.amazonaws.com/senzingbetarepo_2.0.1-1_all.deb

sudo apt install ./senzingbetarepo_2.0.1-1_all.deb

sudo apt update

Install package

Info

The latest version of Senzing can now be installed. As part of the installation you will be asked to accept the End User License Agreement (EULA).

sudo apt update

sudo apt install senzingsdk-poc

Installing Senzing - Red Hat Based Distributions

Add YUM repository

Add and enable the Senzing beta YUM repository to the currently configured list managed by yum. This step only needs to be completed once.

sudo yum install https://senzing-beta-yum.s3.amazonaws.com/senzingbetarepo-2.0.1-1.noarch.rpm

Install package

Info

The latest version of Senzing can now be installed. As part of the installation you will be asked to accept the End User License Agreement (EULA).

sudo yum install senzingsdk-poc

Tip

The first time Senzing is installed on a system you will also be prompted to accept the Senzing public key. Accepting the prompt imports the public key, which is used to verify that future installations come from Senzing.

Retrieving key from https://senzing-production-yum.s3.amazonaws.com/senzing-production.key
Importing GPG key 0xD99E309D:
 Userid : "Senzing, Inc. <buildmgr@senzing.com>"
 Fingerprint: e38c a28c f7ab 06d5 120b bda7 4f67 bf4d d99e 309d
 From : https://senzing-production-yum.s3.amazonaws.com/senzing-production.key
Is this ok [y/N]: y

Create a Senzing Project

To begin using Senzing, first create a project. This deploys an instance of Senzing into a specified path. The project folder must not already exist and will be created by the /opt/senzing/er/bin/sz_create_project utility.

/opt/senzing/er/bin/sz_create_project <project_path>

Creating and using projects provides independent and isolated instances of Senzing. Projects can be upgraded from prior Senzing versions.

For example, the following command creates the Senzing project in the current user's home directory, in a new directory named senzingv4beta:

/opt/senzing/er/bin/sz_create_project ~/senzingv4beta

Configure Environment

To use your new project, environment variables must be set that indicate where to find the project's resources. The setupEnv script is specific to each project and must be sourced whenever you work with that project, for example each time you start a new shell session. To set up the environment, change to your project directory and source the setupEnv file.

cd <project_path>

source setupEnv

Info

<project_path> refers to the path specified with the /opt/senzing/er/bin/sz_create_project command when creating a project.

Updating Database with Senzing Configuration

A Senzing instance is configured with a JSON document. On a fresh installation this document needs to be registered in the Senzing database. This step only needs to be performed once for a new project. From the root of your project directory, run the following commands and enter y when prompted:

cd <project_path>

source setupEnv

./bin/sz_setup_config

Loading the Sample Truth Set Data

To get started, load the Senzing sample truth set data in three steps:

Download the truth set files in Senzing JSON format
Use sz_configtool to add the data sources that the files use to the Senzing configuration
Use ./bin/sz_file_loader to load the data

Download The Truth Set Files

wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/customers.json

wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/reference.json

wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/watchlist.json

Add Data Source Codes

The three sample files represent three different data sources: customers, a watchlist, and reference data. Records loaded into Senzing have an identifier attribute called DATA_SOURCE. This is an arbitrary value that describes and identifies where source records originated, and it is a useful designation when analyzing and reporting on entities.

Each record in the three files uses one of the DATA_SOURCE codes CUSTOMERS, REFERENCE or WATCHLIST. Before data can be loaded using these values, they need to be added to the Senzing configuration. This only needs to be completed once for each DATA_SOURCE value. The sz_configtool utility performs this configuration change. To start sz_configtool:

cd <project_path>

source setupEnv

./bin/sz_configtool

Once at the (g2cfg) prompt enter the following commands:

addDataSource CUSTOMERS

addDataSource REFERENCE

addDataSource WATCHLIST

save

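quit

A complete session looks similar to the following. This transcript was captured with the earlier G2ConfigTool.py, so the launch command and messages may differ slightly from sz_configtool: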

$ python3 python/G2ConfigTool.py

Initializing Senzing engines...

Welcome to G2Config Tool. Type help or ? to list commands.

(g2cfg) addDataSource CUSTOMERS

Successfully added!

(g2cfg) addDataSource REFERENCE

Successfully added!

(g2cfg) addDataSource WATCHLIST

Successfully added!

(g2cfg) save

WARNING: This will immediately update the current configuration in the Senzing repository with the current configuration!

Are you certain you wish to proceed and save changes? (y/n)  y

Configuration saved to Senzing repository.

Initializing Senzing engines...

(g2cfg) quit
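
Before loading, it can be useful to confirm which DATA_SOURCE values a file actually uses, so that none are missing from the configuration. The following is a minimal sketch, not part of the Senzing tooling: it assumes the truth set files contain one JSON record per line, and count_data_sources is a hypothetical helper name.

```python
import json
from collections import Counter
from pathlib import Path


def count_data_sources(path):
    """Count records per DATA_SOURCE value in a file with one JSON record per line."""
    counts = Counter()
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Records without a DATA_SOURCE are grouped under a placeholder key.
            counts[record.get("DATA_SOURCE", "<missing>")] += 1
    return counts


if __name__ == "__main__":
    # File names are the three truth set files downloaded earlier.
    for name in ("customers.json", "reference.json", "watchlist.json"):
        if Path(name).exists():
            print(name, dict(count_data_sources(name)))
```

Any value reported here that has not been added with addDataSource would need to be added before that file is loaded.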

Loading

With the data source codes added, load each file with the following commands:

cd <project_path>

source setupEnv

./bin/sz_file_loader -f customers.json

./bin/sz_file_loader -f reference.json

./bin/sz_file_loader -f watchlist.json

Senzing operates in real time: as each record is loaded, it goes through the entity resolution process. As a result, every record within and across the files has been entity resolved against all other data, and the outcomes are persisted in the Senzing database.
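
The three load commands above can also be scripted. The sketch below is one possible approach, under these assumptions: it is launched from a shell in which setupEnv has already been sourced, the truth set files were downloaded into the project directory, and build_load_commands and load_all are hypothetical helper names. It only uses the -f option shown above.

```python
import subprocess
from pathlib import Path

# The three truth set files downloaded earlier.
TRUTH_SET_FILES = ("customers.json", "reference.json", "watchlist.json")


def build_load_commands(project_path, files=TRUTH_SET_FILES):
    """Return one sz_file_loader command (as an argument list) per file."""
    loader = str(Path(project_path).expanduser() / "bin" / "sz_file_loader")
    return [[loader, "-f", name] for name in files]


def load_all(project_path, dry_run=True):
    """Print the load commands, or run them in order when dry_run is False."""
    project_dir = Path(project_path).expanduser()
    for command in build_load_commands(project_path):
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first file that fails to load.
            subprocess.run(command, cwd=project_dir, check=True)


if __name__ == "__main__":
    # ~/senzingv4beta is the example project path used earlier in this guide.
    load_all("~/senzingv4beta", dry_run=True)
```

Running with dry_run=True only prints the commands, which makes it easy to check the paths before anything is loaded.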

Info

To learn more about the entity resolution process, check out these Senzing white papers.

Exploring Entity Resolution Outcomes

Loading data into Senzing completes the entity resolution processing. The results can now be reviewed, explored and evaluated with the Exploratory Data Analysis (EDA) tools:

./bin/sz_explorer for understanding how and why entities are resolved and related
./bin/sz_snapshot for calculating reports to be viewed with sz_explorer
./bin/sz_audit for comparing results between Senzing and other technologies or comparing Senzing results between configurations

To begin exploring the EDA tools, review Exploratory Data Analysis (EDA) tools. Once you have an overview of the EDA tools and their functionality, it is recommended to try sz_explorer and sz_snapshot on the previously loaded truth set data.

Tip

The EDA tools articles describe loading the truth set data. You can skip that step because it was completed in the prior section.

Mapping Your Own Data

At this point you are ready to map and load your own data. Mapping is the process of converting your source data into a structure that Senzing understands and can load.

Info

To learn more about mapping, including the dictionary of terms and samples that help you prepare your own data sources for loading and entity resolution, review the Senzing Generic Entity Specification.

Consider these examples. In your data, an attribute holding a person's full name is stored in a database table column named fullname. In Senzing, a full name is represented by the term NAME_FULL. Similarly, your database column for address line 1 is named addressline1; in Senzing this is represented by the term ADDR_LINE1.

Your task in mapping is to determine which attributes in your data source(s) are appropriate for use in entity resolution, extract those attributes, and construct the structure that describes them to send to Senzing. The following is an example of a Senzing-mapped JSON structure for one entry from a data source.

{
  "DATA_SOURCE": "CUSTOMERS",
  "RECORD_ID": "1001",
  "RECORD_TYPE": "PERSON",
  "PRIMARY_NAME_LAST": "Smith",
  "PRIMARY_NAME_FIRST": "Robert",
  "DATE_OF_BIRTH": "12/11/1978",
  "ADDR_TYPE": "MAILING",
  "ADDR_LINE1": "123 Main Street, Las Vegas NV 89132",
  "PHONE_TYPE": "HOME",
  "PHONE_NUMBER": "702-919-1300",
  "EMAIL_ADDRESS": "bsmith@work.com"
}
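
The column-to-term mapping described above can be sketched in a few lines of Python. This is an illustrative sketch only: the fullname and addressline1 columns come from the example in the text, while the id column, the COLUMN_MAP table, and the map_row and to_json_lines helpers are assumptions made for this example and are not part of any Senzing API.

```python
import json

# Source column -> Senzing term. Only these two pairs appear in the text;
# extend the table with terms from the Senzing Generic Entity Specification.
COLUMN_MAP = {
    "fullname": "NAME_FULL",
    "addressline1": "ADDR_LINE1",
}


def map_row(row, data_source, record_id_column="id"):
    """Convert one source row (a dict) into a Senzing-style record dict."""
    record = {
        "DATA_SOURCE": data_source,
        # "id" is a hypothetical unique key column in the source table.
        "RECORD_ID": str(row[record_id_column]),
    }
    for column, term in COLUMN_MAP.items():
        value = row.get(column)
        if value not in (None, ""):  # leave out empty attributes
            record[term] = value
    return record


def to_json_lines(rows, data_source):
    """Serialize mapped rows as one JSON record per line."""
    return "\n".join(json.dumps(map_row(row, data_source)) for row in rows)


if __name__ == "__main__":
    rows = [{"id": 1001, "fullname": "Robert Smith", "addressline1": "123 Main Street"}]
    print(to_json_lines(rows, "CUSTOMERS"))
```

Columns that are not listed in COLUMN_MAP are ignored, which keeps attributes that are not useful for entity resolution out of the mapped record.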

Start Developing

Members of our team have created GitHub projects that show more of what you can do quickly:

Python: Streamlined RabbitMQ, and Redo processing examples