Linux Quickstart Guide
For access to the Senzing v4 Beta SDK, please contact your Senzing sales representative or email Senzing Sales.
Senzing provides 250 source records for ingestion and evaluation for free. If you require additional records for an evaluation, or any assistance when following this guide, please contact Senzing Support. Support is 100% FREE!
The installation steps add the Senzing software repository to your Linux distribution; these steps only need to be completed once. During installation you will be asked to accept the End User License Agreement (EULA). On Red Hat-based distributions you will also be prompted to accept the Senzing public key.
To expedite getting started, an embedded SQLite database is configured for use when you create a Senzing project. SQLite is easy to evaluate with; for production systems, an enterprise-level RDBMS such as Postgres would be used. For additional information, see Technical - Database.
Installing Senzing - Debian Based Distributions
Add APT repository
For access to the Senzing v4 Beta SDK, please contact your Senzing sales representative or email Senzing Sales.
Install package
The latest version of Senzing can now be installed. As part of the installation you will be asked to accept the End User License Agreement (EULA).
sudo apt update
sudo apt install senzingsdk-poc
Installing Senzing - Red Hat Based Distributions
Add YUM repository
For access to the Senzing v4 Beta SDK, please contact your Senzing sales representative or email Senzing Sales.
Install package
The latest version of Senzing can now be installed. As part of the installation you will be asked to accept the End User License Agreement (EULA), which can be viewed at https://senzing.com/end-user-license-agreement/
sudo yum install senzingsdk-poc
During the first installation of Senzing on a system, you will also be prompted to accept the Senzing public key. Accepting the prompt imports the public key, which is used to verify that future installations come from Senzing.
Retrieving key from https://senzing-production-yum.s3.amazonaws.com/senzing-production.key
Importing GPG key 0xD99E309D:
Userid : "Senzing, Inc. <buildmgr@senzing.com>"
Fingerprint: e38c a28c f7ab 06d5 120b bda7 4f67 bf4d d99e 309d
From : https://senzing-production-yum.s3.amazonaws.com/senzing-production.key
Is this ok [y/N]: y
Create a Senzing Project
To begin using Senzing, first create a project. This deploys an instance of Senzing into a specified path. The project folder must not already exist and will be created by the /opt/senzing/er/bin/sz_create_project utility.
/opt/senzing/er/bin/sz_create_project <project_path>
Creating and using projects provides independent and isolated instances of Senzing. Projects can be upgraded from prior Senzing versions.
This command creates the Senzing project in your current user's home directory, in a new directory named senzingv4beta.
/opt/senzing/er/bin/sz_create_project ~/senzingv4beta
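Because sz_create_project requires a project folder that does not already exist, a script that creates projects may want to check for that first and report a clear message. The following is a minimal sketch; the function name create_project_if_absent is hypothetical and is not a Senzing utility.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL helper (not part of Senzing): create a project only when the
# target path is free, since sz_create_project needs a new, non-existent folder.
create_project_if_absent() {
  local project_path="$1"
  if [ -e "$project_path" ]; then
    echo "Project path already exists, choose another: $project_path" >&2
    return 1
  fi
  /opt/senzing/er/bin/sz_create_project "$project_path"
}

# Usage:
# create_project_if_absent ~/senzingv4beta
```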
Configure Environment
To use your new project, environment variables need to be set that tell Senzing where to find the project's resources. The setupEnv script is project-specific and needs to be sourced whenever you work with a project, for example each time you open a new shell session. To set up the environment, change to your project directory and source the setupEnv file.
cd <project_path>
source setupEnv
<project_path> refers to the path specified with the /opt/senzing/er/bin/sz_create_project command when creating a project.
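Since setupEnv has to be sourced in every new shell session, the two commands above can be wrapped in a small shell function. This is a sketch only: sz_use_project is a hypothetical helper you would define yourself (for example in your shell startup file), not a Senzing utility. It must run in your current shell rather than a subshell so that the variables exported by setupEnv persist.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL helper (not part of Senzing): change to a project directory and
# source its setupEnv, failing with a message if setupEnv is missing.
sz_use_project() {
  local project_path="$1"
  if [ ! -f "$project_path/setupEnv" ]; then
    echo "No setupEnv found in: $project_path" >&2
    return 1
  fi
  cd "$project_path" || return 1
  # Sourcing (not executing) keeps the exported variables in this shell.
  # shellcheck source=/dev/null
  source ./setupEnv
}

# Usage:
# sz_use_project ~/senzingv4beta
```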
Updating Database with Senzing Configuration
A Senzing instance is configured with a JSON document. On a fresh installation, this document needs to be registered in the Senzing database. This step only needs to be performed once for each new project. From the root of your project directory, run the following command and enter y when prompted:
cd <project_path>
source setupEnv
./bin/sz_setup_config
Loading the Sample Truth Set Data
To get started, load the Senzing example truth set:
- Download the truth set files in Senzing JSON format
- Use sz_configtool to add the data sources that the files use to the Senzing configuration
- Use ./bin/sz_file_loader to load the data
Download The Truth Set Files
wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/customers.json
wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/reference.json
wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/watchlist.json
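The three downloads above share a base URL, so they can also be scripted as a loop. A minimal sketch, assuming wget is installed; truthset_url and download_truthsets are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Base URL taken from the wget commands in this guide.
TRUTHSET_BASE_URL="https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo"

# HYPOTHETICAL helper: print the download URL for one truth set file.
truthset_url() {
  printf '%s/%s.json\n' "$TRUTHSET_BASE_URL" "$1"
}

# HYPOTHETICAL helper: download all three files into the current directory,
# stopping at the first failed download.
download_truthsets() {
  local name
  for name in customers reference watchlist; do
    wget "$(truthset_url "$name")" || return 1
  done
}

# Usage:
# download_truthsets
```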
Add Data Source Codes
The three sample files represent three different data sources: customers, a watchlist, and reference data. Records loaded into Senzing have an identifier attribute called DATA_SOURCE, an arbitrary value that describes and identifies where source records originated; it is a useful designation when analyzing and reporting on entities.
Each record in the three files uses one of the DATA_SOURCE codes: CUSTOMERS, REFERENCE or WATCHLIST. Before data can be loaded using these values, they need to be added to the Senzing configuration. This only needs to be completed once for each DATA_SOURCE value. The sz_configtool utility performs this configuration change. To start sz_configtool:
cd <project_path>
source setupEnv
./bin/sz_configtool
Once at the (g2cfg) prompt, enter the following commands:
addDataSource CUSTOMERS
addDataSource REFERENCE
addDataSource WATCHLIST
save
y
quit
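The example session below was captured with G2ConfigTool.py rather than sz_configtool, so the startup command and some messages may differ:
$ python3 python/G2ConfigTool.py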
Initializing Senzing engines...
Welcome to G2Config Tool. Type help or ? to list commands.
(g2cfg) addDataSource CUSTOMERS
Successfully added!
(g2cfg) addDataSource REFERENCE
Successfully added!
(g2cfg) addDataSource WATCHLIST
Successfully added!
(g2cfg) save
WARNING: This will immediately update the current configuration in the Senzing repository with the current configuration!
Are you certain you wish to proceed and save changes? (y/n) y
Configuration saved to Senzing repository.
Initializing Senzing engines...
(g2cfg) quit
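If you prefer to script the configuration change rather than type it at the prompt, the command sequence above can be generated as text. This sketch assumes that sz_configtool accepts its commands on standard input when it is not attached to a terminal, which this guide does not confirm; datasource_commands is a hypothetical helper. If piping does not work, use the interactive steps above.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL helper: print the sz_configtool commands that register each
# DATA_SOURCE code given as an argument, then save, confirm with "y", and quit.
datasource_commands() {
  local code
  for code in "$@"; do
    printf 'addDataSource %s\n' "$code"
  done
  printf 'save\ny\nquit\n'
}

# Usage (ASSUMPTION: sz_configtool reads commands from standard input):
# cd <project_path> && source setupEnv
# datasource_commands CUSTOMERS REFERENCE WATCHLIST | ./bin/sz_configtool
```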
Loading
With the data source codes added, load each file with the following commands:
cd <project_path>
source setupEnv
./bin/sz_file_loader -f customers.json
./bin/sz_file_loader -f reference.json
./bin/sz_file_loader -f watchlist.json
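The three load commands can also be run in a loop that stops at the first problem. This is a sketch under two assumptions: that sz_file_loader exits with a non-zero status when a load fails, and that load_files, a hypothetical helper, is run from the project root after sourcing setupEnv.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL helper: load each file with the given loader command, stopping
# at the first missing file or failed load.
# ASSUMPTION: the loader returns a non-zero exit status on failure.
load_files() {
  local loader="$1"
  shift
  local file
  for file in "$@"; do
    if [ ! -f "$file" ]; then
      echo "Input file not found: $file" >&2
      return 1
    fi
    "$loader" -f "$file" || { echo "Load failed: $file" >&2; return 1; }
  done
}

# Usage:
# load_files ./bin/sz_file_loader customers.json reference.json watchlist.json
```

Checking for each file before loading it means a mistyped file name is reported up front instead of being passed to the loader.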
Senzing operates in real time: as each record is loaded, it completes the entity resolution process. The outcome is that every record within and across each file has been entity resolved against all other data, with the outcomes persisted in the Senzing database.
To learn more about the entity resolution process, check out these Senzing white papers.
Exploring Entity Resolution Outcomes
Loading data into Senzing completes the entity resolution processing. The results can now be reviewed, explored and evaluated with the Exploratory Data Analysis (EDA) tools:
- ./bin/sz_explorer for understanding how and why entities are resolved and related
- ./bin/sz_snapshot for calculating reports to be viewed with sz_explorer
- ./bin/sz_audit for comparing results between Senzing and other technologies, or comparing Senzing results between configurations
To begin exploring the EDA tools, review Exploratory Data Analysis (EDA) tools. Once you have an overview of the EDA tools and their functionality, it is recommended that you try sz_explorer and sz_snapshot on the previously loaded truth set data.
The EDA tools articles outline loading the truth set data; you don’t need to repeat this, because it was completed in the prior step.
Mapping Your Own Data
At this point you are ready to map and load your own data. Mapping is the process of converting your source data into a structure that Senzing understands and that is ready to load.
To learn more about mapping, the dictionary of terms, and samples to help prepare your own data sources for loading and entity resolution, review the Senzing Generic Entity Specification.
Consider these examples: in your data, an attribute describing a person's full name is in a database table column named fullname. In Senzing, a full name is represented by the term NAME_FULL. Similarly, for address line 1 your database column is named addressline1; in Senzing, this is represented by the term ADDR_LINE1.
Your task in mapping is to determine which attributes in your data source(s) are appropriate for use in entity resolution, extract those attributes and construct the structure describing those attributes to send to Senzing. The following is an example of a Senzing mapped JSON structure for an entry from a data source.
{
"DATA_SOURCE": "CUSTOMERS",
"RECORD_ID": "1001",
"RECORD_TYPE": "PERSON",
"PRIMARY_NAME_LAST": "Smith",
"PRIMARY_NAME_FIRST": "Robert",
"DATE_OF_BIRTH": "12/11/1978",
"ADDR_TYPE": "MAILING",
"ADDR_LINE1": "123 Main Street, Las Vegas NV 89132",
"PHONE_TYPE": "HOME",
"PHONE_NUMBER": "702-919-1300",
"EMAIL_ADDRESS": "bsmith@work.com"
}
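To illustrate the fullname to NAME_FULL and addressline1 to ADDR_LINE1 mapping described above, the following sketch converts one source row into a Senzing JSON record and writes it as a single line. The function names, the pipe-delimited input format and the file names in the usage comment are hypothetical. The escaping handles only backslashes and double quotes; a production mapper would more likely use a JSON library or a tool such as jq.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL helper: escape backslashes and double quotes so a value can be
# embedded in a JSON string. Control characters (tabs, newlines) are NOT handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# HYPOTHETICAL helper: map one source row (record id, fullname, addressline1)
# to a Senzing JSON record, renaming the columns to NAME_FULL and ADDR_LINE1.
map_customer_row() {
  printf '{"DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "%s", "NAME_FULL": "%s", "ADDR_LINE1": "%s"}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# Usage: convert a pipe-delimited export (id|fullname|addressline1), one JSON
# record per output line. File names are hypothetical.
# while IFS='|' read -r id fullname addressline1; do
#   map_customer_row "$id" "$fullname" "$addressline1"
# done < customers_export.txt > customers_mapped.json
```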
Start Developing
Members of our team have created GitHub projects that show more of what you can do quickly:
- Python: Streamlined RabbitMQ, and Redo processing examples