Senzing API Linux Quickstart Guide
This article outlines how to install the Senzing APIs on Linux, load data and perform entity resolution, analyze and explore the entity resolution outcomes, and prepare and load your own data into Senzing.
Senzing provides 100k source records for ingestion and evaluation for free. If you require additional records for an evaluation, or any assistance when following this guide, please contact Senzing Support. Support is 100% FREE!
The installation steps add the Senzing software repository to your Linux distribution; these steps only need to be completed once. During installation you will be asked to accept the End User License Agreement (EULA). On Red Hat based distributions you will also be prompted to accept the Senzing public key.
For air-gapped installs, use our air-gapped systems guide to install the packages and then return here to complete the remaining steps.
Installing Senzing - Debian Based Distributions
Add repository
Add and enable the Senzing repository in the list of repositories managed by apt. This only needs to be completed once.
sudo apt install apt-transport-https
The new APT senzingrepo v2 repository package works only for Senzing versions >= 3.10.0. It detects architecture and platform. If a prior Senzing version is required, you must install the older senzingrepo v1 repository package: https://senzing-production-apt.s3.amazonaws.com/senzingrepo-1.0.1-1_amd64.deb
Please contact Senzing Support if you have any questions.
wget https://senzing-production-apt.s3.us-east-1.amazonaws.com/senzingrepo_2.0.1-1_all.deb
sudo apt install ./senzingrepo_2.0.1-1_all.deb
sudo apt update
Install package
The latest version of Senzing can now be installed. As part of the installation you will be asked to accept the End User License Agreement (EULA).
sudo apt install senzingapi
Continue with Creating a Senzing Project…
Installing Senzing - Red Hat Based Distributions
Add repository
Add and enable the Senzing repository in the list of repositories managed by yum. This step only needs to be completed once.
The new YUM senzingrepo v2 repository package works only for Senzing versions >= 3.10.0. It detects architecture and platform. If a prior Senzing version is required, you must install the older senzingrepo v1 repository package: https://senzing-production-yum.s3.amazonaws.com/senzingrepo-1.0.0-2.x86_64.rpm
Please contact Senzing Support if you have any questions.
sudo yum install https://senzing-production-yum.s3.us-east-1.amazonaws.com/senzingrepo-2.0.1-1.noarch.rpm
Install package
The latest version of Senzing can now be installed. As part of the installation you will be asked to accept the End User License Agreement (EULA), which can be viewed at https://senzing.com/end-user-license-agreement/
sudo yum install senzingapi
During the first installation of Senzing to a system you will also be prompted to accept the Senzing public key. Accepting the prompt imports the public key to verify future installations come from Senzing.
Retrieving key from https://senzing-production-yum.s3.amazonaws.com/senzing-production.key
Importing GPG key 0xD99E309D:
Userid : "Senzing, Inc. <buildmgr@senzing.com>"
Fingerprint: e38c a28c f7ab 06d5 120b bda7 4f67 bf4d d99e 309d
From : https://senzing-production-yum.s3.amazonaws.com/senzing-production.key
Is this ok [y/N]: y
Create a Senzing Project
To begin using Senzing, first create a project. This deploys an instance of Senzing into a specified path. The project folder must not already exist; it is created by the G2CreateProject.py utility.
Creating and using projects provides independent and isolated instances of Senzing. Projects can be upgraded from prior Senzing versions.
This command creates the Senzing project in your current user's home directory, in a new directory named senzing.
python3 /opt/senzing/g2/python/G2CreateProject.py ~/senzing
$ python3 /opt/senzing/g2/python/G2CreateProject.py ~/senzing
Creating Senzing instance at: /home/username/senzing
Senzing version: 3.12.3 - (3.12.3-24323)
Successfully created.
To expedite getting started, an embedded SQLite database is configured when a Senzing project is created. SQLite is convenient for evaluation; production systems would use an enterprise-level RDBMS such as Postgres. For additional information see Technical - Database.
Configure Environment
To use your new project, environment variables must be set to indicate where the project's resources are located. The setupEnv script is project specific and needs to be sourced in every new shell session in which you work with the project, for example after logging out and back in. To set up the environment, change to your project directory and source the setupEnv file.
cd <project_path>
source setupEnv
<project_path> refers to the path specified on the G2CreateProject.py command when creating the project.
Updating Database with Senzing Configuration
A Senzing instance is configured with a JSON document. On a fresh installation this document needs to be registered in the Senzing database. This step only needs to be performed once for a new project. From the root of your project directory, run the following command and enter y when prompted:
python3 python/G2SetupConfig.py
Loading the Sample Truth Set Data
You can now load some sample demo data into Senzing using the G2Loader utility. G2Loader is a sample application for loading data that calls the Senzing APIs. These are the same APIs you would call when building your own applications or embedding Senzing into other systems or processes.
Add Data Source Codes
The three sample files to load represent three different data sources: customers, a watchlist, and reference data. Records loaded into Senzing have an identifier attribute called DATA_SOURCE. This is an arbitrary value that describes and identifies where source records originated, and it is a useful designation when analyzing and reporting on entities.
Each of the records in the three files uses one of the DATA_SOURCE codes CUSTOMERS, REFERENCE or WATCHLIST. Before data can be loaded using these values, they need to be added to the Senzing configuration. This only needs to be completed once for each DATA_SOURCE value. The G2ConfigTool.py utility performs this configuration change. To start G2ConfigTool.py:
python3 python/G2ConfigTool.py
Once at the (g2cfg) prompt, enter the following commands:
addDataSource CUSTOMERS
addDataSource REFERENCE
addDataSource WATCHLIST
save
y
quit
$ python3 python/G2ConfigTool.py
Initializing Senzing engines...
Welcome to G2Config Tool. Type help or ? to list commands.
(g2cfg) addDataSource CUSTOMERS
Successfully added!
(g2cfg) addDataSource REFERENCE
Successfully added!
(g2cfg) addDataSource WATCHLIST
Successfully added!
(g2cfg) save
WARNING: This will immediately update the current configuration in the Senzing repository with the current configuration!
Are you certain you wish to proceed and save changes? (y/n) y
Configuration saved to Senzing repository.
Initializing Senzing engines...
(g2cfg) quit
Loading
With the data source codes added, load each file with the following commands:
python3 python/G2Loader.py -f python/demo/truth/customers.json
python3 python/G2Loader.py -f python/demo/truth/reference.json
python3 python/G2Loader.py -f python/demo/truth/watchlist.json
Senzing operates in real time: as each record is loaded, it completes the entity resolution process. As a result, every record within and across the files has been entity resolved against all other data, and the outcomes are persisted in the Senzing database.
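The record-at-a-time loading described above can be sketched as a small loop. This is only an illustration, not how G2Loader is implemented: load_records and the add_record callback are hypothetical names, and add_record stands in for whatever Senzing API call your own application would make for each record. The sketch also assumes the input holds one mapped JSON record per line.

```python
import json
from typing import Callable, Iterable


def load_records(lines: Iterable[str],
                 add_record: Callable[[str, str, str], None]) -> int:
    """Pass each mapped JSON record to add_record, one record at a time.

    add_record is a hypothetical stand-in for the engine call that adds
    a single record; it receives the data source code, the record ID and
    the record serialized as a JSON string.
    """
    loaded = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        record = json.loads(line)
        add_record(record["DATA_SOURCE"], str(record["RECORD_ID"]),
                   json.dumps(record))
        loaded += 1
    return loaded


if __name__ == "__main__":
    sample = [
        '{"DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1001", "PRIMARY_NAME_LAST": "Smith"}',
        '{"DATA_SOURCE": "WATCHLIST", "RECORD_ID": "1008", "PRIMARY_NAME_LAST": "Smith"}',
    ]

    def show(data_source: str, record_id: str, payload: str) -> None:
        # Stand-in for the real call: just display what would be sent.
        print(data_source, record_id, payload)

    print(load_records(sample, show), "records processed")
```

Because each record is handed over individually, there is no separate batch step in this sketch; the callback is the only place that would need to change to talk to a real engine.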
To learn more about the entity resolution process, check out these Senzing white papers.
Exploring Entity Resolution Outcomes
Loading data into Senzing completes the entity resolution processing which can now be reviewed, explored and evaluated with the Exploratory Data Analysis (EDA) tools. The EDA tools consist of:
G2Explorer.py for understanding how and why entities are resolved and related
G2Snapshot.py for calculating reports to be viewed with G2Explorer
G2Audit.py for comparing results between Senzing and other technologies, or comparing Senzing results between configurations
To begin exploring the EDA tools, review Exploratory Data Analysis (EDA) tools. Once you have an overview of EDA tools and their functionality it is recommended to explore G2Explorer and G2Snapshot on the previously loaded truth set data.
The EDA tools articles describe loading the truth set data; this doesn’t need to be repeated because it was completed in the prior step.
G2Explorer
To get started with G2Explorer.py, run the following command:
python3 python/G2Explorer.py
$ python3 python/G2Explorer.py
____| __ \ \
__| | | _ \ Senzing G2
| | | ___ \ Exploratory Data Analysis
_____| ____/ _/ _\
sucessfully loaded snapshottest.json
Type help or ? to list commands.
(g2)
The EDA tools have built in help!
(g2) help
Adhoc entity commands:
search - search for entities by name and/or other attributes.
get - get an entity by entity ID or record_id.
compare - place two or more entities side by side for easier comparison.
how - get a step by step replay of how an entity came together.
why - see why entities or records either did or did not resolve.
tree - see a tree view of an entity's relationships through 1 or 2 degrees.
export - export the json records for an entity for debugging or correcting and reloading.
Snapshot reports: (requires a json file created with G2Snapshot)
dataSourceSummary – shows how many duplicates were detected within each data source, as well as
the possible matches and relationships that were derived. For example, how many duplicate customers
there are, and are any of them related to each other.
crossSourceSummary – shows how many matches were made across data sources. For example, how many
employees are related to customers.
entitySizeBreakdown – shows how many entities of what size were created. For instance, some entities
are singletons, some might have connected 2 records, some 3, etc. This report is primarily used to
ensure there are no instances of over matching. For instance, it’s ok for an entity to have hundreds
of records as long as there are not too many different names, addresses, identifiers, etc.
Audit report: (requires a json file created with G2Audit)
auditSummary - shows the precision, recall and F1 scores with the ability to browse the entities that
were split or merged.
Other commands:
quickLook - show the number of records in the repository by data source without a snapshot.
load - load a snapshot or audit report json file.
score - show the scores of any two names, addresses, identifiers, or combination thereof.
set - various settings affecting how entities are displayed.
Senzing Knowledge Center: https://senzing.zendesk.com/hc/en-us
Senzing Support Request: https://senzing.zendesk.com/hc/en-us/requests/new
(g2) help get
Displays a particular entity by entity_id or by data_source and record_id.
Syntax:
get <entity_id> looks up an entity's resume by entity ID
get <dataSource> <recordID> looks up an entity's resume by data source and record ID
get search <search index> looks up an entity's resume by search index (requires a prior search)
get detail <entity_id> adding the "detail" tag displays each record rather than a summary by de
get features <entity_id> adding the "features" tag displays the entity features rather than the e
Notes:
Add the keyword ALL to display all the attributes of the entity if there are more than 50.
(g2)
get
The get command displays details for an entity, in this instance looked up by the data source code and record id:
get customers 1070
(g2) get customers 1070
Entity summary for entity 55: Jie Wang
┼───────────┼───────────────────────────────┼─────────────────┼
│ Record ID │ Entity Data │ Additional Data │
┼───────────┼───────────────────────────────┼─────────────────┼
│ CUSTOMERS │ PRIMARY: Wang Jie │ AMOUNT: 100 │
│ 1069 │ NATIVE: 王杰 │ AMOUNT: 200 │
│ 1070 │ DOB: 9/14/93 │ DATE: 1/26/18 │
│ │ GENDER: M │ DATE: 1/27/18 │
│ │ GENDER: Male │ STATUS: Active │
│ │ RECORD_TYPE: PERSON │ │
│ │ NATIONAL_ID: 832721 │ │
│ │ NATIONAL_ID: 832721 Hong Kong │ │
│ │ HOME: 12 Constitution Street │ │
┼───────────┼───────────────────────────────┼─────────────────┼
│ REFERENCE │ PRIMARY: Wang Jie │ CATEGORY: Owner │
│ 2013 │ DOB: 1993-09-14 │ STATUS: Current │
│ │ RECORD_TYPE: PERSON │ │
┼───────────┼───────────────────────────────┼─────────────────┼
└── Disclosed relationships (1)
└── OWNS 60% (1)
└── 91 CUSTOMERS (1) | REFERENCE (1) Hajah Mamunah Jln Pisang
(g2)
search
Perform a search for an entity:
search {"name_full": "robert smith", "date_of_birth": "11/12/1978"}
(g2) search {"name_full": "robert smith", "date_of_birth": "11/12/1978"}
Searching ...
Search Results
┼───────┼───────────┼──────────────┼──────────────────────┼─────────────────────────────┼─────────────┼──┼
│ Index │ Entity ID │ Entity Name │ Data Sources │ Match Key │ Match Score │ R│
┼───────┼───────────┼──────────────┼──────────────────────┼─────────────────────────────┼─────────────┼──┼
│ 1 │ 1 │ Robert Smith │ CUSTOMERS: 4 records │ NAME+DOB │ 200 │ 3│
│ │ │ │ │ Principle 180: SNAME_SSTAB │ │ │
┼───────┼───────────┼──────────────┼──────────────────────┼─────────────────────────────┼─────────────┼──┼
│ 2 │ 100003 │ Robert Smith │ WATCHLIST: 1008 │ NAME │ 100 │ 2│
│ │ │ │ │ Principle 206: CNAME │ │ │
┼───────┼───────────┼──────────────┼──────────────────────┼─────────────────────────────┼─────────────┼──┼
(g2)
You’ll learn about the JSON structure in the next section - Mapping and Loading Your Own Data.
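If you want to generate search commands like the one above from values in your own code, a minimal sketch follows. The helper name build_search_command is hypothetical; it only assembles the text typed at the (g2) prompt, and the attribute names are the ones used in the example search.

```python
import json


def build_search_command(**attributes: str) -> str:
    """Build the text of a G2Explorer search command from keyword arguments."""
    # Drop empty values so they are not sent as search criteria.
    criteria = {name: value for name, value in attributes.items() if value}
    if not criteria:
        raise ValueError("at least one search attribute is required")
    return "search " + json.dumps(criteria)


if __name__ == "__main__":
    print(build_search_command(name_full="robert smith",
                               date_of_birth="11/12/1978"))
```

Using json.dumps rather than string concatenation keeps quoting correct when a value contains quotes or backslashes.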
Try out the other examples in the G2Explorer.py article and explore the commands and their options using help.
Mapping and Loading Your Own Data
Mapping
At this point you are ready to map and load your own data. Mapping is the process of converting your source data into a structure that Senzing understands and that is ready to load.
To learn more about mapping, the dictionary of terms, and samples to help prepare your own data sources for loading and entity resolution, review the Senzing Generic Entity Specification.
Consider these examples: in your data, an attribute describing a person's full name is in a database table column named fullname. In Senzing, a full name is represented by the term NAME_FULL. Similarly, for address line 1, your database column is named addressline1; in Senzing this is represented by the term ADDR_LINE1.
Your task in mapping is to determine which attributes in your data source(s) are appropriate for use in entity resolution, extract those attributes and construct the structure describing those attributes to send to Senzing. The following is an example of a Senzing mapped JSON structure for an entry from a data source.
{
  "DATA_SOURCE": "CUSTOMERS",
  "RECORD_ID": "1001",
  "RECORD_TYPE": "PERSON",
  "PRIMARY_NAME_LAST": "Smith",
  "PRIMARY_NAME_FIRST": "Robert",
  "DATE_OF_BIRTH": "12/11/1978",
  "ADDR_TYPE": "MAILING",
  "ADDR_LINE1": "123 Main Street, Las Vegas NV 89132",
  "PHONE_TYPE": "HOME",
  "PHONE_NUMBER": "702-919-1300",
  "EMAIL_ADDRESS": "bsmith@work.com"
}
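As a rough sketch of the mapping step, the following Python converts one source row into a mapped record like the one above. The column names fullname and addressline1 come from the earlier example; COLUMN_MAP, map_row and the sample row are hypothetical, and a real mapping should follow the Senzing Generic Entity Specification for the full set of attributes.

```python
import json

# Hypothetical mapping from source column names to Senzing attribute names.
COLUMN_MAP = {
    "fullname": "NAME_FULL",
    "addressline1": "ADDR_LINE1",
}


def map_row(row: dict, data_source: str, record_id) -> dict:
    """Map one source row to a Senzing-style record, skipping unmapped columns."""
    record = {"DATA_SOURCE": data_source, "RECORD_ID": str(record_id)}
    for column, attribute in COLUMN_MAP.items():
        value = row.get(column)
        if value is None:
            continue  # column missing from this row
        text = str(value).strip()
        if text:  # leave out empty values rather than sending blanks
            record[attribute] = text
    return record


if __name__ == "__main__":
    row = {
        "fullname": "Robert Smith",
        "addressline1": "123 Main Street, Las Vegas NV 89132",
        "internal_notes": "not used for entity resolution",
    }
    print(json.dumps(map_row(row, "CUSTOMERS", 1001)))
```

Columns that are not listed in COLUMN_MAP, such as internal_notes here, are simply left out of the mapped record.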
Additionally, you can view the files for the sample truth set data under the python/demo/truth path in your project. Review the customers.json, reference.json, and watchlist.json truth set files.
Loading
Once you have mapped your own data source(s) it’s time to load them. Before loading your own data, you’ll want to purge the Senzing database, which contains the sample truth set data. Purging the Senzing database completely removes all previously loaded data and entity resolution outcomes; use with caution!
The G2Command utility is one method of purging the Senzing database. To start G2Command:
python3 python/G2Command.py
Once at the (g2cmd) prompt, enter the following commands:
purgeRepository
y
quit
$ python3 python/G2Command.py
Welcome to G2Command. Type help or ? for help.
(g2cmd) purgeRepository
********** WARNING **********
This will purge all currently loaded data from the senzing database!
Before proceeding, all instances of senzing (custom code, rest api, redoer, etc.) must be shut down.
********** WARNING **********
Are you sure you want to purge the senzing database? (y/n) y
Purging the Senzing database (and resetting resolver)...
(g2cmd) quit
$
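Next, add the data source code(s) used in your mapped data to the Senzing configuration with G2ConfigTool.py, as was done for the sample truth set data. Once at the (g2cfg) prompt, enter the following commands, where datasourcecode is the value you used for DATA_SOURCE during mapping: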
addDataSource datasourcecode
save
y
quit
$ python3 python/G2ConfigTool.py
Initializing Senzing engines...
Welcome to G2Config Tool. Type help or ? to list commands.
(g2cfg) addDataSource PROSPECT
Successfully added!
(g2cfg) save
WARNING: This will immediately update the current configuration in the Senzing repository with the current configuration!
Are you certain you wish to proceed and save changes? (y/n) y
Configuration saved to Senzing repository.
Initializing Senzing engines...
(g2cfg) quit
$
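Before loading, it can help to confirm which DATA_SOURCE codes your mapped file actually uses, so that each one is added to the configuration. The sketch below assumes the mapped file contains one JSON record per line; summarize_data_sources and add_data_source_commands are hypothetical helper names, and the sample lines stand in for your own file.

```python
import json
from collections import Counter
from typing import Iterable, List


def summarize_data_sources(lines: Iterable[str]) -> Counter:
    """Count mapped records per DATA_SOURCE code, one JSON record per line."""
    counts: Counter = Counter()
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        data_source = record.get("DATA_SOURCE")
        if not data_source:
            raise ValueError(f"line {number}: record has no DATA_SOURCE")
        counts[data_source] += 1
    return counts


def add_data_source_commands(counts: Counter) -> List[str]:
    """List the addDataSource command needed for each code found."""
    return [f"addDataSource {code}" for code in sorted(counts)]


if __name__ == "__main__":
    sample = [
        '{"DATA_SOURCE": "PROSPECT", "RECORD_ID": "1", "NAME_FULL": "Robert Smith"}',
        '{"DATA_SOURCE": "PROSPECT", "RECORD_ID": "2", "NAME_FULL": "Jie Wang"}',
    ]
    counts = summarize_data_sources(sample)
    for command in add_data_source_commands(counts):
        print(command)
```

To check a real file, pass an open file object instead of the sample list, since iterating over a file yields its lines.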
You are now ready to load your data, again using the G2Loader utility as you did for the sample truth set data. For example, if you have a file containing mapped data describing prospects, the following command would load the file:
python3 python/G2Loader.py -f prospects.json
Once loading completes, revisit using the EDA tools to explore and analyze the outcomes of entity resolution on your data.
Don’t forget you can reach out to support if you need any assistance with getting started with Senzing. Support is 100% FREE!