Senzing API Docker Quickstart Guide
Use Senzing with the example Docker images quickly. With the new example Docker images built on the senzingapi-runtime Linux package (available since Senzing 3.2), running and building your own Docker images is very simple.
Depending on the speed of your internet connection, this may take only a few minutes. Here is a quick (informal) video.
Senzing provides 100k source records for ingestion and evaluation for free. If you require additional records for an evaluation, or any assistance when following this guide, please contact support for free help!
Prerequisites
- A Docker environment for running Intel x86_64 containers
- To run the docker commands without sudo, your user should be a member of the docker group
  - If your userid is not a member of the docker group and you use sudo to run the docker command, you will need to add --preserve-env after sudo:
    sudo --preserve-env docker run ...
- Access to Docker Hub (hub.docker.com) to pull images
  - Senzing supports air-gapped deployments, but this quickstart won’t cover pulling the images for offline use. Those running air-gapped deployments should already be familiar with that process.
- A PostgreSQL database set up and tuned, and the username, password, IP address, port, and database name needed to connect
Getting it Done!
Set up the Environment
The Senzing engine configuration used in the new Docker images is set via the SENZING_ENGINE_CONFIGURATION_JSON environment variable. This is further described here. To set the environment variable, run the following in your environment, replacing the CONNECTION details with those of your database.
export SENZING_ENGINE_CONFIGURATION_JSON='{
  "PIPELINE" : {
    "CONFIGPATH" : "/etc/opt/senzing",
    "RESOURCEPATH" : "/opt/senzing/g2/resources",
    "SUPPORTPATH" : "/opt/senzing/data"
  },
  "SQL" : { "CONNECTION" : "postgresql://username:password@10.10.10.10:5432:G2" }
}'
If you have PostgreSQL installed on localhost (127.0.0.1), you need to use the docker network or external IP of your host and NOT 127.0.0.1. From the perspective of the docker container, 127.0.0.1 is itself.
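As a rough illustration of the setup above, the sketch below assembles the same JSON from its parts in Python and rejects a loopback host, since 127.0.0.1 would point at the container itself. The function name build_engine_config and its parameters are hypothetical helpers for this example, not part of any Senzing API; the paths and the connection format are copied from the configuration shown above.

```python
import json
import shlex

# Hosts that resolve to the container itself rather than the Docker host.
LOOPBACK_HOSTS = {"127.0.0.1", "localhost"}


def build_engine_config(username, password, host, port, database):
    """Return the engine configuration as a JSON string.

    Hypothetical helper: it only mirrors the structure shown in this guide.
    """
    if host in LOOPBACK_HOSTS:
        raise ValueError(
            f"{host} refers to the container itself; use the Docker network "
            "or external IP of the host instead"
        )
    config = {
        "PIPELINE": {
            "CONFIGPATH": "/etc/opt/senzing",
            "RESOURCEPATH": "/opt/senzing/g2/resources",
            "SUPPORTPATH": "/opt/senzing/data",
        },
        "SQL": {
            "CONNECTION": f"postgresql://{username}:{password}@{host}:{port}:{database}"
        },
    }
    return json.dumps(config)


config_json = build_engine_config("username", "password", "10.10.10.10", 5432, "G2")
# Print a line that can be pasted into the shell to set the variable.
print("export SENZING_ENGINE_CONFIGURATION_JSON=" + shlex.quote(config_json))
```

Building the value with json.dumps guarantees that the result is well-formed JSON, which is easy to get wrong when editing the string by hand.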
Initialize
It takes about 30 seconds to initialize the Senzing database using Docker. Once SENZING_ENGINE_CONFIGURATION_JSON is set, the setup is a single Docker command.
docker run --rm -it -e SENZING_ENGINE_CONFIGURATION_JSON senzing/init-postgresql mandatory
Because Senzing doesn’t require schema or configuration changes within the same major product version (e.g., any 3.x version), you don’t need to repeat this step. “Upgrading” is merely running the new version of any container against this Senzing database.
DONE! Yes, REALLY!
At this point, the Senzing database is initialized. You can run any of the Senzing Docker images as well as any senzingapi-runtime based custom Docker images you create yourself by following this pattern. Yes, REALLY.
To use G2Explorer.py, G2ConfigTool.py, or G2Command.py, run the senzingapi-tools Docker image and execute those commands from that environment:
docker run --rm -it -e SENZING_ENGINE_CONFIGURATION_JSON senzing/senzingapi-tools
Or run the open-source demo web application:
docker run -it --rm -p 8251:8251 -e SENZING_ENGINE_CONFIGURATION_JSON senzing/web-app-demo
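The commands above share one pattern: forward the engine configuration with -e and name the image. A small Python sketch of that pattern follows. The helper senzing_docker_command is hypothetical and invented for this example; its use_sudo option reflects the --preserve-env note in the prerequisites.

```python
import shlex


def senzing_docker_command(image, *image_args, docker_args=(), use_sudo=False):
    """Build the argv for running a Senzing image (hypothetical helper)."""
    command = []
    if use_sudo:
        # sudo drops the environment unless told to keep it.
        command += ["sudo", "--preserve-env"]
    command += ["docker", "run", "--rm", "-it"]
    command += list(docker_args)
    # -e with no value forwards the variable from the calling environment.
    command += ["-e", "SENZING_ENGINE_CONFIGURATION_JSON", image]
    command += list(image_args)
    return command


init_cmd = senzing_docker_command("senzing/init-postgresql", "mandatory")
web_cmd = senzing_docker_command(
    "senzing/web-app-demo", docker_args=["-p", "8251:8251"], use_sudo=True
)
print(shlex.join(init_cmd))
print(shlex.join(web_cmd))
```

Each list can be handed to subprocess.run from a terminal session; the sketch only prints the commands so that nothing is started by accident.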
Start developing
Members of our team have made some GitHub projects that show more of what you can do quickly:
- API reference documentation
- 3 API Calls
- Task-based code-snippets
- Python: Streamlined SQS, RabbitMQ, and Redo processing examples
- Java: Streamlined RabbitMQ and Redo processing examples
Other stuff you can do
Loading the Truth Set Data
To get started with some data, load the Senzing example truth set by:
- Downloading the truth set files in Senzing JSON format
- Adding the data sources that the files use to the Senzing configuration
- Using senzing/file-loader to load them via Docker
Download the files
wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/customers.json
wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/reference.json
wget https://raw.githubusercontent.com/Senzing/truth-sets/main/truthsets/demo/watchlist.json
Add the data source
$ docker run --rm -it -e SENZING_ENGINE_CONFIGURATION_JSON senzing/senzingapi-tools
# G2ConfigTool.py
Initializing Senzing engines...
Welcome to G2Config Tool. Type help or ? to list commands.
(g2cfg) addDataSource CUSTOMERS
Successfully added!
(g2cfg) addDataSource REFERENCE
Successfully added!
(g2cfg) addDataSource WATCHLIST
Successfully added!
(g2cfg) save
WARNING: This will immediately update the current configuration in the Senzing repository with the current configuration!
Are you certain you wish to proceed and save changes? (y/n) y
Configuration saved to Senzing repository.
Initializing Senzing engines...
(g2cfg) quit
# exit
exit
Load the Files
Keep in mind that the file path is from the perspective of the Docker container, and this example requires the location of the files (${PWD} in this case) to be mapped into /data inside the container.
docker run -it --rm -u $UID -v ${PWD}:/data -e SENZING_ENGINE_CONFIGURATION_JSON senzing/file-loader -f /data/customers.json
docker run -it --rm -u $UID -v ${PWD}:/data -e SENZING_ENGINE_CONFIGURATION_JSON senzing/file-loader -f /data/reference.json
docker run -it --rm -u $UID -v ${PWD}:/data -e SENZING_ENGINE_CONFIGURATION_JSON senzing/file-loader -f /data/watchlist.json
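Because the loader sees only what is mounted into the container, each host path has to be translated to its /data equivalent. The sketch below shows that translation for the three truth set files. The names container_path and file_loader_command are hypothetical helpers, and the mount point matches the -v ${PWD}:/data mapping used above.

```python
import os
from pathlib import Path, PurePosixPath

# Where the host directory appears inside the container.
MOUNT_POINT = PurePosixPath("/data")


def container_path(host_file, host_dir):
    """Translate a host file path into the path seen inside the container."""
    # relative_to raises ValueError if the file is outside the mounted directory.
    relative = Path(host_file).relative_to(host_dir)
    return str(MOUNT_POINT / relative.as_posix())


def file_loader_command(host_file, host_dir, uid):
    """Build the file-loader argv for one file (hypothetical helper)."""
    return [
        "docker", "run", "-it", "--rm",
        "-u", str(uid),
        "-v", f"{host_dir}:{MOUNT_POINT}",
        "-e", "SENZING_ENGINE_CONFIGURATION_JSON",
        "senzing/file-loader",
        "-f", container_path(host_file, host_dir),
    ]


host_dir = Path.cwd()
for name in ("customers.json", "reference.json", "watchlist.json"):
    command = file_loader_command(host_dir / name, host_dir, os.getuid())
    print(" ".join(command))
```

A file that lives outside the mounted directory triggers a ValueError here, which surfaces the most common mistake (pointing -f at a host-only path) before any container is started.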
Explore the results
$ docker run --rm -it -e SENZING_ENGINE_CONFIGURATION_JSON senzing/senzingapi-tools
# G2Explorer.py
____| __ \ \
__| | | _ \ Senzing G2
| | | ___ \ Exploratory Data Analysis
_____| ____/ _/ _\
Type help or ? to list commands.
(g2) get CUSTOMERS 1070
Entity summary for entity 556800056: Jie Wang
┌───────────┬───────────────────────────────┬─────────────────┐
│ Record ID │ Entity Data │ Additional Data │
├───────────┼───────────────────────────────┼─────────────────┤
│ CUSTOMERS │ PRIMARY: Wang Jie │ AMOUNT: 100 │
│ 1069 │ NATIVE: 王杰 │ AMOUNT: 200 │
│ 1070 │ DOB: 9/14/93 │ DATE: 1/26/18 │
│ │ GENDER: M │ DATE: 1/27/18 │
│ │ GENDER: Male │ STATUS: Active │
│ │ RECORD_TYPE: PERSON │ │
│ │ NATIONAL_ID: 832721 │ │
│ │ NATIONAL_ID: 832721 Hong Kong │ │
│ │ HOME: 12 Constitution Street │ │
├───────────┼───────────────────────────────┼─────────────────┤
│ REFERENCE │ PRIMARY: Wang Jie │ CATEGORY: Owner │
│ 2013 │ DOB: 1993-09-14 │ STATUS: Current │
│ │ RECORD_TYPE: PERSON │ │
└───────────┴───────────────────────────────┴─────────────────┘
1 related entities
┌───────────┬──────────────────────────┬───────────────┬────────────────────┬────────────────────────┐
│ Entity ID │ Entity Name │ Data Sources │ Match Level │ Match Key │
├───────────┼──────────────────────────┼───────────────┼────────────────────┼────────────────────────┤
│ 91 │ Hajah Mamunah Jln Pisang │ CUSTOMERS (1) │ Disclosed Relation │ REL_POINTER(OWNS 60%:) │
│ │ │ REFERENCE (1) │ │ │
└───────────┴──────────────────────────┴───────────────┴────────────────────┴────────────────────────┘
Mapping Your Own Data
At this point you are ready to map and load your own data. Mapping is the process of converting your source data into a structure that Senzing understands and can load.
To learn more about mapping, including the dictionary of terms and samples that help you prepare your own data sources for loading and entity resolution, review the Senzing Generic Entity Specification.
Consider these examples. In your data, an attribute describing a person’s full name is in a database table with the column name fullname. In Senzing, a full name is represented by the term NAME_FULL. Similarly, for address line 1, your database column is named addressline1; in Senzing this is represented by the term ADDR_LINE1.
Your task in mapping is to determine which attributes in your data source(s) are appropriate for use in entity resolution, extract those attributes and construct the structure describing those attributes to send to Senzing. The following is an example of a Senzing mapped JSON structure for an entry from a data source.
{
  "DATA_SOURCE": "CUSTOMERS",
  "RECORD_ID": "1001",
  "RECORD_TYPE": "PERSON",
  "PRIMARY_NAME_LAST": "Smith",
  "PRIMARY_NAME_FIRST": "Robert",
  "DATE_OF_BIRTH": "12/11/1978",
  "ADDR_TYPE": "MAILING",
  "ADDR_LINE1": "123 Main Street, Las Vegas NV 89132",
  "PHONE_TYPE": "HOME",
  "PHONE_NUMBER": "702-919-1300",
  "EMAIL_ADDRESS": "bsmith@work.com"
}
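To make the mapping step concrete, here is a minimal Python sketch that renames source columns to Senzing terms and emits one JSON record. The fullname and addressline1 columns and their Senzing terms come from the examples above. The id column, the COLUMN_TO_TERM table, and the map_row function are assumptions made for illustration; a real mapping should follow the Senzing Generic Entity Specification.

```python
import json

# Source column -> Senzing term, taken from the examples in this section.
COLUMN_TO_TERM = {
    "fullname": "NAME_FULL",
    "addressline1": "ADDR_LINE1",
}


def map_row(row, data_source):
    """Convert one source row (a dict) into a Senzing-style record.

    Assumes the row carries a unique key in a hypothetical "id" column.
    """
    record = {"DATA_SOURCE": data_source, "RECORD_ID": str(row["id"])}
    for column, term in COLUMN_TO_TERM.items():
        value = row.get(column)
        if value:  # skip columns that are missing or empty
            record[term] = value
    return record


row = {"id": 1001, "fullname": "Robert Smith", "addressline1": "123 Main Street", "notes": ""}
print(json.dumps(map_row(row, "CUSTOMERS")))
```

Columns that are not listed in the lookup table (such as notes here) are left out of the record, which keeps the mapping decision explicit: only attributes chosen for entity resolution are sent to Senzing.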