Senzing SDK AWS Quickstart Guide

Senzing is an Entity Resolution SDK that allows organizations to resolve and deduplicate records across multiple data sources. This AWS-specific deployment guide provides a structured approach for setting up Senzing on Amazon EC2, using either a local SQLite database or an AWS-hosted PostgreSQL database (Amazon RDS / Aurora). Senzing is designed to be source and destination agnostic, meaning customers define their own ingestion and consumption patterns based on their enterprise architecture.

Tip

Senzing provides 100k source records for ingestion and evaluation for free. If you require additional records for an evaluation, or any assistance when following this guide, please contact Senzing Support. Support is 100% FREE!

AWS Deployment Options

There are two options for deploying Senzing on AWS:

  • SQLite-based Deployment (<1M records): Excellent for testing and the quickest no-dependency setup
  • AWS PostgreSQL-based Deployment: Recommended for scalability and performance

Note

SQLite is the default database once you complete Steps 1 and 2. If you choose to use AWS PostgreSQL, continue with the OPTIONAL Steps 3 through 6.

Security & Data Encryption

Senzing does not require root privileges and can run completely air-gapped; the only connectivity it requires is to your database. Data security is governed by the customer’s AWS security policies, and all data is stored in a database within the customer’s controlled environment.

Cost Considerations

Cost modeling should cover compute (EC2), RDS usage, storage, and networking (data transfer) costs.

  1. Use the AWS EC2 Pricing Calculator to estimate your AWS cost.

Prerequisites

Before deploying Senzing on AWS, ensure you have:

  • AWS Account with administrative privileges
  • Basic Linux command-line knowledge
  • Understanding of AWS IAM, EC2, and RDS
  • Security best practices such as IAM role restrictions and least privilege access

AWS Deployment Instructions - SQLite and PostgreSQL

Step 1: Launch EC2 Instance

  1. Open AWS Console → EC2 → Launch Instance
  2. Choose an AMI: Amazon Linux 2023+ or Ubuntu 22.04+
  3. Choose an instance type (minimum): t3.large (<1M records) or m5.2xlarge (larger workloads)
  4. Configure Security Groups: Allow SSH from trusted IPs (22/tcp)

Note

For SQLite, instances with provisioned IOPS above 40k or local NVMe storage will dramatically improve performance.
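
The console steps in Step 1 can also be scripted. The following is a minimal sketch using the AWS CLI, assuming it is installed and configured with credentials for your account. The function names and every argument value shown (AMI ID, key pair name, security group ID, CIDR) are hypothetical placeholders and do not come from this guide.

```shell
# Sketch only: the AWS CLI calls are wrapped in functions so nothing runs until invoked.

# Allow SSH (22/tcp) into a security group from a single trusted CIDR.
allow_ssh_from() {
  local sg_id="$1" trusted_cidr="$2"
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" --protocol tcp --port 22 --cidr "$trusted_cidr"
}

# Launch one instance. The instance type defaults to t3.large,
# the minimum this guide lists for fewer than 1M records.
launch_senzing_instance() {
  local ami_id="$1" key_name="$2" sg_id="$3" instance_type="${4:-t3.large}"
  aws ec2 run-instances \
    --image-id "$ami_id" --instance-type "$instance_type" \
    --key-name "$key_name" --security-group-ids "$sg_id" --count 1
}

# Example (placeholder values):
#   allow_ssh_from sg-0123456789abcdef0 203.0.113.10/32
#   launch_senzing_instance ami-0123456789abcdef0 my-key sg-0123456789abcdef0 m5.2xlarge
```

Restricting the SSH rule to a /32 CIDR keeps the rule in line with the "trusted IPs" guidance above.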

Step 2: Install Senzing SDK and Test

  1. SSH into your EC2 instance.
  2. Follow the Senzing SDK Linux Quickstart Guide.
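
As an illustration of the SSH step, here is a small sketch. It assumes key-based login with the key pair chosen at launch; the key path and address in the example are hypothetical. The default login user is typically `ec2-user` on Amazon Linux and `ubuntu` on Ubuntu AMIs.

```shell
# Sketch: open an SSH session to the instance launched in Step 1.
# The third argument is the login user; it defaults to ec2-user (Amazon Linux).
ssh_to_instance() {
  local key_file="$1" host="$2" user="${3:-ec2-user}"
  ssh -i "$key_file" "${user}@${host}"
}

# Example (hypothetical key path and public address):
#   ssh_to_instance ~/.ssh/my-key.pem 203.0.113.25
#   ssh_to_instance ~/.ssh/my-key.pem 203.0.113.25 ubuntu
```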

OPTIONAL: Connect to AWS PostgreSQL

Step 3: Launch an Aurora PostgreSQL Cluster

  1. Go to AWS Console → RDS → Create Database
  2. Select Amazon Aurora
    • Engine: PostgreSQL
    • Instance: m5.2xlarge
    • Version: 14+ recommended
  3. Configure Networking & Security
    • Setup EC2 Connection
    • Allow EC2 instances to connect 5432/tcp
    • Allow inbound from your trusted Admin IP for management
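
For teams that prefer scripting over the console, Step 3 might look roughly like the sketch below. It assumes a configured AWS CLI and an existing database security group; the function name, identifiers, and instance class are hypothetical placeholders. The sketch lets AWS manage the master password in Secrets Manager and uses the default engine version, so add `--engine-version` if you need to pin a specific 14+ release.

```shell
# Sketch: create an Aurora PostgreSQL cluster, then add one writer instance to it.
# The instance is only created if the cluster call succeeds.
create_aurora_cluster() {
  local cluster_id="$1" master_user="$2" db_sg_id="$3" instance_class="$4"
  aws rds create-db-cluster \
    --db-cluster-identifier "$cluster_id" \
    --engine aurora-postgresql \
    --master-username "$master_user" \
    --manage-master-user-password \
    --vpc-security-group-ids "$db_sg_id" &&
  aws rds create-db-instance \
    --db-instance-identifier "${cluster_id}-writer" \
    --db-cluster-identifier "$cluster_id" \
    --engine aurora-postgresql \
    --db-instance-class "$instance_class"
}

# Example (placeholder values; pick an instance class offered for Aurora in your region):
#   create_aurora_cluster senzing-db senzing_admin sg-0123456789abcdef0 db.r6g.2xlarge
```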

Step 4: Configure EC2 to Connect to Aurora

  1. Retrieve the Aurora Cluster Endpoint:
     aws rds describe-db-instances --query "DBInstances[*].Endpoint.Address"
  2. Allow EC2 instances to access Aurora in the security group:
    • Go to AWS Console → EC2 → Security Groups
    • Edit the RDS security group
    • Add an inbound rule allowing EC2 instances to access PostgreSQL port 5432
  3. Test the database connection from EC2:
     psql -h <AURORA_ENDPOINT> -U <DB_USER> -d <DB_NAME>
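
The security-group rule and endpoint lookup in Step 4 can also be done from the command line. This is a sketch under the assumption that the EC2 instance and the database use separate security groups; the function names and IDs are hypothetical. Note that `describe-db-instances` returns instance-level addresses, while `describe-db-clusters` returns the cluster writer endpoint, which is what the sketch reads.

```shell
# Allow members of the EC2 security group to reach PostgreSQL (5432/tcp)
# on the database security group.
allow_postgres_from_sg() {
  local db_sg_id="$1" ec2_sg_id="$2"
  aws ec2 authorize-security-group-ingress \
    --group-id "$db_sg_id" --protocol tcp --port 5432 --source-group "$ec2_sg_id"
}

# Print the writer endpoint of a single Aurora cluster as plain text.
cluster_endpoint() {
  local cluster_id="$1"
  aws rds describe-db-clusters --db-cluster-identifier "$cluster_id" \
    --query "DBClusters[0].Endpoint" --output text
}

# Example (placeholder values):
#   allow_postgres_from_sg sg-0aaaaaaaaaaaaaaaa sg-0bbbbbbbbbbbbbbbb
#   psql -h "$(cluster_endpoint senzing-db)" -U senzing_admin -d postgres
```

Referencing the EC2 security group as the source, rather than an IP range, keeps the rule valid if the instance's private address changes.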

Step 5: Set Up Senzing Schema and Connect to PostgreSQL

Senzing provides a detailed PostgreSQL setup guide that works for any environment and covers:

  • Database schema creation
  • Configuring connection parameters
  • Running G2SetupConfig.py

  1. Follow the Senzing PostgreSQL Setup Guide.
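
To make "configuring connection parameters" more concrete, here is a hedged sketch of assembling a connection string. It assumes the Senzing-style format `postgresql://user:password@host:port:database/` (note the colon before the database name); treat that format, the function name, and the example values as assumptions and confirm them against the Senzing PostgreSQL Setup Guide for your SDK version.

```shell
# Build a Senzing-style PostgreSQL connection string.
# ASSUMPTION: the format is postgresql://user:password@host:port:database/
# Verify against the Senzing PostgreSQL Setup Guide before relying on it.
senzing_pg_connection() {
  local user="$1" password="$2" host="$3" database="$4" port="${5:-5432}"
  printf 'postgresql://%s:%s@%s:%s:%s/\n' "$user" "$password" "$host" "$port" "$database"
}

# Example (hypothetical values):
#   senzing_pg_connection senzing 'secret' my-cluster.cluster-example.us-east-1.rds.amazonaws.com G2
```

Passwords containing special characters may need URL encoding before they are placed in the string.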

Step 6: Apply Aurora Performance Optimizations

  1. Follow the Tuning Your Database Aurora PostgreSQL instructions.

Tip

Don’t forget you can reach out to support if you need any assistance with getting started with Senzing. Support is 100% FREE!