Senzing SDK AWS Quickstart Guide
Senzing is an Entity Resolution SDK that allows organizations to resolve and deduplicate records across multiple data sources. This AWS-specific deployment guide provides a structured approach for setting up Senzing on Amazon EC2, backed by either a local SQLite database or Amazon Aurora PostgreSQL on Amazon RDS. Senzing is designed to be source and destination agnostic, meaning customers define their own ingestion and consumption patterns based on their enterprise architecture.
Senzing allows you to ingest and evaluate up to 100k source records for free. If you require additional records for an evaluation, or any assistance when following this guide, please contact Senzing Support. Support is 100% FREE!
AWS Deployment Options
There are two options for deploying Senzing on AWS:
- SQLite-based Deployment (<1M records): Excellent for testing and the quickest no-dependency setup
- AWS PostgreSQL-based Deployment: Recommended for scalability and performance
SQLite is the default database after following the initial steps 1 & 2. If you choose to use AWS PostgreSQL, continue following the OPTIONAL steps 3, 4, 5 & 6.
Security & Data Encryption
Senzing does not require root privileges and can run completely air-gapped: it needs no network connectivity other than to your database. Data security is governed by the customer’s AWS security policies, and all data is stored in a database within the customer’s controlled environment.
Cost Considerations
Cost modeling should account for EC2 compute, RDS usage, storage, and network data transfer.
- Use the AWS Pricing Calculator to estimate your AWS costs.
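For a quick back-of-the-envelope figure before opening the calculator, the four cost components can be combined as in the sketch below. The function name `estimate_monthly_cost` is hypothetical, and every rate passed to it is a placeholder you supply yourself, not an actual AWS price. The sketch assumes 730 hours per month and that the EC2 instance and database run continuously.

```shell
# Rough monthly estimate from user-supplied rates.
# All rates are placeholders to be filled in from current AWS pricing.
# Args: EC2 $/hour, RDS $/hour, storage GB, storage $/GB-month,
#       data transfer GB, data transfer $/GB
estimate_monthly_cost() {
  awk -v ec2="$1" -v rds="$2" -v gb="$3" -v gb_rate="$4" \
      -v xfer="$5" -v xfer_rate="$6" '
    BEGIN {
      hours = 730  # assumed average hours in a month
      printf "%.2f\n", hours * (ec2 + rds) + gb * gb_rate + xfer * xfer_rate
    }'
}
```

Call it with your own rates, for example `estimate_monthly_cost <EC2_RATE> <RDS_RATE> <STORAGE_GB> <STORAGE_RATE> <TRANSFER_GB> <TRANSFER_RATE>`. For a SQLite-only deployment, pass 0 for the RDS rate.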
Prerequisites
Before deploying Senzing on AWS, ensure you have:
- AWS Account with administrative privileges
- Basic Linux command-line knowledge
- Understanding of AWS IAM, EC2, and RDS
- Familiarity with security best practices such as IAM role restrictions and least-privilege access
AWS Deployment Instructions - SQLite and PostgreSQL
Step 1: Launch EC2 Instance
- Open AWS Console → EC2 → Launch Instance
- Choose an AMI: Amazon Linux 2023+ or Ubuntu 22.04+
- Choose an instance type (minimum): t3.large (<1M records) or m5.2xlarge
- Configure Security Groups: Allow SSH from trusted IPs (22/tcp)
For SQLite, instances with provisioned IOPS above 40k or local NVMe storage will dramatically increase performance.
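If you prefer the AWS CLI to the console, the launch in Step 1 can be sketched as follows. The function name `launch_senzing_instance` and the `DRY_RUN` switch are hypothetical conveniences, and the AMI ID, key pair name, and security group ID are values you must supply. By default the function only prints the command so it can be reviewed before anything is created.

```shell
# Build the AWS CLI call that launches the Step 1 instance.
# Args: $1 = AMI ID, $2 = instance type, $3 = key pair name,
#       $4 = security group ID (should allow 22/tcp from trusted IPs)
launch_senzing_instance() {
  # Replace the positional parameters with the full command line.
  set -- aws ec2 run-instances \
    --image-id "$1" \
    --instance-type "$2" \
    --key-name "$3" \
    --security-group-ids "$4" \
    --count 1
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # dry run (default): print the command only
  else
    "$@"        # DRY_RUN=0: actually launch the instance
  fi
}
```

Running `launch_senzing_instance <AMI_ID> t3.large <KEY_NAME> <SG_ID>` prints the command. Set `DRY_RUN=0` once the values look right.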
Step 2: Install Senzing SDK and Test
- SSH into your EC2 instance.
- Follow the Senzing SDK Linux Quickstart Guide.
OPTIONAL: Connect to AWS PostgreSQL
Step 3: Launch an Aurora PostgreSQL Cluster
- Go to AWS Console → RDS → Create Database
- Select Amazon Aurora
  - Engine: PostgreSQL
  - Instance: m5.2xlarge
  - Version: 14+ recommended
- Configure Networking & Security
  - Set up the EC2 connection
  - Allow EC2 instances to connect on 5432/tcp
  - Allow inbound from your trusted Admin IP for management
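The rule that lets EC2 instances reach the database on 5432/tcp can also be added from the AWS CLI. The sketch below assumes the database and the EC2 instance each have their own security group and that the rule references the EC2 group rather than an IP range. The function name `allow_postgres_from_ec2` and the `DRY_RUN` switch are hypothetical.

```shell
# Allow PostgreSQL traffic from the EC2 security group to the Aurora security group.
# Args: $1 = security group ID attached to Aurora,
#       $2 = security group ID attached to the EC2 instance
allow_postgres_from_ec2() {
  set -- aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port 5432 \
    --source-group "$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"   # dry run (default): print the command only
  else
    "$@"        # DRY_RUN=0: apply the rule
  fi
}
```

Referencing the source security group instead of a CIDR block keeps the rule valid when instances are replaced or their private IPs change.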
Step 4: Configure EC2 to Connect to Aurora
- Retrieve the Aurora Cluster Endpoint:
  aws rds describe-db-clusters --query "DBClusters[*].Endpoint"
- Allow EC2 instances to access Aurora in the security group:
  - Go to AWS Console → EC2 → Security Groups
  - Edit the RDS security group
  - Add an inbound rule allowing EC2 instances to access PostgreSQL on port 5432
- Test the database connection from EC2:
  psql -h <AURORA_ENDPOINT> -U <DB_USER> -d <DB_NAME>
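The connection test above can be wrapped in a small helper that first probes the port and then runs a trivial query. This is a minimal sketch: the function names `aurora_uri` and `check_aurora` are hypothetical, and it assumes the PostgreSQL client tools (`psql`, `pg_isready`) are installed on the EC2 instance. The password is deliberately left out of the URI so that `psql` prompts for it or reads it from `~/.pgpass`.

```shell
# Build a standard libpq connection URI (the form psql accepts).
# Args: $1 = Aurora endpoint, $2 = DB user, $3 = DB name, $4 = port (default 5432)
aurora_uri() {
  printf 'postgresql://%s@%s:%s/%s\n' "$2" "$1" "${4:-5432}" "$3"
}

# Probe the endpoint, then run a trivial query to confirm the credentials work.
# Takes the same arguments as aurora_uri.
check_aurora() {
  pg_isready -h "$1" -p "${4:-5432}" &&
    psql "$(aurora_uri "$@")" -c 'SELECT version();'
}

# Example (not executed here); the cluster endpoint can be obtained with:
#   aws rds describe-db-clusters --query "DBClusters[*].Endpoint" --output text
#   check_aurora <AURORA_ENDPOINT> <DB_USER> <DB_NAME>
```

If `pg_isready` reports no response, the cause is usually the security group rule or subnet routing rather than database credentials.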
Step 5: Set Up Senzing Schema and Connect to PostgreSQL
Senzing provides a detailed PostgreSQL setup guide that works for any environment and covers:
- Database schema creation
- Configuring connection parameters
- Running G2SetupConfig.py
- Follow the Senzing PostgreSQL Setup Guide.
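As an illustration of the "connection parameters" item, the sketch below assembles a Senzing-style PostgreSQL connection string from the Aurora details. The format shown (a colon rather than a slash before the database name, plus a trailing slash) is an assumption about Senzing's connection syntax and differs from a standard libpq URI, so confirm it against the Senzing PostgreSQL Setup Guide before use. The function name `senzing_pg_connection` is hypothetical.

```shell
# Assumed Senzing CONNECTION format (verify against the Senzing setup guide):
#   postgresql://user:password@host:port:database/
# Args: $1 = DB user, $2 = DB password, $3 = Aurora endpoint,
#       $4 = DB name, $5 = port (default 5432)
senzing_pg_connection() {
  printf 'postgresql://%s:%s@%s:%s:%s/\n' "$1" "$2" "$3" "${5:-5432}" "$4"
}
```

Avoid committing the resulting string to source control, since it embeds the database password.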
Step 6: Apply Aurora Performance Optimizations
- Follow the Tuning Your Database Aurora PostgreSQL instructions.
Don’t forget you can reach out to support if you need any assistance with getting started with Senzing. Support is 100% FREE!