Database Tuning

This article is for general guidance. Always work with your DBA when working on a production database. If you have any questions, contact Senzing Support . Support is 100% FREE!

This page addresses perceived poor performance or a warning from Senzing that the entity repository database is performing slowly. The Senzing check_repository_performance() function tests how many auto-commit inserts can be performed on the Senzing entity repository within a few seconds. The faster this completes, the higher scalability the system can achieve. This is not the only performance area impacting Senzing but is the primary one outside of data mapping.

Database performance with Senzing is highly related to latency. Issues with performance, in order, tend to focus around:

  1. Disk IO performance of the database server
  2. Lack of database tuning for an auto-commit OLTP workload
  3. Network bottlenecks preventing high-speed communication between the Senzing SDK and the database
  4. Latency between the database server and non-direct attached storage subsystems
The database parameters outlined here are the minimum required and not an exhaustive list for optimal database performance. Senzing Support can assist with further recommendations, but always include the Database Administrator in monitoring and tuning the underlying Senzing database.
Production systems run as fast as low 10s of milliseconds for searches and mid 10s of milliseconds for loads all while running 100s or 1000s of operations in parallel. Performance depends on data, config, architecture, and hardware. Contact Senzing Support for help optimizing a specific setup.

Auto-commit tuning

For each database system the following parameters should be set for the Senzing auto-commit workload.

SQLite

The Senzing engine automatically sets a couple of SQLite pragmas suited to the Senzing workload. If the database is small and there is RAM to spare, consider using tmpfs to improve performance.

The pragmas set for SQLite:

  • synchronous = 0
  • secure_delete = 0
  • journal_mode = WAL
  • journal_size_limit = 1000000

PostgreSQL

synchronous_commit=off
wal_writer_delay=1000
enable_seqscan=off

Aurora PostgreSQL

Ensure that the ec2 instances running the Senzing SDK are in the same AZ (availability zone), VPC, and subnet as the Aurora PostgreSQL server. This will greatly reduce latency and increase performance.

If an Aurora PostgreSQL read replica is in another AZ, all commits will be synchronous. Either disable replicas for the large historical/initial loads or ensure the replicas are in the same AZ.

Aurora PostgreSQL Server-less v2 is showing significantly degraded performance compared to v1.

The following settings are per cluster database group:

autovacuum_max_workers: 5
enable_seqscan: 0
pglogical.synchronous_commit: 0
synchronous_commit: "off"

Microsoft SQL Server and Azure Hyperscale SQL

On Azure, ensure the Hyperscale database is in the same proximity zone as applications and services that have integrated the Senzing SDK.

For SQL Server 2019 and later, ensure the database is set to a UTF-8 character set. If the database is not in UTF-8, it is critical that the odbc.ini set AutoTranslate = No for the configuration so data is not corrupted.

ALTER DATABASE SET DELAYED_DURABILITY = Forced;
ALTER DATABASE SET AUTO_CREATE_STATISTICS ON;
ALTER DATABASE SET AUTO_UPDATE_STATISTICS_ASYNC ON;

SQL Server also has some other special case items:

  1. It does not update the query plans automatically so the DBA will need to be involved to update statistics and flush the procedure cache.
  2. On RedHat, unixODBC is not built with --enable-fastvalidate which severely limits scaling per process. Microsoft explains the issue in their ODBC Linux/Mac programming guidelines . Ubuntu and Debian ship with this flag enabled.
  3. On Ubuntu, Microsoft supplies a version of unixODBC that does NOT contain --enable-fastvalidate for the build. This creates a 10x performance problem. You must override its versions by downgrading to the Ubuntu-built versions. Currently, that is:
sudo apt install libodbc1=2.3.6-0.1build1 unixodbc=2.3.6-0.1build1

MySQL

Aurora MySQL v3 does not support turning off flush on commit so it will not perform well with Senzing.
innodb_flush_log_at_trx_commit = 0
innodb_flush_method = O_DIRECT
innodb_file_per_table = 1
innodb_doublewrite=0
innodb_flush_neighbors=0
skip-log-bin

On larger Senzing systems running a high number of threads, errors that additional prepared statements cannot be created may occur:

42000 Can't create more than max_prepared_stmt_count statements (current value: 16382)

On larger systems with a higher than default number of threads, or when an error similar to the above occurs, increase the number of prepared statements for the database. For example, on MySQL by increasing max_prepared_stmt_count:

max_prepared_stmt_count = 100000

100,000 is a suggested starting value. The optimal setting depends on thread count and system speed:

max_prepared_stmt_count = (2000 x threads) + 500

For example, 50 threads would need (2000 x 50) + 500 = 100,500.

Make sure that the transaction isolation level is set to READ COMMITTED.

Network bottlenecks

Networking used to be very straightforward, easy to monitor and manage. There was the exceptional case of devices hitting max packets per second or driver/OS/PCI bottlenecks but those were rare.

The popularity of cloud environments extracts much of the network topology from the user. An application and database could be on different sides of a data center (or continent) or even communicating through very limited virtual systems when they are co-located. Cloud networking is often a black box and the hardest component to troubleshoot.

We have seen systems scale from 30 records per second to 1000 records per second simply by switching the type of underlying cloud network fabric:

  1. Co-locate systems as close as possible when provisioning
  2. If there are options between virtual and physical switches, test with both options
  3. Ask about both bandwidth and packets per second limits

If you have any questions, contact Senzing Support. Support is 100% FREE!