Senzing Globalization Guide

What Languages Does Senzing Support?

Senzing uses UTF-8 encoding, which allows text in most of the world's languages to be captured and processed correctly. Beyond ingesting and storing data, Senzing analytics take domain, culture, and cross-script differences into account for global entity resolution. Senzing natively supports cross-script comparisons across many languages and writing systems, and its entity-centric learning lets it discover attribute variations (including script variations) even when the attributes cannot be matched in their original forms.
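Because everything downstream depends on the input really being UTF-8, it can be worth checking files before loading them. The following is a minimal sketch of such a pre-load check, not part of Senzing itself. The helper name `decode_utf8_strict` is hypothetical, and the NFC normalization step is an optional assumption (Senzing is not stated to require it).

```python
import unicodedata


def decode_utf8_strict(raw: bytes) -> str:
    """Decode bytes as UTF-8, raising UnicodeDecodeError on invalid input,
    then normalize to NFC so composed and decomposed forms compare equal."""
    text = raw.decode("utf-8", errors="strict")
    return unicodedata.normalize("NFC", text)


# "José" with a precomposed é versus "e" followed by a combining acute accent.
composed = "Jos\u00e9".encode("utf-8")
decomposed = "Jose\u0301".encode("utf-8")
assert decode_utf8_strict(composed) == decode_utf8_strict(decomposed)

# Latin-1 bytes are not valid UTF-8, so they are rejected instead of mis-decoded.
try:
    decode_utf8_strict("José".encode("latin-1"))
except UnicodeDecodeError:
    print("not UTF-8: re-encode the source file before loading")
```

Failing loudly on bad bytes is a deliberate choice here: silently replacing them would turn a name into a different name.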

Advanced Personal Name Comparisons

Supported Cultural Groups

Personal names pose particular challenges for global entity resolution. Senzing uses IBM's InfoSphere Global Name Management for culturally aware name comparison. This name library uses spelling patterns and country-of-association information to infer a name's cultural provenance and choose a matching strategy accordingly.

Primary Cultural Groups

Southwest Asian

Culture | Original Script | Transliteration
Afghan | افغان احمد | Afghan Ahmad
Arabic | محمد حسن الشمري | Mohammed Hassan Al-Shamri
Farsi | علی رضا | Ali Reza
Pakistani | محمد علی | Muhammad Ali

European

Culture | Original Script | Transliteration
Anglo | Standard Latin script | -
French | François Müller | Francois Mueller
German | Björn Müller | Bjorn Muller
Hispanic | José García | Jose Garcia

Han

Culture | Original Script | Transliteration
Chinese | 王小明 | Wang Xiaoming
Korean | 김민수 | Kim Min-su
Vietnamese | Nguyễn Văn An | Nguyen Van An

Additional Cultural Support

Culture | Original Script | Transliteration
Indian | राम कुमार शर्मा | Ram Kumar Sharma
Indonesian | Standard Latin script with diacritics | -
Japanese | さとう ひろし | Satou Hiroshi
Polish | Łukasz Kowalski | Lukasz Kowalski
Portuguese | João da Silva | Joao da Silva
East Slavic (Ukrainian, Belarusian, Russian) | Александр Петров | Aleksandr Petrov
Turkish | Mustafa Özkan | Mustafa Ozkan
Yoruban | Adébáyọ̀ Ọlátúndé | Adebayo Olatunde
Generic | Catch-all for other cultures | -

Note

Japanese Kanji is not directly handled by Senzing and is treated as Chinese Hanzi when provided.
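Since Kanji-only input is treated as Chinese Hanzi, it may help to find out which names in a data set consist solely of Han characters before loading. The sketch below is one hypothetical way to do that with the Python standard library. The function names `scripts_in` and `is_han_only` are illustrative and not part of any Senzing API, and the check cannot tell Chinese from Japanese, only whether kana is present.

```python
import unicodedata

HAN_PREFIXES = (
    "CJK UNIFIED IDEOGRAPH",
    "CJK COMPATIBILITY IDEOGRAPH",
    "IDEOGRAPHIC ITERATION MARK",  # the repeat mark, as in 佐々木
)


def scripts_in(text: str) -> set:
    """Return coarse script labels for the letters in `text`,
    derived from each character's Unicode name."""
    labels = set()
    for ch in text:
        if not ch.isalpha():
            continue  # ignore spaces, digits, and punctuation
        name = unicodedata.name(ch, "")
        if name.startswith(HAN_PREFIXES):
            labels.add("HAN")
        elif name.startswith("HIRAGANA"):
            labels.add("HIRAGANA")
        elif name.startswith("KATAKANA"):
            labels.add("KATAKANA")
        elif name.startswith("HANGUL"):
            labels.add("HANGUL")
        else:
            labels.add("OTHER")
    return labels


def is_han_only(text: str) -> bool:
    """True when every letter is a Han character (Hanzi or Kanji)."""
    return scripts_in(text) == {"HAN"}


print(is_han_only("佐藤"))           # Japanese surname written in Kanji
print(is_han_only("さとう ひろし"))  # Hiragana, so not Han-only
```

Records flagged this way could be reviewed, or supplied with a kana or Romanized variant as well, depending on the data available.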

Organizational Names

Senzing provides robust same-script organizational name matching across many writing systems and languages.

Same-Script Organizational Name Matching Examples

Script | Examples
Arabic Script | الشركة السعودية للصناعات الأساسية ↔ الشركه السعوديه للصناعات الاساسيه
Cyrillic Script | ООО “Газпром” ↔ Общество с ограниченной ответственностью “Газпром”
Latin Script with Diacritics | Société Générale ↔ Societe Generale; Volkswagen Aktiengesellschaft ↔ Volkswagen AG
Japanese Script | トヨタ自動車株式会社 ↔ トヨタ自動車
Korean Script | 삼성전자주식회사 ↔ 삼성전자
Chinese Script | 中国石油天然气集团公司 ↔ 中国石油天然气集团
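To see why a pair such as Société Générale ↔ Societe Generale counts as the same name, it helps to look at a simple diacritic fold. The sketch below is purely illustrative and makes no claim about how Senzing compares names internally. It uses Unicode NFKD decomposition from the Python standard library, and the function name `fold_diacritics` is hypothetical.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Decompose to NFKD and drop combining marks (Unicode category Mn)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


assert fold_diacritics("Société Générale") == "Societe Generale"
assert fold_diacritics("João da Silva") == "Joao da Silva"

# Limits of the naive approach: "ü" folds to "u", never to "ue",
# and "Ł" has no Unicode decomposition, so it is left unchanged.
assert fold_diacritics("Müller") == "Muller"
assert fold_diacritics("Łukasz") == "Łukasz"
```

The last two cases show why culture-aware comparison matters: a purely mechanical fold misses conventions such as Müller ↔ Mueller and letters such as Ł.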

CJK+English Cross-Script Matching (New in v4)

Senzing v4 introduces native cross-script matching between CJK (Chinese, Japanese, Korean) and English organizational names without requiring reference data.

CJK+English Cross-Script Matching Examples

CJK | English
中國銀行股份有限公司 | Bank of China
토요타 자동차 | Toyota Motor Corporation
ソニー株式会社 | Sony Corporation
阿里巴巴集团 | Alibaba Group
삼성전자 | Samsung Electronics

For other cross-script language combinations, robust matching of organizational names may still require reference data that lists multiple versions of each name, because organizations do not carry their names across scripts in a consistent way. Some transliterate (represent the name phonetically), some translate all or part of the name, and some rebrand when entering a new market or script. In these cases, data providers or services that offer organizational name enrichment can help.
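As an illustration of reference data carrying multiple versions of a name, the sketch below assembles one JSON record that holds several name variants for the same organization. The attribute names (`DATA_SOURCE`, `RECORD_ID`, `NAME_ORG`) and the `NAMES` list layout are assumptions based on Senzing's published entity specification and should be verified against the version in use. The function name `build_org_record` and the data source code are hypothetical.

```python
import json
from typing import Iterable


def build_org_record(data_source: str, record_id: str,
                     name_variants: Iterable[str]) -> str:
    """Serialize one organization with every known name variant attached."""
    seen = set()
    names = []
    for name in name_variants:
        cleaned = name.strip()
        if cleaned and cleaned not in seen:  # skip blanks and exact repeats
            seen.add(cleaned)
            names.append({"NAME_ORG": cleaned})  # assumed attribute name
    record = {
        "DATA_SOURCE": data_source,  # assumed attribute name
        "RECORD_ID": record_id,      # assumed attribute name
        "NAMES": names,              # assumed list layout
    }
    # ensure_ascii=False keeps native-script text as readable UTF-8.
    return json.dumps(record, ensure_ascii=False)


print(build_org_record(
    "ORG_REFERENCE",  # hypothetical data source code
    "1001",
    [
        "الشركة السعودية للصناعات الأساسية",
        "Saudi Basic Industries Corporation",
        "SABIC",
    ],
))
```

Keeping the native-script name, the translated name, and the brand abbreviation on one record gives the engine a chance to match whichever form appears in another source.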

Enhanced Address Comparisons

Senzing provides cross-script matching capabilities for addresses. Starting in v4, native cross-script matching between CJK (Chinese, Japanese, Korean) and English addresses is supported without requiring reference data, representing a major improvement for global address resolution.

Address Matching Examples

CJK+English Cross-Script Matching (New in v4):

CJK | English
710000陕西未央区西安凤城十路118 | 118, Fengcheng 10th Road, Xi’an, Weiyang District, Shaanxi 710000
〒540-0002大阪府大阪市中央区1-1 | 1-1 Chuo-ku, Osaka, Osaka 540-0002
上海市浦东新区陆家嘴环路1000号 | 1000 Lujiazui Ring Road, Pudong New Area, Shanghai

For other language combinations, addresses remain challenging for entity resolution because they tend to have many data quality issues. Senzing can process addresses in their native scripts, and matching is most effective when comparing native to native rather than native to Romanized.

For cross-script address comparison in non-CJK languages, using an address hygiene product to Romanize addresses and providing both native script and Romanized versions to Senzing can improve matching accuracy.
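Supplying both versions of an address could look like the following sketch. It builds a single JSON record holding the native-script address and its Romanized form. The attribute names (`DATA_SOURCE`, `RECORD_ID`, `ADDR_FULL`) and the `ADDRESSES` list layout are assumptions based on Senzing's published entity specification and should be checked against the version in use. The `romanize` argument stands in for whatever address hygiene product is chosen, and the lookup table and sample address are illustrative only.

```python
import json
from typing import Callable


def build_address_record(data_source: str, record_id: str,
                         native_address: str,
                         romanize: Callable[[str], str]) -> str:
    """Attach the native-script address and, when different, its Romanized form."""
    addresses = [{"ADDR_FULL": native_address}]  # assumed attribute name
    romanized = romanize(native_address)
    if romanized and romanized != native_address:
        addresses.append({"ADDR_FULL": romanized})
    record = {
        "DATA_SOURCE": data_source,  # assumed attribute name
        "RECORD_ID": record_id,      # assumed attribute name
        "ADDRESSES": addresses,      # assumed list layout
    }
    # ensure_ascii=False keeps the native script readable as UTF-8.
    return json.dumps(record, ensure_ascii=False)


# Stand-in for a real address hygiene product: a fixed lookup, illustration only.
SAMPLE_ROMANIZATIONS = {
    "Москва, ул. Тверская, д. 7": "Moscow, ul. Tverskaya, d. 7",
}


def lookup_romanize(address: str) -> str:
    return SAMPLE_ROMANIZATIONS.get(address, address)


print(build_address_record("CUSTOMERS", "42",
                           "Москва, ул. Тверская, д. 7", lookup_romanize))
```

Addresses already in Latin script pass through with a single entry, so the same function can be applied to a mixed data set.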

Getting Started

Senzing provides comprehensive globalization capabilities out of the box. The CJK+English cross-script matching introduced in v4 for both organizational names and addresses requires no additional configuration; upgrading to v4 is enough to benefit from it.

For the rare cases that need cultural tuning or advanced cross-script handling, contact Senzing Support for guidance on optimizing the configuration for your use case.

Info

If you have any questions, contact Senzing Support. Support is 100% FREE!