Discovery Without Disclosure: Importance of Selective Field Hashing in Entity Resolution Software
By Jeff Jonas, published February 19, 2019

Let’s say I consider my personal address book private — which is true. Let’s say that you consider your address book private too. Then riddle me this: If we agree to find all the people we know in common, and only this subset of people, how can we accomplish this without revealing everyone else?

Following this line of thinking…

  • If two banks are planning to merge, they will want to know how many customers they currently have in common. Keep in mind that customer data is usually messy and requires a technique called fuzzy matching, e.g., recognizing Bill and William as functionally the same name. How can the two banks accurately determine customer overlap without exposing their customer data unnecessarily? Answer: Discovery without disclosure.
  • A law enforcement team is secretly planning a drug bust where undercover officers will pose as drug dealers. A second law enforcement team is secretly planning a drug bust where undercover officers will pose as drug buyers. Since neither team’s secret can be revealed to the other, without potentially risking leaks, how can they safely prevent both operations from going down on the same night at the same address? Answer: Discovery without disclosure.

So exactly how do you achieve discovery without disclosure?

  1. The parties that need to share data begin by cryptographically hashing their data’s identity attributes, e.g., names, addresses, IDs, phones, etc.
  2. The hashed data is sent to a neutral (ideally trusted) third party, who can hash the already-hashed data a second time for extra protection.
  3. The third party uses Senzing software on the double-hashed data to identify who is who and who is related to whom, e.g., revealing common customers or a shared location of a drug operation.
  4. Pointers to specific records sharing a nexus are revealed by the third party, e.g., the first law enforcement team will be informed its file #43 relates to the second team’s file #A1707.
  5. Both groups check those files and unilaterally decide if it is legal, and within policy, to have a more revealing discussion. If not, nothing more is learned.
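
To make the five steps concrete, here is a minimal Python sketch. It is not how Senzing does it internally; it is a toy under stated assumptions. I am assuming a keyed hash (HMAC-SHA256) with a secret the sharing parties agree on, a second secret known only to the third party, and exact matching on standardized values in place of real entity resolution. The nickname table and every function name here (`standardize`, `party_hash`, `third_party_rehash`, `find_nexus`) are hypothetical.

```python
import hashlib
import hmac

# Hypothetical nickname table; real fuzzy matching is far richer than this.
NICKNAMES = {"bill": "william", "bob": "robert"}


def standardize(value: str) -> str:
    """Lowercase, collapse whitespace, and map known nicknames to one form."""
    tokens = value.lower().split()
    return " ".join(NICKNAMES.get(token, token) for token in tokens)


def party_hash(value: str, shared_key: bytes) -> str:
    """Step 1: each party hashes its own identity attributes before sharing."""
    message = standardize(value).encode("utf-8")
    return hmac.new(shared_key, message, hashlib.sha256).hexdigest()


def third_party_rehash(hashed: str, third_party_key: bytes) -> str:
    """Step 2: the third party hashes the received hashes with its own secret."""
    return hmac.new(third_party_key, hashed.encode("ascii"), hashlib.sha256).hexdigest()


def find_nexus(file_a: dict, file_b: dict, third_party_key: bytes) -> list:
    """Steps 3 and 4: match on double-hashed values, return only record pointers.

    file_a and file_b map a record ID to a list of already-hashed attributes.
    """
    index = {}
    for record_id, hashes in file_a.items():
        for h in hashes:
            index.setdefault(third_party_rehash(h, third_party_key), set()).add(record_id)

    pairs = set()
    for record_id, hashes in file_b.items():
        for h in hashes:
            for a_id in index.get(third_party_rehash(h, third_party_key), ()):
                pairs.add((a_id, record_id))
    return sorted(pairs)


# Assumed secrets, for illustration only.
SHARED_KEY = b"agreed-by-both-teams"
THIRD_PARTY_KEY = b"known-only-to-the-third-party"

team_one = {
    "43": [party_hash("123 Main St", SHARED_KEY)],
    "44": [party_hash("9 Elm Ave", SHARED_KEY)],
}
team_two = {
    "A1707": [party_hash("123 main st", SHARED_KEY)],
    "A1708": [party_hash("77 Oak Rd", SHARED_KEY)],
}

print(find_nexus(team_one, team_two, THIRD_PARTY_KEY))  # [('43', 'A1707')]
```

Note that the standardization has to happen before hashing: once a value is hashed, "Bill" and "William" look nothing alike, so any fuzziness must be resolved on the clear text by each party on its own side. The third party only ever sees hashes and hands back record pointers, which is what leaves step 5 entirely in the hands of the two teams.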

If a solution like the one described above had been in place, it would have averted the hilarious Keystone Cops incident featured in UPI’s November 15, 2017 story entitled “Undercover Detroit police attempt to arrest each other in ‘embarrassing’ drug bust.”

Selective field hashing is one of the important Privacy by Design (PbD) features built into our Senzing® real-time AI for entity resolution. The Senzing team has been delivering selective field hashing for well over a decade.
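
For readers who like to see things in code, here is one rough way to picture the "selective" part: only the fields chosen as identity attributes are hashed, while everything else in the record passes through untouched. This is a sketch under my own assumptions, not the Senzing implementation. The field list, the key, and the `hash_selected_fields` function are all hypothetical.

```python
import hashlib
import hmac

# Assumption: which fields count as identity attributes is a policy choice.
HASHED_FIELDS = frozenset({"name", "address", "phone"})


def hash_selected_fields(record: dict, key: bytes, hashed_fields=HASHED_FIELDS) -> dict:
    """Return a copy of the record with only the selected fields hashed."""
    protected = {}
    for field, value in record.items():
        if field in hashed_fields:
            # Lowercase and collapse whitespace so trivial variations hash alike.
            normalized = " ".join(str(value).lower().split())
            protected[field] = hmac.new(
                key, normalized.encode("utf-8"), hashlib.sha256
            ).hexdigest()
        else:
            # Fields not selected (e.g., a record pointer) stay in the clear.
            protected[field] = value
    return protected


record = {"record_id": "43", "name": "William Smith", "phone": "555-0100", "state": "NV"}
protected = hash_selected_fields(record, b"hypothetical-secret-key")

print(protected["record_id"], protected["state"])  # 43 NV
```

The record pointer stays readable so that a match can be reported back by file number, while the name and phone number travel only as hashes.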

One of my favorite examples of selective field hashing in action is its 2012 deployment by the Electronic Registration Information Center (ERIC), an organization helping to modernize voter registration for half of America. ERIC wanted to use hashing to reduce the risk of unintended disclosure: to prevent voter data from being revealed if someone sniffs the communication pipes of the network, or steals the entity-resolved database. For more about ERIC and Senzing, watch the introductory video, read the technically-oriented security FAQ, or read the New York Times article “Another Use for A.I.: Finding Millions of Unregistered Voters.”

Discovery without disclosure is a capability unique to our Senzing software because we baked in selective field hashing. And we’re damn proud of it!