Structuring Unstructured Data

By Jeff Jonas, published February 20, 2018

When asked about unstructured data, this is all I have to say:

“Unstructured data is only useful if structure can be extracted from it.”

Let me explain: A picture taken in pitch black without a flash is useless as it contains no discernible features. The mobile phone call that suddenly goes bonkers and becomes all garbled is equally useless as there is no way to extract meaning from the noise.

On the other hand, a parking garage video has the potential to be much more useful because license plate reading software can extract plate numbers.

The principle that observations are only useful if features can be extracted from them has helped me simplify system architectures:

Observe -> Feature Extract -> Contextualize -> Decide -> Act

When an observation arrives pre-structured (e.g., a database transaction), the Feature Extract step is skipped. Because all inputs to the Contextualize step are structured, contextualization can be streamlined and made indifferent to the nature of the original observation (structured or unstructured).
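As a rough illustration of the flow above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Observation` class, the `toy_plate_extractor` function (a regular expression standing in for real license plate reading software), the plate format, and the watchlist lookup used for contextualizing. The point is only that structured observations bypass extraction and that the later steps never need to know which kind of observation they started from.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, Set

Features = Dict[str, Any]


@dataclass
class Observation:
    payload: Any       # raw text, or a dict when already structured
    structured: bool   # True for e.g. a database transaction


def toy_plate_extractor(raw_text: str) -> Features:
    # Hypothetical stand-in for plate-reading software: finds "ABC-1234" style plates.
    match = re.search(r"\b[A-Z]{3}-\d{4}\b", raw_text)
    return {"plate": match.group(0)} if match else {}


def feature_extract(obs: Observation, extractor: Callable[[Any], Features]) -> Features:
    # Pre-structured observations skip extraction entirely.
    if obs.structured:
        return dict(obs.payload)
    return extractor(obs.payload)


def contextualize(features: Features, watchlist: Set[str]) -> Features:
    # Only sees structured features, whatever the original observation was.
    return {**features, "flagged": features.get("plate") in watchlist}


def decide(context: Features) -> str:
    return "alert" if context["flagged"] else "ignore"


def act(decision: str) -> str:
    return f"action={decision}"


def run_pipeline(obs: Observation, watchlist: Set[str]) -> str:
    features = feature_extract(obs, toy_plate_extractor)
    return act(decide(contextualize(features, watchlist)))
```

A video-derived text observation and a database-style record go through the same `run_pipeline` call; only the first step treats them differently.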

Some common feature extraction algorithms you may have heard of:

Optical character recognition e.g., converting a picture of words into a text document
Object recognition e.g., detecting pictures of cats
Facial recognition e.g., unlocking the iPhone X without a password
Acoustic fingerprinting e.g., detecting an artist/song based on a small audio sample
Named entity recognition e.g., suggesting a new contact based on an email’s contents
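To make the idea of pulling structure out of free text concrete, here is a deliberately crude Python sketch in the spirit of the last item above. It is not real named entity recognition: it uses two hand-written regular expressions instead of a trained model, and the function name `suggest_contact`, the patterns, and the sample message are all assumptions made for illustration.

```python
import re
from typing import Dict, List

# Simplistic patterns, assumed for this example; real extractors handle far more variation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def suggest_contact(email_body: str) -> Dict[str, List[str]]:
    """Extract structured contact features from unstructured email text."""
    return {
        "emails": EMAIL_RE.findall(email_body),
        "phones": [p.strip() for p in PHONE_RE.findall(email_body)],
    }


body = "Hi, reach me at jane.doe@example.com or 702-555-0100. Thanks, Jane"
contact = suggest_contact(body)
```

Once the email address and phone number are features rather than characters in a blob of text, downstream steps can work with them like any other structured record.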

Unfortunately, commercially available feature extraction technology has a long way to go. The error rates are often just too high. As a consequence, downstream processes (e.g., entity resolution) become the victims. Technology breakthroughs in the field of unstructured feature extraction are much needed. I keep waiting. Come on already.
