
Structuring Unstructured Data


When asked about unstructured data, this is all I have to say:

“Unstructured data is only useful if structure can be extracted from it.”

Let me explain: A picture taken in pitch black without a flash is useless as it contains no discernible features. The mobile phone call that suddenly goes bonkers and becomes all garbled is equally useless as there is no way to extract meaning from the noise.

On the other hand, a parking garage video has the potential to be much more useful because license plate reading software can extract plate numbers.

The principle that observations are only useful if features can be extracted from them has helped me simplify system architectures:

Observe -> Feature Extract -> Contextualize -> Decide -> Act

When an observation arrives pre-structured (e.g., a database transaction), the Feature Extract step is skipped. Because all inputs to the Contextualize step are structured, contextualization processing can be streamlined: it is indifferent to the nature of the original observation (structured or unstructured).
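To make the flow concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration, not how any real system works: the `Observation` type, the plate pattern (three letters followed by four digits), the sighting-count context, and the threshold of two sightings are all hypothetical. The point is only that a pre-structured observation bypasses extraction, and that everything after extraction sees the same structured shape.

```python
import re
from dataclasses import dataclass

@dataclass
class Observation:
    payload: object   # raw text (unstructured) or a dict (pre-structured)
    structured: bool

# Hypothetical plate format, for illustration only: ABC1234 or ABC-1234.
PLATE = re.compile(r"\b[A-Z]{3}-?\d{4}\b")

def feature_extract(obs):
    """Turn an observation into structured features."""
    if obs.structured:
        # Pre-structured input: the Feature Extract step is skipped.
        return dict(obs.payload)
    return {"plates": PLATE.findall(obs.payload)}

def contextualize(features, context):
    """Relate new features to what has already been seen."""
    for plate in features["plates"]:
        context[plate] = context.get(plate, 0) + 1

def decide(features, context, threshold=2):
    """Flag plates seen at least `threshold` times (an assumed rule)."""
    return [p for p in features["plates"] if context[p] >= threshold]

def act(flagged):
    """Stand-in for a real action: return a message per flagged plate."""
    return [f"review {plate}" for plate in flagged]

def process(obs, context):
    """Observe -> Feature Extract -> Contextualize -> Decide -> Act."""
    features = feature_extract(obs)
    if not features.get("plates"):
        # No structure could be extracted, so the observation is of no use.
        return []
    contextualize(features, context)
    return act(decide(features, context))
```

With a shared `context = {}`, calling `process` on the text "camera 3 saw ABC1234 entering" and then on the structured record `{"plates": ["ABC1234"]}` passes through the same `contextualize` and `decide` functions both times. Garbled input such as "zzzz static zzzz" yields no features and is dropped before contextualization.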

Some common feature extraction algorithms you may have heard of:

Unfortunately, commercially available feature extraction technology has a long way to go. The error rates are often just too high. As a consequence, downstream processes (e.g., entity resolution) become the victim. Technology breakthroughs in the field of unstructured feature extraction are much needed. I keep waiting. Come on already.
