Fantasy Analytics

By Jeff Jonas, published July 24, 2019

It often amazes me what people think is computable given their actual observation space.

Here’s an example conversation:

Me: “Tell me about your company.”

Customer: “We are in the business of moving things through supply chains.”

Me: “What do you want to achieve with analytics?”

Customer: “We want to find bombs in the supply chain.”

Me: “COOL!”

Me: “Tell me about your available observation space.”

Customer: “We have information on the shipper and receiver. We also know the owner of the plane, train, truck, car, etc., and the people who operate these vehicles.”

Me: “Nice. What else do you have?”

Customer: “We have the manifest, a claim made by the sender about the contents.”

Me: “Excellent. What else?”

Customer: “That’s it.”

Me: “WHAT?!”

Me: “YOU ARE NEVER GONNA FIND A BOMB!”

Me: “NO ONE WRITES ‘BOMB’ ON THE MANIFEST!”

The problem is that business objectives (e.g., finding a bomb) are often impossible to achieve given the proposed observation space (data sources).

Unless, in this case, the bad actor writes the word “BOMB” on the manifest. And only idiots do that. Luckily, we don’t have to worry much about people who truly don’t know what they’re doing, as they run out of gas on the way to the operation.

When we software engineering folks get overly excited and run off to build systems with little forethought about the balance between the mission objectives and the observation space, there is a risk the finished system will utterly fail to meet its business objectives.

As I have no interest in spending intense chunks of my life working on pointless projects, when initially scoping a system, I first qualify the available observation space to determine if it is sufficient to deliver on the mission objectives. If the available observation space is insufficient, then I must first figure out if/how the observation space can be appropriately widened.

Here are a few of my best practices:

How to Qualify Observation Spaces

  1. Have them name their data sources and the data elements (key features).
  2. Then, just because they say a data source has certain features, go look yourself. I can’t tell you how many times I’ve taken a look only to find key columns empty or so dirty that the value of this data is negligible.
  3. If the data sources share common features (e.g., customer number, address, email, phone number, etc.), then generally the more shared features, the better.
  4. If data sources have few, if any, shared features (e.g., one data source has name and address and the other has stock symbol and stock price), then generally this is not good: there is little to connect one source to the other.
  5. Ask for real examples from the past, things they would like to detect (opportunity or risk), and then look in the real data to see if, upon inspection, each one is discoverable. If real examples from the past cannot be detected in the provided data sources, I tell them “not even a sentient being could discover this.”
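Steps 2 through 4 lend themselves to a quick script. What follows is only a minimal sketch, not a prescribed method: the helper names (`is_filled`, `fill_rates`, `shared_features`), the sample records, and the source names are all hypothetical. It also assumes a shared feature carries the same name in every source, which real data rarely honors.

```python
from itertools import combinations


def is_filled(value):
    """True when a value is present and not just whitespace."""
    return value is not None and str(value).strip() != ""


def fill_rates(rows, columns):
    """Fraction of rows in which each column holds a usable value."""
    total = len(rows)
    return {
        col: (sum(1 for row in rows if is_filled(row.get(col))) / total
              if total else 0.0)
        for col in columns
    }


def shared_features(sources):
    """For each pair of sources, the feature names they have in common."""
    return {
        (a, b): set(sources[a]) & set(sources[b])
        for a, b in combinations(sorted(sources), 2)
    }


# Hypothetical sample records, invented for illustration only.
shipments = [
    {"shipper": "Acme", "receiver": "Bolt Co", "email": ""},
    {"shipper": "Acme", "receiver": None, "email": "   "},
    {"shipper": "Zed Ltd", "receiver": "Bolt Co", "email": "ops@example.com"},
    {"shipper": "", "receiver": "Quay Inc", "email": None},
]
rates = fill_rates(shipments, ["shipper", "receiver", "email"])

# Hypothetical sources and their claimed features.
sources = {
    "crm": ["name", "address", "email", "phone"],
    "orders": ["customer_number", "email", "address"],
    "market": ["stock_symbol", "stock_price"],
}
overlap = shared_features(sources)

for col, rate in rates.items():
    print(f"{col}: {rate:.0%} filled")
for pair, features in overlap.items():
    print(pair, sorted(features) or "nothing shared")
```

A low fill rate is the "go look yourself" warning sign from step 2, and an empty overlap between two sources is the "not good" case from step 4. Neither check says anything about how dirty the filled values are, so eyeballing the data is still required.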

There will be many cases where it becomes necessary to help the customer think about widening their observation space if they want to make their hopes and dreams (business objectives) a reality.

Conjuring up additional data to expand the observation space is quite an art and requires real-world understanding of what data flows inside and outside the walls, and how, as well as the legal and policy ramifications.

How to Widen Observation Spaces

  1. Generally one starts looking for new data sources in this order: (i) other stuff inside the walls that you already collect (e.g., product returns); (ii) collecting more data (e.g., adding a field to a web page so customers can score feedback); and (iii) external data (e.g., marketing flags like “presence of children” and “income indicators” as routinely sold by data aggregators).
  2. Beware of social media: there is allure to the idea that one can computationally associate social media (e.g., Tweets about your company/brand) with the customer who said it. Easier said than done. Different kinds of social sites will yield different results.
  3. If you are trying to catch bad guys, hope that some of the data sources are unknown or non-intuitive to your adversary (if the bad guys know you have cameras on these four streets, then they will take the fifth street).
  4. Now let’s say one has a list of potential new data sources. The next question is how to prioritize all of these possibilities. There are a lot of ways to think about this, but here are a few I use:
  • Data that improves the ability to perform more entity resolutions (e.g., a source that contains new identifiers like email addresses) so that one can discover that two customers are really the same;
  • Data that brings more facts (e.g., what, where, when, how many, how much);
  • Diverse data potentially containing identifiers and facts in disagreement (e.g., this fact indicates they are here, but that fact shows they were over there), which is helpful in finding lies like identity theft.
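One way to turn the three prioritization ideas above into a rough ranking is a simple weighted score. This is a sketch under loud assumptions, not a method taken from the list itself: the `Candidate` class, the weights, the feature names, and the "already known" sets are all hypothetical, and counting features that overlap with existing data is only a crude proxy for "may disagree with what we already hold."

```python
from dataclasses import dataclass

# Hypothetical features already present in the observation space.
KNOWN_IDENTIFIERS = frozenset({"name", "address", "phone"})
KNOWN_FACTS = frozenset({"order_total"})


@dataclass(frozen=True)
class Candidate:
    """A potential new data source and the features it claims to carry."""
    name: str
    identifiers: frozenset
    facts: frozenset


def score(c, w_id=3, w_fact=2, w_overlap=1):
    """Weighted count of what a candidate adds (weights are assumptions)."""
    new_ids = c.identifiers - KNOWN_IDENTIFIERS    # helps entity resolution
    new_facts = c.facts - KNOWN_FACTS              # brings more facts
    # Features we already hold elsewhere: a chance to catch disagreement.
    overlapping = (c.identifiers & KNOWN_IDENTIFIERS) | (c.facts & KNOWN_FACTS)
    return (w_id * len(new_ids)
            + w_fact * len(new_facts)
            + w_overlap * len(overlapping))


# Hypothetical candidates, loosely echoing the examples in item 1.
candidates = [
    Candidate("web_feedback",
              frozenset({"email"}),
              frozenset({"feedback_score"})),
    Candidate("aggregator_flags",
              frozenset({"name", "address"}),
              frozenset({"presence_of_children", "income_indicator"})),
    Candidate("product_returns",
              frozenset({"name", "email"}),
              frozenset({"return_date", "return_reason"})),
]

ranked = sorted(candidates, key=score, reverse=True)
for c in ranked:
    print(c.name, score(c))
```

The weights encode a judgment call that new identifiers matter most; change them to match the mission. The score also ignores cost, legal and policy constraints, and whether the adversary knows about the source, all of which still need a human in the loop.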

Finally, don’t forget there will be plenty of times that the mission objectives cannot be achieved because the necessary observation space is not available.

Please consider the above “how to” sections as starter kits … hack them any which way you like…
