The Query is the Data
By Jeff Jonas, published March 11, 2019
A big revelation hit me the day a law enforcement officer explained the yellow stickies plastered around his desktop computer screen.
He said, “Every week, or at least once a month, I search for the people on these stickies.”
These subjects of interest included wanted criminals, missing kids, and so on.
With this process, he would periodically search system A for the name, date of birth and other identifying attributes on sticky number one. Then he would rekey the same information into systems B, then C, D, and so on. Once he searched all the systems, he moved on to sticky number two, then three, and so on.
I thought: You gotta be kidding me! This organization could’ve at least implemented stored queries.
NOTE: With stored queries, the information on the stickies is entered into a list, like a spreadsheet. Each row contains the name and identifiers from one sticky, so instead of 42 stickies there is a 42-row list (aka stored queries). Then every so often (daily, weekly, etc.) the stored queries are automatically run against systems A, B, C, etc. This approach is not great, but it is still much faster than searching each system by hand!
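To make the stored-query idea in the note concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the `StoredQuery` class, the `run_stored_queries` function, the record fields and the sample names are hypothetical, and matching on exact name plus date of birth is a simplification of what a real search would do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    """One row of the list: the identifiers copied from one sticky."""
    name: str
    date_of_birth: str


def run_stored_queries(queries, systems):
    """Run every stored query against every system and collect the hits.

    `systems` maps a system name to a list of record dicts (hypothetical
    stand-ins for systems A, B, C, ...).
    """
    hits = []
    for query in queries:
        for system_name, records in systems.items():
            for record in records:
                same_name = record["name"].lower() == query.name.lower()
                same_dob = record["date_of_birth"] == query.date_of_birth
                if same_name and same_dob:
                    hits.append((query, system_name, record))
    return hits


# Illustrative data only.
stored_queries = [StoredQuery("Pat Doe", "1990-01-31")]
systems = {
    "A": [{"name": "pat doe", "date_of_birth": "1990-01-31", "note": "traffic stop"}],
    "B": [{"name": "Sam Roe", "date_of_birth": "1985-07-04", "note": "permit"}],
}

hits = run_stored_queries(stored_queries, systems)
for query, system_name, record in hits:
    print(f"{query.name}: match in system {system_name} ({record['note']})")
```

A scheduler would call `run_stored_queries` daily or weekly. Note that the whole list is re-run each time, so a new record is only noticed on the next scheduled pass.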
But no. This investigator’s system didn’t have stored queries. So I stood there noodling his stickies:
This sticky needing to find system data.
This sticky info needing to find system data info.
This info needing to find that info.
It’s all info!
Where the heck in the course of systems design did we start thinking about queries and data so differently? It’s all data.
Idea: Why not store the queries in the same place we store the data? Like this:
The benefits are pretty clear:
- Real-time notification — If the data shows up after the query, the data finds the query. For example, the investigator would receive an instant notification that reads: “new information has arrived related to your missing kid!” (no more need for stickies).
- Queries find queries — That’s right, if two people ask similar questions, they find each other. Even if there was no “data.”
- No performance consequences — The file could contain 100% data, 100% queries or any ratio in between. In any case, the performance is the same.
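The three benefits above can be sketched with a toy store that keeps queries and data in one index. This is only an illustration under stated assumptions, not how any particular entity resolution product works: the `UnifiedStore` class, its `add` method and the owner names are hypothetical, and an exact match on normalized name plus date of birth stands in for real, fuzzier entity resolution.

```python
from collections import defaultdict


class UnifiedStore:
    """Toy store in which queries and data live in the same index."""

    def __init__(self):
        # key -> every entry (query or data) seen so far for that key
        self._by_key = defaultdict(list)

    def add(self, kind, owner, name, date_of_birth):
        """Add a 'query' or 'data' entry; return (recipient, entry) notifications."""
        key = (name.strip().lower(), date_of_birth)
        entry = {"kind": kind, "owner": owner}
        related = list(self._by_key[key])  # what was already there
        self._by_key[key].append(entry)

        notifications = []
        for other in related:
            # A new query learns about everything already stored for its key.
            if entry["kind"] == "query":
                notifications.append((entry["owner"], other))
            # An existing query learns about whatever just arrived.
            if other["kind"] == "query":
                notifications.append((other["owner"], entry))
        return notifications


store = UnifiedStore()

# A query arrives before any data: nothing to report yet.
n1 = store.add("query", "investigator_1", "Pat Doe", "1990-01-31")

# A second, similar query arrives: the two queries find each other.
n2 = store.add("query", "investigator_2", "pat doe", "1990-01-31")

# Data shows up after the queries: the data finds both queries.
n3 = store.add("data", "system_A", "Pat Doe", "1990-01-31")

for recipient, found in n2 + n3:
    print(f"notify {recipient}: related {found['kind']} from {found['owner']}")
```

In this sketch every `add` does the same single index lookup whether the entry is a query or data, which is the sense in which the mix of the two does not change the cost. Whether that holds at scale depends on the real matching logic, which this toy version does not attempt to model.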
Pure magic. That is why we have been building entity resolution systems that treat data and queries as equals… for decades.
Armed with this little thought for the day, my blog post ‘Data Finds Data’ may mean a bit more to you than it did before.