New libpostal Data Model From Senzing
By Brian Macy, published August 30, 2023
If you use libpostal, the open-source international address parser, Senzing has great news for you! We’ve created a new, improved data model for libpostal that is more accurate and more up to date than the original model released in 2016. The new Senzing libpostal data model is available for free on GitHub and can be installed in minutes.
The Senzing team trained the new model on 40% more records than the original. We created 1.2 billion training records from addresses in OpenStreetMap (OSM) and OpenAddresses, covering more than 230 countries and 100+ languages. We evaluated the extracted records to filter out badly formed addresses, correct misclassified address tokens, and remove tokens that didn’t belong in the addresses.
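To make the three cleanup steps concrete, here is a minimal sketch of what filtering, correcting, and pruning a labeled address record might look like. It is not Senzing's actual pipeline: the function name clean_record, the (text, label) record shape, and every rule in it are assumptions chosen for illustration. Only the label names are taken from libpostal's parser output.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (token text, component label)

# Component labels emitted by libpostal's address parser.
KNOWN_LABELS = {
    "house", "category", "near", "house_number", "road", "unit", "level",
    "staircase", "entrance", "po_box", "postcode", "suburb", "city_district",
    "city", "island", "state_district", "state", "country_region",
    "country", "world_region",
}


def clean_record(tokens: List[Token]) -> Optional[List[Token]]:
    """Clean one training record; return None if it looks badly formed.

    All rules here are hypothetical examples, not Senzing's real rules.
    """
    cleaned: List[Token] = []
    for text, label in tokens:
        text = text.strip()
        # Remove tokens that don't belong: empty text or an unknown label.
        if not text or label not in KNOWN_LABELS:
            continue
        # Correct an (assumed) common misclassification: a purely numeric
        # token labeled as a road is more likely a house number.
        if label == "road" and text.isdigit():
            label = "house_number"
        cleaned.append((text, label))

    # Filter out badly formed records: require at least two distinct
    # components so a lone city or postcode is not kept as an address.
    if len({label for _, label in cleaned}) < 2:
        return None
    return cleaned


record = [
    ("12", "road"),
    ("Main St", "road"),
    ("  ", "city"),
    ("phone: 555", "telephone"),
    ("Springfield", "city"),
]
cleaned_record = clean_record(record)
rejected_record = clean_record([("Springfield", "city")])
```

In this example the stray phone token and the blank token are dropped, the numeric "road" token is relabeled as a house number, and a record consisting of only a city name is rejected.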
With the new model, libpostal users can expect better address parsing results and broader coverage!
Senzing libpostal Data Model Increases Accuracy
We tested the model on 12,950 addresses from 89 countries. The test data consisted of random addresses from OSM (a minimum of 50 per country) plus additional hard-to-parse addresses provided by the Senzing support team, Senzing customers, and the libpostal GitHub page. Senzing team members compared the results from the new model and the original model and produced a statistical comparison. Accuracy improvements averaged more than 4% across all countries, with improvements in specific countries as high as 87%.
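A per-country comparison of this kind can be sketched in a few lines. The snippet below is illustrative only: the function name accuracy_gain_by_country, the result format, and the sample data are all hypothetical, and it reports gains in percentage points, which is an assumption since the figures above do not say how the improvement was defined.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (country code, original model parsed correctly, new model parsed correctly)
Result = Tuple[str, bool, bool]


def accuracy_gain_by_country(results: Iterable[Result]) -> Dict[str, float]:
    """Return the accuracy change per country, in percentage points."""
    # For each country: [total addresses, original correct, new correct]
    counts = defaultdict(lambda: [0, 0, 0])
    for country, old_ok, new_ok in results:
        tally = counts[country]
        tally[0] += 1
        tally[1] += int(old_ok)
        tally[2] += int(new_ok)
    return {
        country: 100.0 * (new - old) / total
        for country, (total, old, new) in counts.items()
    }


# Made-up results for illustration; these are not Senzing's test data.
sample = [
    ("US", True, True),
    ("US", False, True),
    ("FR", True, True),
    ("FR", True, False),
    ("FR", False, True),
    ("FR", False, True),
]
gains = accuracy_gain_by_country(sample)
average_gain = sum(gains.values()) / len(gains)
```

Averaging the per-country gains, as done in the last line, weights every country equally regardless of how many test addresses it contributed. Pooling all addresses before computing accuracy would give a different number.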
Why Did Senzing Update the libpostal Data Model?
Updating the data model required a significant investment. So why did Senzing commit the resources to this project? In short, we needed an updated model to ensure accurate address parsing in our entity resolution software. The Senzing® entity resolution API has been using libpostal to parse addresses since 2017. We initially selected it because of its superior capabilities. By 2022, the model hadn’t been updated and it wasn’t clear when, or if, it would be, so Senzing decided to take on the project of updating it.
The first version of the model was added to the Senzing API in April of 2023. In late May, we shared the first version of our model with the libpostal community. Moving forward, Senzing plans to create and publish a new open-source data model every 6 to 12 months as we continue to invest in and support the libpostal project.
We’ve also formed a libpostal group on LinkedIn. If you are a libpostal user, please join!
In Case You’re Wondering about Senzing Entity Resolution
The Senzing entity resolution API allows developers to quickly add advanced data matching and relationship discovery capabilities to applications and services. In addition to parsing addresses using libpostal and other methods, the Senzing API matches names, addresses, dates of birth and other attributes about people and organizations – and also detects relationships between them. You can try Senzing entity resolution for free, and be up and running in minutes, with no entity resolution expertise required.