New Senzing libpostal Data Model
Now There's a New, More Accurate and Up-to-Date libpostal Data Model from Senzing
If you are a user of libpostal, the open-source international address parser, Senzing has great news for you! We’ve created a new, improved data model for libpostal that is more accurate and up to date than the original model released in 2016. The new Senzing libpostal data model is available for free on GitHub and can be installed in minutes.
The Senzing team trained the new libpostal data model on 40% more records than the original. We created 1.2 billion training records from addresses in OpenStreetMap (OSM) and OpenAddresses, with address data from more than 230 countries in 100+ languages. We then evaluated the records we extracted to filter out any badly formed addresses, correct misclassified address tokens, and remove tokens that didn’t belong in the addresses.
With our new and improved data model, libpostal users can expect better address parsing results and broader coverage!
Senzing libpostal Data Model Increases Accuracy
Our new libpostal data model is more accurate… But how do we know? Well, we tested it on 12,950 addresses from 89 countries. The test data consisted of random addresses from OSM, with a minimum of 50 addresses per country, plus additional hard-to-parse addresses provided by the Senzing support team, Senzing customers and the libpostal GitHub page. Senzing team members then compared the results from the new libpostal data model with those from the original model, and produced a statistical comparison.
The results? Accuracy improvements averaged more than 4% for all countries, but improvements in specific countries were as high as 87%.
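For readers who want to run a similar comparison on their own addresses, here is a minimal sketch of a per-country accuracy comparison. It is not Senzing's actual test harness. The function names, the exact-match scoring rule and the two sample addresses are all assumptions made for illustration; the parsed output would normally come from running each libpostal model over the same test set.

```python
from collections import defaultdict


def per_country_accuracy(records):
    """Return {country: share of addresses parsed exactly as expected}.

    Each record is (country, expected, parsed), where expected and parsed
    are dicts mapping a label (e.g. "road") to its value.
    Exact-match scoring is an assumption of this sketch.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for country, expected, parsed in records:
        totals[country] += 1
        if parsed == expected:
            correct[country] += 1
    return {country: correct[country] / totals[country] for country in totals}


def compare_models(old_records, new_records):
    """Return {country: new accuracy minus old accuracy} for shared countries."""
    old_acc = per_country_accuracy(old_records)
    new_acc = per_country_accuracy(new_records)
    return {c: new_acc[c] - old_acc[c] for c in old_acc if c in new_acc}


# Hypothetical hand-labeled test addresses (made up for this example).
expected_us = {"house_number": "123", "road": "main st"}
expected_de = {"road": "hauptstrasse", "house_number": "5"}

# Hypothetical parser output from the original and the new model.
old_results = [
    ("US", expected_us, {"house_number": "123", "road": "main st"}),
    ("DE", expected_de, {"road": "hauptstrasse 5"}),  # house number not split out
]
new_results = [
    ("US", expected_us, {"house_number": "123", "road": "main st"}),
    ("DE", expected_de, {"road": "hauptstrasse", "house_number": "5"}),
]

improvement = compare_models(old_results, new_results)
print(improvement)
```

Exact-match scoring is deliberately strict: an address counts as correct only if every label and value matches. A per-label score would be a reasonable alternative if partial credit matters for your use case.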
Why Did Senzing Update the libpostal Data Model?
Updating the libpostal data model was no small feat and required a significant investment. So why did Senzing commit time and resources to this project? In short, we needed an updated model to ensure accurate address parsing in our entity resolution software. The Senzing® entity resolution API has been using libpostal to parse addresses since 2017. We initially selected it because of its superior capabilities. By 2022, the model hadn’t been updated and it wasn’t clear when, or if, it would be, so Senzing decided to take on the project of updating it ourselves.
The first version of the new libpostal data model was added to the Senzing API in April of 2023. In late May, we shared the first version of our model with the libpostal community. Moving forward, Senzing plans to create and publish a new open-source libpostal data model every 6 to 12 months as we continue to invest in and support the libpostal project.
We’ve also formed a libpostal group on LinkedIn. If you are a libpostal user, please join!
In Case You’re Wondering about Senzing Entity Resolution
The Senzing entity resolution API allows developers to quickly add advanced data matching and relationship discovery capabilities to applications and services. In addition to parsing addresses using libpostal and other methods, the Senzing API matches names, addresses, dates of birth and other attributes about people and organizations, and also detects relationships between them. You can try Senzing entity resolution for free, and be up and running in minutes, with no entity resolution expertise required.