Philip Blair - Dealing With Names and Addresses Around the World | PyData Amsterdam 2024
Learn how names and addresses vary across cultures and countries, with practical tips for handling international data formats in Python using libraries like PyPostal.
Names and addresses vary dramatically across cultures and countries; assuming Western or English-language conventions leads to poor user experiences.
Avoid regular expressions and other fixed validation patterns for international names and addresses; they break on unfamiliar formats and scripts.
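To make the regex pitfall concrete, here is a minimal Python sketch. `NAIVE_NAME` is a hypothetical pattern of the kind often found in sign-up forms (ASCII letters, exactly two words), not one taken from the talk, and the sample names are illustrative. The lenient check, also hypothetical, only rejects blank input and non-printable characters.

```python
import re

# Hypothetical "first name + last name" check of the kind to avoid:
# ASCII letters only, exactly two space-separated words.
NAIVE_NAME = re.compile(r"[A-Za-z]+ [A-Za-z]+")

# Illustrative real-world name shapes that the naive pattern rejects.
samples = [
    "José García",              # accented letters
    "María José García López",  # two given names and two surnames
    "Seán O'Brien",             # accent and apostrophe
    "毛泽东",                    # Chinese script, family name first, no spaces
    "Sukarno",                  # single name, no surname
]


def naive_is_valid(name: str) -> bool:
    """Return True only if the whole string matches the strict pattern."""
    return NAIVE_NAME.fullmatch(name) is not None


def lenient_is_valid(name: str) -> bool:
    """Accept any non-blank name made of printable characters."""
    stripped = name.strip()
    return bool(stripped) and stripped.isprintable()


for name in samples:
    print(f"{name!r}: naive={naive_is_valid(name)}, lenient={lenient_is_valid(name)}")
```

Every sample fails the naive pattern while the lenient check accepts them all; any stricter rule should be justified by a concrete downstream need.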
Key challenges with names include:
- Variable ordering (whether the given name or the family name comes first differs by culture)
- Patronymics and matronymics
- Multiple last names
- Single names (mononyms) with no surname
- Different writing scripts, and transliteration between them
- Name changes at marriage, with conventions that vary by country
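Scripts and encodings also complicate simple comparisons, because the same visible name can arrive as different Unicode code point sequences. A small sketch using Python's standard `unicodedata` module builds a comparison key while the original string stays untouched for storage and display; `comparison_key` is a hypothetical helper name.

```python
import unicodedata


def comparison_key(name: str) -> str:
    """Build a key for comparing names; store and display the original instead."""
    # NFKC composes "e" + combining acute accent into a single "é" and maps
    # compatibility characters such as full-width letters to their plain forms.
    normalized = unicodedata.normalize("NFKC", name)
    # casefold() is a stronger lower() meant for caseless matching ("ß" -> "ss").
    folded = normalized.casefold()
    # Collapse runs of whitespace so "Anne  Marie" and "Anne Marie" agree.
    return " ".join(folded.split())


print(comparison_key("Jose\u0301") == comparison_key("José"))  # True
print(comparison_key("STRAUSS") == comparison_key("Strauß"))    # True
print(comparison_key("\uff2a\uff4f\uff48\uff4e"))               # full-width "Ｊｏｈｎ" -> john
```

This only smooths over encoding and case differences. It does not transliterate, so a name written in Cyrillic will not match its Latin-script spelling; that needs a transliteration step or a dedicated name-matching tool.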
For address handling:
- Street number and name order varies by country
- Postal code formats differ significantly
- House number formats vary
- City, state, and province conventions differ
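The difference in street number and name order can be illustrated with a tiny formatter. The templates below are assumptions covering only the street line for a handful of countries (number first in the US and France, street first in Germany and the Netherlands), and `format_street_line` is a hypothetical helper, not part of PyPostal. Countries missing from the table return `None` instead of a guess, leaving the caller to fall back to the address as the user typed it.

```python
from typing import Optional

# Hypothetical templates for the street line only; real address formats
# involve many more components and per-country rules.
STREET_LINE_TEMPLATES = {
    "US": "{house_number} {road}",  # e.g. "123 Main Street"
    "FR": "{house_number} {road}",  # e.g. "10 rue de Rivoli"
    "DE": "{road} {house_number}",  # e.g. "Hauptstraße 5a"
    "NL": "{road} {house_number}",  # e.g. "Damrak 1"
}


def format_street_line(country_code: str, road: str, house_number: str) -> Optional[str]:
    """Order house number and street name for a known country, else return None."""
    template = STREET_LINE_TEMPLATES.get(country_code.upper())
    if template is None:
        return None  # unknown convention: don't guess, use the raw address instead
    # house_number stays a string: values like "5a" or "12-14" are not integers.
    return template.format(house_number=house_number, road=road)


print(format_street_line("US", "Main Street", "123"))  # 123 Main Street
print(format_street_line("de", "Hauptstraße", "5a"))   # Hauptstraße 5a
```

Keeping the house number as text is deliberate, since letter suffixes and ranges are common outside the US.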
Best practices:
- Use existing libraries like PyPostal for address parsing
- Avoid unnecessary parsing; only structure the data you truly need
- Store original unstructured data alongside parsed versions
- Consider fuzzy matching instead of exact matching
- Use additional disambiguating data when available (birthdates, etc.)
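Several of these practices fit together in a short sketch: keep the raw input, treat parsed components as optional extras, compare names fuzzily, and let a birthdate veto a match. It uses `difflib.SequenceMatcher` from the standard library as a stand-in for a real name-matching algorithm; `PersonRecord`, `likely_same_person`, and the 0.85 threshold are hypothetical choices, not recommendations from the talk.

```python
from dataclasses import dataclass, field
from datetime import date
from difflib import SequenceMatcher
from typing import Dict, Optional


@dataclass
class PersonRecord:
    raw_name: str  # exactly what the user entered; never overwritten
    parsed_name: Dict[str, str] = field(default_factory=dict)  # optional components
    birth_date: Optional[date] = None  # extra disambiguating data, if known


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio of case- and whitespace-normalized names."""
    key_a = " ".join(a.casefold().split())
    key_b = " ".join(b.casefold().split())
    return SequenceMatcher(None, key_a, key_b).ratio()


def likely_same_person(a: PersonRecord, b: PersonRecord, threshold: float = 0.85) -> bool:
    """Fuzzy name match, vetoed when both records have conflicting birth dates."""
    if a.birth_date is not None and b.birth_date is not None and a.birth_date != b.birth_date:
        return False
    return name_similarity(a.raw_name, b.raw_name) >= threshold


r1 = PersonRecord("Catherine Smith", birth_date=date(1990, 5, 1))
r2 = PersonRecord("Katherine  Smith", birth_date=date(1990, 5, 1))
r3 = PersonRecord("Katherine Smith", birth_date=date(1985, 2, 3))
print(likely_same_person(r1, r2))  # True: names differ by one letter, same birth date
print(likely_same_person(r1, r3))  # False: birth dates conflict
```

A character-level ratio like this knows nothing about nicknames, transliteration variants, or name order, which is why dedicated tools are preferred when matching errors are costly.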
Commercial solutions are recommended for complex name-matching needs such as:
- Know Your Customer (KYC) verification
- Border screening
- Healthcare record linkage
- International sanctions checking
Focus on user experience by:
- Collecting only the name and address details you actually need
- Supporting international formats
- Avoiding assumptions about name structures
- Making input forms flexible enough for global users
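As a sketch of a flexible input model, assuming a Python backend: a single free-text name field, an optional preferred name, free-form address lines, and an optional postal code. The class and field names are hypothetical, and validation deliberately checks only that something was entered.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContactDetails:
    full_name: str                        # one free-text field, no first/last split
    preferred_name: Optional[str] = None  # how the person wants to be addressed
    address_lines: List[str] = field(default_factory=list)  # free-form lines
    country_code: Optional[str] = None    # e.g. an ISO 3166-1 alpha-2 code
    postal_code: Optional[str] = None     # optional: some countries use none


def validate(details: ContactDetails) -> List[str]:
    """Check only that required fields are not blank; no format rules."""
    errors = []
    if not details.full_name.strip():
        errors.append("Please enter your name.")
    if not any(line.strip() for line in details.address_lines):
        errors.append("Please enter your address.")
    return errors


print(validate(ContactDetails(full_name="Sukarno", address_lines=["12 Example Street"])))  # []
print(validate(ContactDetails(full_name="   ")))  # two error messages
```

If a downstream system later needs structured parts, they can be derived from these fields (for example with a parser such as PyPostal) without forcing users into a fixed shape.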