Philip Blair - Dealing With Names and Addresses Around the World | PyData Amsterdam 2024

Philip Blair

Learn how names and addresses vary across cultures and countries, with practical tips for handling international data formats in Python using libraries like pypostal (the Python bindings for libpostal).

Key takeaways
  • Names and addresses vary dramatically across cultures and countries; assuming Western/English conventions leads to poor user experiences

  • Avoid validating international names and addresses with regexes or other fixed patterns, since these reject valid input written in other formats and scripts

  • Key challenges with names include:

    • Variable ordering (first/last name order differs by culture)
    • Patronymics and matronymics
    • Multiple last names
    • Single names with no surname
    • Different writing scripts and transliteration
    • Name changes at marriage, which follow different conventions in each country
  • For address handling:

    • Street number and name order varies by country
    • Postal code formats differ significantly
    • House number formats vary
    • City, state, and region conventions differ by country
  • Best practices:

    • Use existing libraries like pypostal (Python bindings for libpostal) for address parsing
    • Avoid unnecessary parsing - only structure data that’s truly needed
    • Store original unstructured data alongside parsed versions
    • Consider fuzzy matching instead of exact matching
    • Use additional disambiguating data when available (birthdates, etc.)
  • Commercial solutions are recommended for complex name matching needs like:

    • Know Your Customer (KYC) verification
    • Border screening
    • Healthcare record linkage
    • International sanctions checking
  • Focus on user experience by:

    • Only collecting name/address details actually needed
    • Supporting international formats
    • Avoiding assumptions about name structures
    • Making input forms flexible enough for global users
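
Several of the best practices above (keeping the original unstructured input, fuzzy rather than exact matching, and using a birthdate to disambiguate) can be sketched with the Python standard library alone. This is a rough illustration under stated assumptions, not the approach shown in the talk: the `Person` class, the helper names, and the 0.85 threshold are all hypothetical, and stripping diacritics is a crude heuristic that loses information in many languages, so it is applied only to the comparison key and never to the stored value.

```python
import unicodedata
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Optional


def normalize(text: str) -> str:
    """Build a comparison key: casefold, drop combining marks, collapse spaces.

    Lossy on purpose; use it for matching only, never for storage or display.
    """
    decomposed = unicodedata.normalize("NFKD", text.casefold())
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_marks.split())


@dataclass
class Person:
    # Hypothetical record type: one free-text name field, stored as entered,
    # with no assumption about given/family name order or count.
    full_name: str
    # Optional extra data used only to rule out false matches.
    birthdate: Optional[date] = None


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] that ignores the order of name parts."""
    key_a = " ".join(sorted(normalize(a).split()))
    key_b = " ".join(sorted(normalize(b).split()))
    return SequenceMatcher(None, key_a, key_b).ratio()


def likely_same_person(p: Person, q: Person, threshold: float = 0.85) -> bool:
    """Fuzzy name match; the threshold is an arbitrary illustrative value."""
    # Conflicting birthdates rule a match out before names are compared.
    if p.birthdate and q.birthdate and p.birthdate != q.birthdate:
        return False
    return name_similarity(p.full_name, q.full_name) >= threshold


a = Person("José García", date(1990, 1, 2))
b = Person("Garcia Jose", date(1990, 1, 2))   # different order, no accents
c = Person("Garcia Jose", date(1991, 1, 2))   # same name, other birthdate

print(likely_same_person(a, b))
print(likely_same_person(a, c))
print(a.full_name)  # the original input is still intact
```

A character-level ratio like this only handles small spelling and ordering differences within one script. It does nothing for transliteration between scripts, nicknames, or patronymics, which is where the dedicated libraries and commercial matchers mentioned above come in.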