Philip Blair - Dealing With Names and Addresses Around the World | PyData Amsterdam 2024

Learn how names and addresses vary across cultures and countries, with practical tips for handling international data formats in Python using libraries like PyPostal.

Key takeaways
  • Names and addresses vary dramatically across cultures and countries; assuming Western/English conventions leads to poor user experiences

  • Avoid validating international names and addresses with regular expressions or other fixed patterns; they reject valid input written in other formats and scripts

  • Key challenges with names include:

    • Variable ordering (given name and family name order differs by culture)
    • Patronymics and matronymics
    • Multiple last names
    • Single names with no surname
    • Different writing scripts and transliteration
    • Name changes on marriage, with conventions that vary by country
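
To make the name challenges above concrete, here is a minimal sketch of how a fixed pattern fails on real names. The `NAIVE_NAME` pattern and the `is_plausible_name` helper are hypothetical illustrations, not code from the talk; the permissive check only assumes that a name is non-empty and contains at least one letter in some script.

```python
import re
import unicodedata

# Hypothetical "Western" validator of the kind the talk warns against:
# exactly two capitalized ASCII words separated by one space.
NAIVE_NAME = re.compile(r"^[A-Z][a-z]+ [A-Z][a-z]+$")


def is_plausible_name(raw: str) -> bool:
    """Permissive check: accept anything that contains a letter in any script."""
    text = unicodedata.normalize("NFC", raw).strip()
    return any(ch.isalpha() for ch in text)


names = [
    "Philip Blair",            # fits the naive pattern
    "Björk Guðmundsdóttir",    # patronymic, non-ASCII letters
    "Gabriel García Márquez",  # multiple last names
    "Sukarno",                 # single name, no surname
    "毛泽东",                   # different script, family name first
]

# Every name is legitimate, yet the naive pattern accepts only the first one.
rejected_by_naive = [n for n in names if not NAIVE_NAME.match(n)]
accepted_by_permissive = [n for n in names if is_plausible_name(n)]
```

The permissive check deliberately does very little: it rejects empty input and leaves everything else to the user.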
  • For address handling:

    • Street number and name order varies by country
    • Postal code formats differ significantly
    • House number formats vary
    • City, state, and region conventions differ by country
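
Rather than hand-coding these variations, a parser can label the components. The sketch below uses `parse_address` from pypostal (module `postal.parser`), which needs the libpostal C library and its data files to be installed. The `AddressRecord` class and `build_address_record` function are hypothetical names introduced for illustration. When pypostal is not available the record simply keeps the raw string, which is the one field that is always trustworthy.

```python
from dataclasses import dataclass, field

try:
    # pypostal: Python bindings for libpostal (needs the C library and its data).
    from postal.parser import parse_address
except ImportError:
    parse_address = None


@dataclass
class AddressRecord:
    raw: str                                        # exactly what the user entered
    components: dict = field(default_factory=dict)  # parser labels, if any


def build_address_record(raw: str) -> AddressRecord:
    """Keep the original string and attach parsed components when possible."""
    record = AddressRecord(raw=raw)
    if parse_address is not None:
        # parse_address returns a list of (value, label) tuples;
        # keep the first value seen for each label.
        for value, label in parse_address(raw):
            record.components.setdefault(label, value)
    return record


record = build_address_record("Keizersgracht 123, 1015 CJ Amsterdam, Netherlands")
```

Downstream code can then read `record.components.get("postcode")` when it exists and fall back to `record.raw` otherwise.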
  • Best practices:

    • Use existing libraries like PyPostal (the Python bindings for libpostal) for address parsing
    • Avoid unnecessary parsing: only structure data that’s truly needed
    • Store original unstructured data alongside parsed versions
    • Consider fuzzy matching instead of exact matching
    • Use additional disambiguating data when available (such as birth dates)
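
The fuzzy-matching and disambiguation advice above can be sketched with the standard library alone. This is an illustrative assumption, not the approach shown in the talk: the function names and the 0.85 threshold are hypothetical, and stripping diacritics is a lossy step used only to build a comparison key. It does not handle transliteration between scripts.

```python
import unicodedata
from datetime import date
from difflib import SequenceMatcher
from typing import Optional


def normalize_name(name: str) -> str:
    """Case-fold, drop combining marks, and sort tokens so word order is ignored."""
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(sorted(stripped.split()))


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between the normalized forms of two names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def likely_same_person(
    name_a: str,
    name_b: str,
    dob_a: Optional[date] = None,
    dob_b: Optional[date] = None,
    threshold: float = 0.85,  # hypothetical cut-off, tune on real data
) -> bool:
    # A known, conflicting birth date rules the match out regardless of the names.
    if dob_a and dob_b and dob_a != dob_b:
        return False
    return name_similarity(name_a, name_b) >= threshold
```

For example, `likely_same_person("Márquez Gabriel García", "Gabriel Garcia Marquez")` is true because both names normalize to the same key, while supplying two different birth dates makes the same call return false.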
  • Commercial solutions are recommended for complex name-matching needs such as:

    • Know Your Customer (KYC) verification
    • Border screening
    • Healthcare record linkage
    • International sanctions checking
  • Focus on user experience by:

    • Only collecting name/address details actually needed
    • Supporting international formats
    • Avoiding assumptions about name structures
    • Making input forms flexible enough for global users
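
One way to apply these user-experience points is a data model that stores a single free-text name instead of separate first and last name fields. The sketch below is a hypothetical example; the `PersonName` class and its field names are assumptions made for illustration, not a schema from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonName:
    full_name: str                        # stored exactly as the user typed it
    preferred_name: Optional[str] = None  # what to call them in the UI or emails
    sort_key: Optional[str] = None        # only collected if sorted listings need it

    def __post_init__(self) -> None:
        # The only validation: the name must not be blank.
        if not self.full_name.strip():
            raise ValueError("full_name must not be empty")

    def greeting(self) -> str:
        """Greet with the preferred name if given, otherwise the full name."""
        return f"Hello, {self.preferred_name or self.full_name}"


single = PersonName("Sukarno")
with_preference = PersonName("Gabriel García Márquez", preferred_name="Gabo")
```

Because nothing is split or reordered, the model works equally well for single names, multiple surnames, and names where the family name comes first.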