Impersonation in Data Engineering: No More Credentials in Your Code! — Marian Špilka

Learn how to eliminate credentials from your code using impersonation in cloud environments. Covers IAM, Workload Identity Federation, and secure access patterns.

Key takeaways
  • Impersonation lets an application act under a different identity, so it can access cloud resources securely without any credentials stored in code

  • The solution rests on four main pillars:

    • Identity and Access Management (IAM)
    • Application Default Credentials
    • Workload Identity Federation
    • Service account impersonation
  • Key benefits:

    • No credentials stored in code
    • Code can be safely versioned in Git
    • Fewer service desk requests
    • Faster developer onboarding
    • Production remains secure
  • Implementation rules:

    • The production service account can access only production services
    • Create empty "avatar" service accounts for testing
    • Locally, applications can run under the developer's own identity
    • Use a Docker volume to share local credentials with containers
  • Security improvements:

    • Clear separation between test and production environments
    • No shared credentials
    • Automated access management during onboarding
    • Transparent access control
    • No need for credential rotation
  • The solution works well for:

    • Kubernetes deployments
    • Cloud-based applications
    • Data engineering pipelines
    • Multi-project environments
    • Teams handling sensitive data
  • Process flow:

    • The developer logs in locally
    • The application picks up the appropriate credentials automatically through Application Default Credentials
    • Impersonation grants access to test services
    • Production remains isolated and secure
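The "application gets credentials automatically" step relies on Application Default Credentials (ADC). As a rough sketch, assuming a Linux or macOS developer machine, the Python below models a simplified version of the lookup order used by Google's client libraries: the `GOOGLE_APPLICATION_CREDENTIALS` variable first (it can also point at a Workload Identity Federation configuration file), then the file written by `gcloud auth application-default login`, then the runtime's metadata server. It ignores Windows paths and some legacy cases, and the function names are hypothetical helpers, not part of any library.

```python
import os
from pathlib import Path
from typing import Mapping


def adc_well_known_path(env: Mapping[str, str], home: Path) -> Path:
    """Where `gcloud auth application-default login` stores the developer's ADC file.

    CLOUDSDK_CONFIG overrides the gcloud config directory; otherwise the
    POSIX default ~/.config/gcloud is used (the Windows location is ignored here).
    """
    config_dir = env.get("CLOUDSDK_CONFIG")
    base = Path(config_dir) if config_dir else home / ".config" / "gcloud"
    return base / "application_default_credentials.json"


def resolve_adc_source(env: Mapping[str, str], home: Path) -> str:
    """Simplified ADC order: explicit file, developer's gcloud login, metadata server."""
    # 1. An explicitly configured credentials file always wins.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return f"file:{explicit}"
    # 2. The developer's own login, e.g. a gcloud directory mounted into Docker.
    well_known = adc_well_known_path(env, home)
    if well_known.is_file():
        return f"file:{well_known}"
    # 3. On cloud runtimes the metadata server supplies the attached identity.
    return "metadata-server"


if __name__ == "__main__":
    print(resolve_adc_source(os.environ, Path.home()))
```

Because step 2 only looks for a file, mounting the developer's gcloud directory into a container, for example `docker run -v "$HOME/.config/gcloud:/root/.config/gcloud:ro" IMAGE` (assuming the container runs as root), lets the same code find the developer's login inside Docker without baking any key into the image.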
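Under the hood, service account impersonation is a call to the IAM Credentials API: the caller's own token is exchanged for a short-lived access token of the target service account, provided the caller holds `roles/iam.serviceAccountTokenCreator` on that account. The sketch below only builds that `generateAccessToken` request with the Python standard library; in real code the `google-auth` package's `impersonated_credentials` module does this (and token refresh) for you. The avatar service account email, the helper name, and the token value are hypothetical placeholders.

```python
import json
import urllib.request
from typing import Sequence

IAM_CREDENTIALS_API = "https://iamcredentials.googleapis.com/v1"
CLOUD_PLATFORM_SCOPE = "https://www.googleapis.com/auth/cloud-platform"

# Hypothetical "avatar" service account used only for testing.
TEST_AVATAR = "avatar-pipeline@my-test-project.iam.gserviceaccount.com"


def impersonation_request(
    caller_token: str,
    target: str = TEST_AVATAR,
    scopes: Sequence[str] = (CLOUD_PLATFORM_SCOPE,),
    lifetime_s: int = 3600,
) -> urllib.request.Request:
    """Build, but do not send, a generateAccessToken call for `target`.

    `caller_token` is the developer's own access token (e.g. obtained via ADC);
    the developer needs roles/iam.serviceAccountTokenCreator on `target`.
    """
    # The "-" wildcard stands in for the project; the email identifies the account.
    url = f"{IAM_CREDENTIALS_API}/projects/-/serviceAccounts/{target}:generateAccessToken"
    body = {"scope": list(scopes), "lifetime": f"{lifetime_s}s"}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {caller_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = impersonation_request("DEVELOPER_ACCESS_TOKEN")
    print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` would return JSON containing `accessToken` and `expireTime`. Only the developer's identity and the empty test avatar appear here; assuming production workloads run under their own attached identity, the production service account never needs a key and is never reachable from a laptop.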