Proxy User IDs

Building services today often requires integrating with other web services. That means sharing your customers’ data with another company. Your users have never heard of this company, yet it will hold their information. That sucks, and it can seem like a betrayal of your users’ trust!

Hopefully, you are a good steward of your users’ data and would avoid sharing it if possible. I cannot stress this enough: as a software engineer in the 2020s, the age of data breaches and data reselling, you have a moral imperative to safeguard your users’ data to the best of your ability. If a moral duty is not good enough and you are in a regulated field, you might have a legal duty too! Either way, a good rule of thumb is to not go blasting your users’ data all over the internet.

That being said, some features require sharing data. There is often a tradeoff between user experience and privacy, and sometimes you cannot avoid sharing data without significantly degrading the user experience. In this case, you might have to share data.

I have two pieces of advice for doing this safely while respecting your users. First: be honest with your users. Second: mask their data from your service provider.

Be Honest

Google Cloud Platform emails me every time they change who they send my data to. Thanks, GDPR!

Be upfront and honest with your users about a) what types of data are being shared and b) with whom. Consider prompting the user in a modal before their first use of a feature that would share data. It’s like a permission popup for data sharing! Data sharing should be opt-in (not opt-out!). If they decline, that’s fine! Dependent features can be disabled for them.
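To make the opt-in idea concrete, here is a minimal sketch of a consent check that gates a data-sharing feature. Everything in it is hypothetical: the ConsentStore class, the feature name, and the in-memory storage are assumptions for illustration, and a real service would persist consent somewhere durable.

```python
class ConsentStore:
    """Hypothetical in-memory record of who opted in to which feature."""

    def __init__(self) -> None:
        # Each entry is a (user_id, feature) pair the user explicitly approved.
        self._granted: set[tuple[str, str]] = set()

    def grant(self, user_id: str, feature: str) -> None:
        self._granted.add((user_id, feature))

    def revoke(self, user_id: str, feature: str) -> None:
        self._granted.discard((user_id, feature))

    def has_opted_in(self, user_id: str, feature: str) -> bool:
        # Opt-in semantics: no record means "no".
        return (user_id, feature) in self._granted


def feature_enabled(store: ConsentStore, user_id: str, feature: str) -> bool:
    """A data-sharing feature is available only after an explicit opt-in."""
    return store.has_opted_in(user_id, feature)
```

The important design choice is the default: a user who has never seen the prompt is treated exactly like one who declined, so nothing is shared until they say yes.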

The GDPR promulgated requirements for identifying “data subprocessors” for companies operating (or having users) in the European Union. It requires companies to publicly disclose who they are sharing data with! US users benefit from this EU regulation as well: large multinationals already have to do data bookkeeping for the EU, and we can see their disclosures.

In the US, we have regulations like HIPAA that apply only to particular use cases in healthcare. Under HIPAA, sharing protected health information (PHI) requires signing a Business Associate Agreement. This forms a “chain of trust” between a patient, their healthcare provider (covered entity), and the healthcare provider’s successive data subprocessors.

HIPAA requires signing a business associate agreement (BAA) with data processors before sharing PHI. This forms a recursive chain of trust for the data.

Finally, it’s worth noting that some states are making progress in legislating this area (like the California Consumer Privacy Act). Get your act together now before Congress forces you to do it against a deadline!

Minimize Data Shared & Use Proxy User IDs

I will state the obvious: share the least amount of data possible with the other service. Seriously, leave data fields blank if you can. Provide the minimum amount of data necessary and no more. Most people can reason this out on their own. However, giving each service only an encrypted User ID is important and non-obvious.
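The "minimum necessary" rule can be sketched as a per-service allow-list applied right before data leaves your system. The service names and field names below are made up for illustration; the point is only that anything not explicitly listed never gets sent.

```python
# Hypothetical allow-lists: which fields each third-party service may receive.
ALLOWED_FIELDS: dict[str, frozenset[str]] = {
    "email_service": frozenset({"proxy_user_id", "email"}),
    "analytics_service": frozenset({"proxy_user_id", "plan"}),
}


def minimize(record: dict, service: str) -> dict:
    """Return only the fields the given service is allowed to see.

    Unknown services get an empty allow-list, so they receive nothing.
    """
    allowed = ALLOWED_FIELDS.get(service, frozenset())
    return {key: value for key, value in record.items() if key in allowed}
```

An allow-list fails closed: a newly added database column stays private until someone deliberately adds it to a service's list.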

Every third-party service you share data with should get a different apparent user ID for the same user. This might seem like a circus to implement and consume, but it provides your users with robust anonymization and privacy guarantees.

If Service A and Service B are both breached, an attacker cannot correlate user information across them using the User ID. It’s an excellent quality to have in the age of data breaches.
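Here is one way the per-service IDs could be derived. Note the swap: this sketch uses a keyed hash (HMAC-SHA256) rather than reversible encryption, so the proxy ID cannot be turned back into the real ID without a lookup table on your side. The function name and the per-service keys are assumptions for illustration; real keys would be long random secrets kept out of source control.

```python
import hashlib
import hmac


def proxy_user_id(user_id: str, service_key: bytes) -> str:
    """Derive a stable, service-specific stand-in for a real user ID.

    The same user and key always give the same result, while a different
    key gives an unrelated-looking value.
    """
    digest = hmac.new(service_key, user_id.encode("utf-8"), hashlib.sha256)
    # Truncated to 32 hex characters (128 bits) to keep the ID compact.
    return digest.hexdigest()[:32]


# Hypothetical per-service secrets, one per third party.
SERVICE_A_KEY = b"example-secret-for-service-a"
SERVICE_B_KEY = b"example-secret-for-service-b"

id_for_a = proxy_user_id("user-42", SERVICE_A_KEY)
id_for_b = proxy_user_id("user-42", SERVICE_B_KEY)
```

Without the keys, someone holding both services' data has no User ID column to join on. If you need to map an incoming proxy ID back to the real user, store the mapping yourself or use real encryption instead of a hash.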

Encrypting User IDs doesn’t apply to just third-party services you push data to: you should also apply it to services that integrate with your service and read your APIs. For example, every OAuth application that calls your APIs should get different User ID values.

In addition to privacy benefits, we gain anti-abuse features too: a side effect of unique, per-app User IDs is that an API abuser cannot combine multiple API integrations to consume more quota than they are allowed, since an ID obtained through one integration means nothing to another. For example, the Riot League of Legends API encrypts player IDs. This disincentivizes running numerous projects in parallel to get around per-API-key rate limits.
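The anti-abuse property is easy to see in a small sketch. This is not how Riot or any other provider implements it; it is an assumed design where your API issues a random opaque ID per (app, user) pair and only resolves an ID for the app it was issued to. The AppScopedIds class and the app names are hypothetical.

```python
import secrets
from typing import Optional


class AppScopedIds:
    """Hypothetical registry handing each API client its own ID per user."""

    def __init__(self) -> None:
        self._forward: dict[tuple[str, str], str] = {}  # (app, user) -> proxy ID
        self._reverse: dict[tuple[str, str], str] = {}  # (app, proxy ID) -> user

    def issue(self, app_id: str, user_id: str) -> str:
        """Return the proxy ID this app sees for this user, creating it once."""
        key = (app_id, user_id)
        if key not in self._forward:
            proxy = secrets.token_urlsafe(16)  # random, carries no user info
            self._forward[key] = proxy
            self._reverse[(app_id, proxy)] = user_id
        return self._forward[key]

    def resolve(self, app_id: str, proxy_id: str) -> Optional[str]:
        """Map a proxy ID back to the real user, but only for the issuing app."""
        return self._reverse.get((app_id, proxy_id))


ids = AppScopedIds()
seen_by_app_a = ids.issue("app-a", "user-42")
seen_by_app_b = ids.issue("app-b", "user-42")
```

An ID collected through app-a resolves to nothing when presented by app-b, so splitting a crawl across several API keys means redoing the ID lookups for every key.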