Data Residency Is an Architecture Constraint, Not a Checkbox

The line in the regulation was one sentence. Personal data and transaction records of in-country users must be stored and processed within the country, and the keys that protect them must be held under domestic control. I have watched teams read a sentence like that, nod, and file it under compliance, as if it were a form to sign before launch. It is not a form. It is a constraint that reaches into your topology, your key management, your analytics, and your on-call rotation, and if you treat it as a checkbox you will find out the expensive way.

I led the platform for a payments business that moved billions in volume across a region where one large market had exactly this rule. We already ran a clean, centralized stack. One set of regions, one data lake, one analytics warehouse, one place where the data scientists worked. That centralization was most of why the thing was cheap and fast. The residency rule asked us to take a knife to it.

What the rule actually says, translated to architecture

A residency requirement is three claims wearing one coat. First, the data at rest lives inside the border. Second, the processing that touches raw records also happens inside the border, not just the storage. Third, the cryptographic keys are held domestically, which quietly means your central security team does not get to be the sole root of trust anymore. People hear the first claim and miss the other two. The third one is the one that breaks naive designs, because it severs the assumption that there is one global place where everything can be decrypted and joined.

So the question stops being “where do we put the database.” It becomes “what is the smallest set of things that must stay inside the border, and what is the largest set of things we can still run centrally without violating the rule.” Get that line wrong in either direction and you pay. Draw it too tight and you have rebuilt your entire platform once per country, which is a cost that compounds for the life of the business. Draw it too loose and you have a finding in an audit, a regulator conversation you do not want, and possibly a market you can no longer operate in.

The topology it forced

We landed on a hybrid that I would draw the same way today.

In-country data boundary: raw records and keys stay inside, only aggregates and tokens leave for central analytics

The boundary is the architecture. Raw records and keys never cross it. Only de-identified aggregates do.

Inside the border we ran a full in-country data plane: the transactional databases, the raw event store, the processing that operated on identifiable records, and a domestic key management service that the in-country deployment held. Authorization, settlement, anything that had to read a real card token or a real customer record, ran there and only there. That cluster was, for legal purposes, a small self-contained version of the whole company.

Outside the border, in our central region, ran everything that did not need to see a raw in-country record to do its job. Global routing config, the merchant catalog, the fraud model training that worked on features rather than raw PII, the company-wide dashboards. The connective tissue between the two was the part that took the real design work, because the rule is not “never share anything.” The rule is “raw identifiable records do not leave.” Aggregates and de-identified derivatives can.

Here is the part nobody tells you. The hard boundary is not the storage. It is the keys, and specifically the fact that residency turns key management into a topology decision rather than a security checkbox. Once the in-country deployment holds its own keys, your central plane physically cannot decrypt those records even if it pulled the bytes. That is the property you actually want, because it means a residency violation has to be an active mistake, not a default. You are not trusting a policy on a bucket. You are trusting math, and the absence of a key.

Cross-border analytics on aggregates only

The business still wanted one view of the world. Country managers wanted to compare markets. The fraud team wanted patterns that only show up across borders. The board wanted a single number. None of that requires raw records to cross the line, and the moment you accept that, the design opens up.

We ran the heavy aggregation inside the border, on the raw data, and let only the output leave. Country-level rollups. Cohort counts above a minimum group size so a row could never be traced back to a person. Model features that had already been stripped of the identifiers. The central warehouse stitched these together into the global picture and never once held an in-country customer record. The rule we enforced was blunt and I would keep it blunt: nothing crosses the boundary unless it has been aggregated or de-identified inside the boundary first. No exceptions for “just this one debug pull.” Every exception is the one the auditor finds.

The cost of this is real and you should say it out loud before you start. You give up the ability to do ad-hoc joins across all your raw data in one place. A data scientist who wants to correlate a raw behavior in one market with a raw behavior in another cannot, by design, and that is not a bug to be fixed later. It is the constraint. You move the analysis to where the data lives instead of moving the data to where the analyst sits, which feels backwards if you grew up on a central lake, and is correct.

The engineering cost of getting it wrong

I will tell you the failure mode because I have cleaned it up. A team treats residency as a deployment detail, ships the global architecture, and bolts a regional database on at the end with a replication job feeding the central lake “for analytics.” It works in the demo. It passes the first internal review because the diagram has a box labeled in-country. Then someone looks at what the replication job is actually copying, and it is raw records, leaving the border, into a lake the central team can read with central keys. Now you are not fixing a config. You are re-architecting a live payments system under a regulator’s deadline, which is the worst possible time to discover that decryption was centralized.

The other expensive mistake is the opposite one: panicking and rebuilding the entire platform per country because someone decided the safest reading is to keep everything in. That is how you end up operating five copies of your company, each drifting from the others, each with its own on-call, each a place a bug can hide. The skill is not maximizing what stays inside. It is finding the true minimum that must stay inside and being disciplined about everything else leaving as aggregates.

Two things made the hybrid survivable in the end, and both are unglamorous. The boundary had to be enforced in code, not in a slide. We made the central pipelines structurally unable to request raw in-country fields, so a residency violation would fail at build time rather than show up in an audit a year later. And the in-country deployment had to be a first-class part of the platform, deployed by the same automation as everything else, not a hand-managed pet that the central team forgot how to operate. A residency island that only one person understands is its own kind of outage waiting to happen.

Residency is sold to engineers as a legal problem, something the compliance team owns and hands you at the end. It is the other way around. The legal sentence is short. The architecture it implies is the largest single force shaping where your data lives, what your keys can do, and which queries are even possible. Decide the boundary first, on purpose, with the people who write the code in the room. Or have it decided for you later, by an auditor, on a schedule you do not control.