Suppose you work in a hospital. Over the years it has collected data that could train a machine learning model to detect cancer earlier than any current diagnostic test. But you cannot simply hand that data over without compromising patient privacy, GDPR obligations, and patient trust – not to a research lab, not even to a cloud server.
This is not a question of if – it is a question of when. Hospitals, banks, and governments are all facing it. That is why Privacy-Enhancing Technologies (PETs) have moved from research papers to real systems.
This isn’t a cryptography tutorial on how to encrypt text messages. It’s a ground-level view with an honest, realistic account of what PETs can actually achieve today, where they still fall short, and what is likely to arrive in the next 2-3 years. If you build AI systems, work with sensitive data, or simply think carefully about how these systems are constructed, this breakdown is for you.
Privacy-Enhancing Technologies (PETs): Beyond Encryption – The Mental Model You Actually Need
For most people, the term privacy tech evokes encrypted databases, strong passwords, and perhaps a VPN. But PETs are a different kind of category.
They are less about protecting data while it sits still or moves from one place to another, and more about protecting data while it is actively used: queried, computed on, trained with. That is exactly where classical encryption stops helping.
A helpful three-layer way to think about it, based on frameworks from the OECD and industry groups:
- Protect data at rest / in transit: standard encryption, TLS, key management – largely a solved problem.
- Protect data in use: this is where PETs operate (Federated Learning (FL), Differential Privacy (DP), Secure Multi-Party Computation (MPC), Homomorphic Encryption (HE), Trusted Execution Environments (TEEs)).
- Allow controlled release: DP-protected reports, synthetic data, aggregation-only APIs – useful outputs that do not expose individual records.
PETs are not a single technology. They are a family of methods, each suited to different threat models, data types, and performance requirements. The most common mistake teams make is treating them as interchangeable.
What’s Already Running in Production (2025–2026 Reality Check)
There is a lot of chatter around “PETs – the future of privacy”. Several of them, however, are already running quietly in systems you likely use every day.
Differential Privacy – Accuracy vs Privacy, Tuned Daily
How much accuracy are you willing to trade away, and is the privacy gain worth it?
If you’ve ever contributed to Google’s telemetry reporting or Apple’s iOS usage analytics, you have already met Differential Privacy. DP adds mathematically calibrated noise to an output so that no single individual’s record can be inferred from it, even by someone with access to external datasets.
It is used by national statistical offices for census releases and by tech companies for behavioral analytics. The core knob is the ε (epsilon) parameter, which controls the privacy-noise trade-off: lower ε means stronger privacy and more noise in the result. I have seen teams set an epsilon that technically satisfied the DP definition, but the resulting statistics were so degraded they were operationally useless. Tuning matters enormously.
DP is mature. Not perfect – repeated queries and misconfigured pipelines carry real risks – but it is the most production-ready PET for analytics and telemetry outputs.
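To make the ε trade-off concrete, here is a minimal sketch of the Laplace mechanism on a counting query, written in plain NumPy rather than a production DP library; the dataset and the query are made up purely for illustration.

```python
import numpy as np

def laplace_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    """Release a count using the Laplace mechanism.

    A counting query has sensitivity 1: adding or removing one person changes
    the result by at most 1. The noise scale is sensitivity / epsilon, so a
    smaller epsilon (stronger privacy) means more noise in the released value.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

true_count = 1_284  # e.g. "patients with condition X" (illustrative number)
for eps in (0.1, 1.0, 10.0):
    releases = [laplace_count(true_count, eps) for _ in range(5)]
    print(f"epsilon={eps}: {[round(r, 1) for r in releases]}")
```

At ε = 0.1 the released counts swing by tens of units between runs; at ε = 10 they barely move. That gap is exactly the tuning decision described above.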
Federated Learning – Training Without Moving Data
Federated Learning is the idea behind your phone’s keyboard learning to predict your next word without Google ever seeing your keystrokes. The model trains locally on your device, and only model updates (gradients), not your raw data, ever leave it. Increasingly, FL and MPC are being combined in healthcare and financial pilots.
FL is used in mobile keyboards, recommendation systems, hospital networks, and healthcare research pilots. But most introductory articles miss one crucial point: FL alone does not guarantee privacy. Training data can leak through gradients via inference attacks. The standard production fix is to combine FL with DP (gradient clipping and noise) and with secure aggregation, so the server only ever sees summed updates, never individual ones.
In my experience, even well-designed FL systems can be susceptible to poisoning attacks if the participating nodes are not properly authenticated. The architecture is sound; the deployment details are what matter.
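Here is a minimal, framework-free sketch of the clipping-and-noise step applied to a client update before it leaves the device; the clip norm and noise multiplier are illustrative values, not recommendations.

```python
import numpy as np

def privatize_update(update: np.ndarray, clip_norm: float, noise_multiplier: float) -> np.ndarray:
    """Clip a client's model update to a maximum L2 norm, then add Gaussian noise.

    Clipping bounds any single client's influence on the aggregate; the noise
    (scaled to the clip norm) is what yields a differential privacy guarantee
    once the server averages many such updates.
    """
    norm = np.linalg.norm(update)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# Illustrative: three clients' raw gradient updates (made-up numbers).
client_updates = [np.random.randn(4) for _ in range(3)]
private_updates = [privatize_update(u, clip_norm=1.0, noise_multiplier=0.5) for u in client_updates]

# With secure aggregation the server would only learn this average, not the parts.
aggregate = np.mean(private_updates, axis=0)
print(aggregate)
```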
Trusted Execution Environments – Hardware as the Last Line of Trust
All major cloud vendors (AWS, Azure, Google Cloud) now offer confidential computing instances built on trusted execution environments (TEEs): Intel SGX, AMD SEV, ARM TrustZone. These are hardware-enforced enclaves whose memory is inaccessible even to the cloud operator.
They are used for sensitive workloads such as AI inference, medical data analysis, and financial fraud detection, where computation runs on third-party infrastructure but the infrastructure provider cannot be fully trusted.
The downside: TEEs have been the target of side-channel attacks such as Spectre, Meltdown, and enclave-specific attacks like AEPIC Leak. You need to actively track vendor patches. TEEs are a powerful control, but not a set-and-forget one; they must be maintained to keep mitigating threats.
Synthetic Data and Secure Aggregation — Already Mainstream
Synthetic data generation has become a popular way to share test data externally, to train models when real data cannot leave a jurisdiction, and to run regulatory compliance testing.
Narrower forms of MPC are already used for secure aggregation in privacy-preserving ad measurement (Apple’s Private Click Measurement, Google’s Privacy Sandbox), cross-bank fraud prevention, and federated analytics platforms.
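As an intuition for how secure aggregation works, here is a toy sketch of pairwise additive masking: each pair of clients agrees on a random mask that cancels out in the sum, so the server learns the total but not any individual value. Real protocols add cryptographic key agreement and dropout handling, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients = 3
values = np.array([4.0, 7.0, 1.0])  # each client's private value (illustrative)

# Each pair (i, j) with i < j shares a random mask r_ij.
# Client i adds r_ij, client j subtracts it, so all masks cancel in the total.
masks = {(i, j): rng.normal() for i in range(n_clients) for j in range(i + 1, n_clients)}

masked = values.copy()
for (i, j), r in masks.items():
    masked[i] += r
    masked[j] -= r

print("Masked values the server sees:", masked)         # individually meaningless
print("Sum of masked values:", round(masked.sum(), 6))  # equals the true sum
print("True sum:", values.sum())
```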
Quick Reference: PET Landscape at a Glance
| PET Type | Best For | Key Limitation | Maturity |
|---|---|---|---|
| Differential Privacy | Telemetry, analytics releases | Accuracy trade-off with low ε | Production |
| Federated Learning | On-device / silo training | Gradient leakage risk | Production |
| Homomorphic Encryption | Encrypted computation | High compute cost | Emerging |
| Secure MPC | Multi-party analytics | Communication overhead | Narrow production |
| TEEs / Confidential Compute | Cloud-sensitive workloads | Side-channel attacks | Production |
| Synthetic Data | External data sharing | Can preserve bias | Production |
What’s Just Beginning – The Next 2–3 Years
At this point, things get really interesting and really unpredictable.
General-Purpose Homomorphic Encryption — The Holy Grail Getting Closer
Homomorphic Encryption has long been called the holy grail of privacy technology, and general-purpose HE is slowly getting closer to practical use.
Despite the esoteric name, the idea is simple: Homomorphic Encryption lets you perform arbitrary computations directly on encrypted data, producing the same result as if you had computed on the plaintext. No party involved in the computation ever sees the real values. Once you dig into the details, it becomes clear why HE is still largely a research-to-production story in 2025.
The difficulty: HE is computationally expensive. Operations that take milliseconds on plaintext can take seconds or minutes on ciphertext. Today’s practical deployments therefore restrict HE to specific operations in a specific pipeline rather than whole workloads, and often mix it with TEEs and MPC so the costly HE steps are confined to the most sensitive part of the pipeline.
The leading open-source HE libraries are Microsoft SEAL, OpenFHE, and TFHE-rs. General-purpose HE for real-time AI inference is still a few years off, but with hardware acceleration it is getting better year by year.
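As a taste of what encrypted computation looks like in practice, here is a small sketch using TenSEAL (Python bindings over Microsoft SEAL, listed later in this article). The parameter choices follow the library’s introductory examples and are for illustration only, not a vetted security configuration.

```python
import tenseal as ts

# Create a CKKS context (approximate arithmetic over real numbers).
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()

# Encrypt a vector, then compute on the ciphertext without ever decrypting it.
plain = [1.0, 2.0, 3.0]
encrypted = ts.ckks_vector(context, plain)
result = encrypted * 2 + [0.5, 0.5, 0.5]  # runs on encrypted data

print(result.decrypt())  # approximately [2.5, 4.5, 6.5]
```

Even this tiny example is orders of magnitude slower than the equivalent plaintext arithmetic, which is why production use today scopes HE to narrow, high-value steps.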
Composed PET Stacks — FL + DP + TEE + Synthetic Data Working Together
The emerging architecture pattern is not “pick one PET”; it is several PETs stacked together to match the threat model. A healthcare AI pipeline might keep data inside hospital silos (FL), apply DP clipping and noise to gradients before they leave each silo, aggregate inside a TEE on a cloud server, and validate the resulting model against synthetic data before anything is released outside the healthcare environment.
This kind of deliberate layering is currently being built in healthcare networks, financial anti-fraud consortia, and cross-border data analytics initiatives. The engineering burden is heavy – each PET brings its own complexity, attestation requirements, and debugging load. The governance story, however, is much cleaner: you can show, layer by layer, exactly what privacy guarantee each component provides.
PETs for IoT and Edge Computing
IoT brings constraints that traditional PET frameworks do not address: intermittent connectivity, limited compute on embedded edge devices, and mobility across network topologies. Papers from 2024-2025 are therefore benchmarking FL frameworks and DP implementations directly on edge hardware.
This is an area to watch. Lightweight PETs for healthcare wearables, industrial sensors, and smart cities will only grow in the coming years, and they will need a lot of development.
PETs as Data Sovereignty Infrastructure
One of the least discussed applications of PETs is data sovereignty: letting organisations participate in global cloud analytics without losing control of data that cannot leave a jurisdiction, backed by cryptographic guarantees. EU-based businesses processing data on US cloud infrastructure, for instance, are increasingly adopting TEEs and MPC to demonstrate that the data remains under EU legal protection even when processed outside its borders.
The Real Challenges – Not Just Technical Ones
Most PET discussions focus on the cryptography. Having mapped the real problems, I can say the harder ones are usually organizational.
The Utility-Privacy Trade-off Is Fundamental, Not a Bug
Stronger privacy protection generally means lower accuracy or lower utility of the data. DP with a very low epsilon produces results so noisy they can lead to wrong conclusions. FL models trained across small, disconnected silos can perform significantly worse than centrally trained models.
There is no free lunch here. Calibrating “how much privacy is enough” is not a purely technical question; it needs lawyers, the wider organisation, and ethicists. Too often, teams treat epsilon selection as a purely technical decision.
Attacks on PETs Themselves
- Gradient inversion attacks can reconstruct training data from FL gradient updates, particularly when updates come from small batches.
- Model inversion attacks extract information about the training data from a deployed model’s predictions.
- Membership inference attacks determine whether a specific record was in a model’s training set.
- Synthetic data generators can memorize individuals or small groups in the source data and leak them into the synthetic output.
- Composing multiple DP mechanisms without careful budget accounting can leave the effective privacy guarantee far weaker than intended.
Every PET has an attack surface. Without a threat model and adversarial evaluation, a PET deployment is not a step up in privacy; it is security theater.
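To make the membership-inference point concrete, here is a toy sketch of the simplest loss-threshold attack: records the model assigns unusually low loss are guessed to have been in its training set. The losses here are synthetic placeholders standing in for a real model’s outputs.

```python
import numpy as np

def loss_threshold_attack(losses: np.ndarray, threshold: float) -> np.ndarray:
    """Guess 'member' for any record whose loss falls below the threshold.

    Overfit models tend to assign lower loss to records they were trained on,
    which is exactly the signal this attack exploits.
    """
    return losses < threshold

rng = np.random.default_rng(1)
# Illustrative: members get lower average loss than non-members (overfitting).
member_losses = rng.normal(loc=0.3, scale=0.1, size=1000)
nonmember_losses = rng.normal(loc=0.8, scale=0.2, size=1000)

threshold = 0.5
tpr = loss_threshold_attack(member_losses, threshold).mean()     # true positive rate
fpr = loss_threshold_attack(nonmember_losses, threshold).mean()  # false positive rate
print(f"Attack TPR: {tpr:.2f}, FPR: {fpr:.2f}")
```

The wider the gap between the attack’s true and false positive rates, the more the model leaks membership; DP training narrows that gap.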
The Skills Gap Is Severe
A study by the U.K. Information Commissioner’s Office (ICO) on barriers to adopting PETs confirms what many practitioners already know: people proficient in both cryptography and machine learning are hard to find, and deploying PETs properly requires both. Most ML engineers lack cryptography expertise; most cryptographers lack ML expertise. The field now needs people who can bridge the two.
There is also a content gap: practical, concrete knowledge about deploying PETs, beyond academic abstraction, is scarce. Most implementation guides include neither actual code nor a real threat-model breakdown.
Regulatory Uncertainty – When Is DP-Protected Data Still Personal?
One of the most tantalising open questions: is data processed with DP still “personal data” under GDPR? Regulators say the answer is: it depends. Residual re-identification risk, particularly if DP outputs can be linked with external datasets, may mean the data is still legally personal. PETs for regulatory compliance remain a work in progress, with both the law and the technology still evolving.
Teams deploying PETs should bring in privacy counsel at the outset, not at the end of the process.
How Practitioners Can Actually Use This Right Now
Map PETs to Threat Models, Not Buzzwords
The question is not “which PET should I use?” but “what is my real threat model?”. If there are multiple answers, there will be multiple PETs:
- Data must stay where it is physically located? → Federated Learning.
- Need privacy guarantees for published statistics or model outputs? → Differential Privacy.
- Need to compute on an untrusted cloud provider’s infrastructure? → TEEs.
- Multiple parties need to compute together without revealing their inputs to one another? → Secure MPC or Homomorphic Encryption.
- Need to share a dataset externally for testing or pre-training? → Synthetic data, generated with safeguards against re-identification.
Start Narrow and Prove It Out
The organizations I have seen succeed with PETs started with a single, high-value pilot with one clear focus: two hospitals running a federated learning experiment on a specific diagnostic model; a financial consortium implementing secure aggregation for one narrow fraud signal; a single analytics team deploying DP on one telemetry pipeline.
A narrow scope gives you a real system to learn from, a concrete privacy analysis to show regulators, and governance patterns you can reuse. Attempting to build a full PET stack across an entire data platform in one go is usually too large a task to sustain.
Open-Source Frameworks Worth Knowing
- TensorFlow Federated (TFF) – Google’s production-tested FL framework, with support for DP and secure aggregation.
- PySyft – OpenMined’s privacy-preserving ML framework, covering FL, DP, and MPC.
- Microsoft SEAL / TenSEAL — HE libraries with Python bindings to experiment with encrypted computation.
- OpenDP – open-source DP library developed at Harvard and used in US Census applications.
- Flower (flwr) — Framework-agnostic FL library for PyTorch, TensorFlow and JAX.
All of these are free and actively developed. Building small but real examples with them, rather than just reading about the space, is what earns you real credibility.
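Rather than tying a first experiment to any one framework’s API (which tends to shift between versions), here is a framework-free sketch of the federated averaging loop that all of these libraries implement under the hood; the toy model is a single linear regression weight vector and the data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(42)

def local_train(weights: np.ndarray, X: np.ndarray, y: np.ndarray,
                lr: float = 0.1, epochs: int = 5) -> np.ndarray:
    """One client's local training: a few gradient descent steps on its own data."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Illustrative: three clients, each holding private data drawn around the same true model.
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 2))
    y = X @ true_w + rng.normal(scale=0.1, size=50)
    clients.append((X, y))

# Federated averaging: server sends weights, clients train locally, server averages.
global_w = np.zeros(2)
for _ in range(10):
    local_weights = [local_train(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_weights, axis=0)  # only weights travel, never raw data

print("Learned weights:", np.round(global_w, 3), "true weights:", true_w)
```

Once this pattern is clear, switching to TFF or Flower is mostly a matter of mapping the same loop onto their client and server abstractions.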
Frequently Asked Questions
Do PETs replace security and compliance frameworks like GDPR?
No. PETs are controls that complement legal compliance, IAM, logging, and incident response. The OECD and regulators are clear that PETs are not a compliance quick fix. They reduce privacy risk – they do not remove legal obligations.
If I apply Differential Privacy, is my data automatically anonymous?
Not necessarily. If the data remains identifiable in some way after DP has been applied (for example, through linkage with external datasets), it may still be considered “personal” under GDPR. Regulators are increasingly sceptical of naive anonymisation claims. A well-chosen epsilon can dramatically lower re-identification risk, and composition across releases matters enormously, but applying DP is not a switch that automatically makes data anonymous.
Which PET is best for training AI on sensitive data?
It depends on the constraints. If data must stay on site: FL with DP and secure aggregation is the standard production pattern. If the computation must run on untrusted cloud infrastructure: TEEs are the pragmatic choice. If several independent parties need to compute together without any party seeing the others’ data: MPC or HE. The right answer is usually a combination of two or three of these.
Are PETs too slow for production?
Some are, in their most general form. Fully general HE and rich MPC protocols can still be too slow for many real-time applications. But DP, FL, and TEE-based confidential computing are production-viable for many workloads, provided you watch latency-sensitive operations and scope the heavier PETs to the most critical stages of the pipeline.
What’s the most practical starting point for a team new to PETs?
Choose one specific, well-understood use case with a clearly identified threat model. Run a small FL + DP pilot, or move a single sensitive workload into confidential computing. Bring privacy experts in from the beginning. Document exactly which guarantees your chosen PET does and does not provide, for compliance and legal review. Then turn what you learn into reusable governance patterns.
My Take: What This Space Actually Needs
PETs are past the research-only stage. The foundations – DP, FL, TEEs, secure aggregation – are in production at scale. The bottleneck is no longer the cryptography; it is deployment know-how, threat-model awareness, and compliance confusion.
What is missing are practitioners who can weigh attack exposure, governance requirements, and practical realities, and explain a PET architecture end to end without retreating into pure management-speak or pure theory.
If you are building real depth here – whether you are building in AI, working as a data engineer, or simply taking a serious interest in how this should be done in the era of large-scale machine learning – the tools are free and open source, the research is public, and there is growing demand for people with technical skills, governance skills, and especially both.
The organizations succeeding with PETs are not the ones with the largest budgets. They are the ones that started narrow, built real systems, and brought legal and technical teams into the same process from the start.
That’s the actual playbook. Everything else is noise.
Read next: Kubernetes Security – Hardening Cloud-Native Workloads
I’m a technology writer with a passion for AI and digital marketing. I create engaging, useful content that bridges the gap between complex technology concepts and everyday readers, and I keep researching innovation and technology. Let’s connect and talk technology!



