It's the silent anxiety that permeates every AI team that has ever deployed a model to a third-party cloud. You've spent months of your life, maybe years, training something valuable, and then you hand it over to infrastructure you don't fully control, run by people with administrative rights over it whom you've never met.
It's not paranoia. It's a real gap. And confidential computing is the most plausible technical solution the industry has come up with to date.
This article disentangles what actually works today, what still needs figuring out, and, most importantly, how engineers, enterprises, and model providers can start using it without getting lost in the complexity.
The Problem Nobody Talks About Out Loud
Most AI security discussions focus on encrypting data at rest and in transit. That sounds complete until you realize there is a third state: data in use. When your model is actively executing inference, its weights sit decrypted in memory, accessible to anyone with root access on the host machine.
Cloud providers, hypervisor administrators, and other tenants on shared GPU hardware aren't supposed to look. But technically, many of them can.
This is the gap confidential computing exists to close. Before reading further, the Confidential Computing Consortium's Confidential Computing 101 primer gives useful background on how this differs from standard encryption.
Protecting Proprietary Models and IP with Confidential Computing – What the Tech Actually Does
The Enclave Idea, Without the Jargon
Trusted Execution Environments (TEEs) carve out a hardware-isolated region of memory, an enclave, where code runs on encrypted memory. Even the host OS or hypervisor cannot read what happens inside. The two main CPU implementations you'll encounter on the major clouds today are Intel TDX and AMD SEV-SNP.
For AI workloads, this means you can ship your model weights inside an encrypted container image. That image is effectively useless unless a trusted TEE has been established. At deployment time, remote attestation verifies that the hardware and software stack is exactly what you intended before the decryption key is released.
NVIDIA's documentation on this pipeline helped me build the mental model that finally made it click: think of it as a sealed vault that can prove to you it is a vault before you hand over the combination.
The flow looks roughly like this:
- Encrypt the model weights and code with keys held in a Key Management System (KMS) or Hardware Security Module (HSM).
- Package them as an encrypted container image that can do nothing in plaintext outside a TEE.
- Deploy onto confidential VMs or enclaves with a minimal trusted software stack.
- Run remote attestation; only when the measurements check out does the KMS release the decryption keys.
- Decrypt inside the enclave, run inference, and scrub any sensitive results from logs.
Most of the operational nuance lives in step 4.
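To make step 4 concrete, here is a minimal Python sketch of the whole flow, with the attestation and key-release calls simulated. The names `get_attestation_evidence` and `kms_release_key` are hypothetical placeholders, not any vendor's real API; a production deployment would use the platform's TDX or SEV-SNP attestation client and its actual KMS.

```python
# Minimal sketch of attestation-gated key release. All TEE/KMS interactions
# are simulated; only the control flow mirrors the five steps above.
from cryptography.fernet import Fernet

EXPECTED_MEASUREMENT = "sha384:known-good-stack"  # policy: the only stack we trust

def get_attestation_evidence() -> dict:
    # Placeholder: in a real TEE this is a hardware-signed measurement report.
    return {"measurement": "sha384:known-good-stack"}

def kms_release_key(evidence: dict, model_key: bytes) -> bytes:
    # Step 4: the KMS releases the key only if the measurement matches policy.
    if evidence["measurement"] != EXPECTED_MEASUREMENT:
        raise PermissionError("attestation failed: unexpected software stack")
    return model_key

# Step 1: encrypt the weights under a KMS-held key (done once, offline).
model_key = Fernet.generate_key()
encrypted_weights = Fernet(model_key).encrypt(b"proprietary model weights")

# Steps 4-5: inside the enclave, attest, obtain the key, decrypt, serve.
evidence = get_attestation_evidence()
key = kms_release_key(evidence, model_key)
weights = Fernet(key).decrypt(encrypted_weights)
print(f"decrypted {len(weights)} bytes inside the (simulated) enclave")
```

The structural point: the decryption key never leaves the KMS until a measurement the KMS trusts has been presented.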
What’s Production-Ready Right Now
This is no longer experimental, at least on the CPU side.
All three major clouds (AWS, Azure, Google Cloud) offer confidential VM SKUs based on Intel TDX or AMD SEV-SNP. Containerized AI workloads can run on them today with relatively few code changes. Google Cloud even ships much of the attestation plumbing in its Confidential Space product.
Platforms such as Fortanix Confidential AI go a step further: they have built end-to-end pipelines with encrypted model distribution, attestation-gated key release, and runtime protection that work across enterprise deployments. They are particularly geared toward regulated industries such as healthcare and finance, where both the model IP and the patient or transaction data must be protected together.
There are real rollouts in those areas already: hospitals running federated learning across institutions, banks using third-party LLMs without exposing customer records to the model provider. These aren't pilots; they're production workloads.
For grounding on why enterprises are moving in this direction, the breakdown of the top benefits of cloud computing for business on Google Cloud's blog is concrete and avoids the usual hand-waving.
Where Things Get Messy – The Real Challenges
GPU Support Is Early, Not Finished
NVIDIA's H100 and H200 chips introduced confidential computing support at the GPU level, extending the TEE boundary into device memory. For LLM inference that is enormous: serious models simply can't run on CPU TEEs alone for performance reasons.
But the software ecosystem around confidential GPU computing is still maturing. Frameworks such as vLLM or PyTorch need adaptation to run correctly in a confidential GPU environment. And because visibility into an enclave is deliberately limited by design, the debugging experience is noticeably worse than for ordinary GPU workloads.
A paper from IBM Research showed that with the right parallelization strategy, the overhead of CPU-GPU TEE setups can be negligible for LLM inference. But "with the right strategy" is doing a lot of work in that sentence. Benchmark your specific workload rather than relying on general figures.
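If you want a starting point, a harness like the one below, run once on a standard instance and once on a confidential one, gives you comparable p50/p95 numbers. The endpoint URL and payload are assumptions for a vLLM-style HTTP server; substitute whatever your serving stack exposes.

```python
# Crude latency benchmark: run the same script on both instance types
# and compare the printed percentiles.
import statistics
import time

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # hypothetical serving endpoint
PAYLOAD = {"prompt": "Hello", "max_tokens": 64}

def bench(n: int = 50) -> None:
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        requests.post(ENDPOINT, json=PAYLOAD, timeout=60).raise_for_status()
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    p50 = statistics.median(latencies)
    p95 = latencies[int(0.95 * n) - 1]  # simple percentile for a crude benchmark
    print(f"p50={p50 * 1000:.1f} ms  p95={p95 * 1000:.1f} ms")

if __name__ == "__main__":
    bench()
```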
Attestation at Scale Is an Operational Problem
Remote attestation looks clean in a diagram. In practice, you're tracking attestation flows, policy-based key release, key rotation schedules, and audit trails, and all of it has to integrate with your existing CI/CD pipelines and MLOps tooling.
Reading through Google Cloud's Confidential Space documentation, the primitives are solid, but wiring up the actual functionality is genuinely complex. This will hit teams that have never had to manage secrets at scale.
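As an illustration of what "policy-based key release plus an audit trail" means in code, here is a toy policy evaluator. The policy fields (allowed image digests, maximum key age) are illustrative inventions, not any vendor's schema:

```python
# Toy key-release policy check with a JSON-line audit log.
import json
import logging
from datetime import datetime, timedelta, timezone

logging.basicConfig(level=logging.INFO)
audit = logging.getLogger("key-release-audit")

POLICY = {
    "allowed_image_digests": {"sha256:abc123"},  # pinned workload images
    "max_key_age": timedelta(days=30),           # rotation schedule
}

def release_key(evidence: dict, key_created: datetime) -> bool:
    digest_ok = evidence.get("image_digest") in POLICY["allowed_image_digests"]
    age_ok = datetime.now(timezone.utc) - key_created < POLICY["max_key_age"]
    granted = digest_ok and age_ok
    # Every decision, granted or denied, lands in the audit trail.
    audit.info(json.dumps({
        "event": "key_release",
        "digest": evidence.get("image_digest"),
        "granted": granted,
        "at": datetime.now(timezone.utc).isoformat(),
    }))
    return granted

# Example: a stale key fails policy even when the image is trusted.
release_key({"image_digest": "sha256:abc123"},
            key_created=datetime.now(timezone.utc) - timedelta(days=45))
```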
Confidential Computing Doesn’t Replace Everything Else
This is the lesson most articles omit. Hardware isolation protects your model weights and in-memory data against infrastructure-level attackers. It doesn't protect against:
- A legitimate user extracting knowledge from the model through repeated queries (model extraction attacks).
- Weak access controls on who can send inference requests in the first place.
- Customers who contractually agree not to extract the model but might try anyway.
- Regulatory requirements that go beyond technical controls.
Contracts, licensing, access controls, rate limiting, and legal frameworks still matter. Confidential computing is one layer of the stack, not the entire stack.
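As one concrete example of a complementary control, here is a minimal per-client token-bucket rate limiter you could put in front of an inference endpoint to slow extraction-by-querying. The thresholds are illustrative:

```python
# Per-client token bucket: each client refills at RATE tokens/second up to
# BURST, and each inference request spends one token.
import time
from collections import defaultdict

RATE = 5.0    # sustained requests per second per client (illustrative)
BURST = 20.0  # short-burst allowance (illustrative)

_buckets = defaultdict(lambda: (BURST, time.monotonic()))  # client -> (tokens, last_seen)

def allow(client_id: str) -> bool:
    tokens, last = _buckets[client_id]
    now = time.monotonic()
    tokens = min(BURST, tokens + (now - last) * RATE)  # refill since last request
    if tokens < 1.0:
        _buckets[client_id] = (tokens, now)
        return False  # throttle: likely scripted high-volume querying
    _buckets[client_id] = (tokens - 1.0, now)
    return True
```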
What’s Coming And Why It Actually Matters
NVIDIA Blackwell and the Next GPU Generation
NVIDIA's Blackwell architecture (B100/B200) extends confidential computing support further, with tighter integration between the CPU TEE and the GPU's protected memory regions. As these chips spread across cloud providers and marketplaces offering confidential AI inference, the economics will change.
Right now, GPUs in confidential instances are both costly and scarce. That will change.
Cross-Vendor Attestation Standards
Today, attestation is largely vendor-specific. Intel TDX attestation doesn't automatically interoperate with AMD SEV-SNP verification in any standardized way. The Confidential Computing Consortium has been developing standards, but the work isn't finished.
Cross-vendor attestation would make it far easier to build portable confidential AI pipelines that aren't tied to a single cloud or a single chip vendor. That's when the ecosystem truly opens up.
Tooling and Developer Experience
Developer friction is real, and the vendors know it. Better observability tools that don't break enclave guarantees, improved debugging workflows, and higher-level SDKs are all areas of active investment. The trend is clearly toward making this usable by teams that aren't confidential computing experts.
Three Scenarios Worth Understanding
These map to how teams are actually adopting this:
Scenario 1 – The Model Provider: A business with a proprietary LLM wants to offer bring-your-own-data deployments. The encrypted model runs on the customer's confidential GPU infrastructure. The provider protects its IP; the customer keeps full data sovereignty. Neither party has to trust the other's infrastructure.
Scenario 2 – The Enterprise: A financial services company standardizes on confidential VM and GPU SKUs for all AI workloads that touch customer data. Attestation-based key release is enforced at the platform level. The security team can show auditors that neither model weights nor inference data are exposed to cloud provider administrators.
Scenario 3 – The ML Engineer Getting Started: Start with a simple inference workload on a confidential VM, using the sample code from Google Cloud or Azure. Understand the attestation flow. Benchmark latency. Once you're comfortable with the patterns, move your high-value models in.
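One cheap way to internalize the attestation flow is to look inside an attestation token once you have one. The sketch below decodes a token without verifying it, purely to inspect the claims; the claim names printed are assumptions that vary by platform, and skipping verification is only acceptable for learning, never for a trust decision. Requires `pip install pyjwt`.

```python
# Inspect (do not trust!) an attestation token's claims.
import jwt  # PyJWT

def inspect_attestation_token(token: str) -> None:
    # Signature verification is deliberately skipped: this is for reading
    # what the platform claims, not for deciding whether to release keys.
    claims = jwt.decode(token, options={"verify_signature": False})
    for name in ("iss", "exp", "hwmodel", "swname", "submods"):  # assumed names
        print(name, "=", claims.get(name))
```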
My Take on Confidential AI Platforms Specifically
Platforms such as Fortanix Confidential AI abstract away a big portion of the complexity. For teams that need to distribute models encrypted and protect them at runtime, but don't want to build attestation infrastructure by hand, they are worth considering.
What I found interesting in their approach is that it specifically addresses the multi-party deployment scenario, where the model provider, the data owner, and the compute provider are separate entities with differing trust relationships. Coordinating that by hand is very difficult. A platform that handles it end-to-end isn't just a convenience; it's risk reduction.
The tradeoff is vendor dependence. You are trusting the platform's software stack, and that stack becomes part of your trust boundary. Worth weighing carefully.
Free Resources Worth Actually Using
You can keep digging without spending anything; here is a sensible reading order:
Start with concepts:
- Red Hat's "What is confidential computing?" – a clean overview with real AI examples.
- Google Cloud's blog post on confidential computing for AI and federated learning – good on architecture patterns.
Then move to AI-specific depth:
- The Red Hat Office of the CTO article on how confidential computing improves AI inference security.
- The NVIDIA developer blog on protecting sensitive data and AI models – covers end-to-end pipelines.
Get into benchmarks:
- The arXiv paper on confidential computing on NVIDIA Hopper GPUs – quantifies the overhead for LLM inference specifically.
- IBM Research's work on vLLM in confidential CPU-GPU enclaves – the benchmarking data here is encouraging.
Hands-on docs:
- Google Cloud Confidential Space documentation
- Fortanix Confidential AI docs and blog posts
Work through them roughly in that order: concepts, then architecture, then platform specifics.
Who Should Actually Care About This Right Now
Not everyone needs to act right now, but here's a rough breakdown:
Act now if you’re:
- A model provider deploying into customer environments or onto shared GPU clusters.
- An enterprise in healthcare, finance, or government handling sensitive data and proprietary models.
- A team running inference on rented or third-party GPU infrastructure.
Prepare now if you're:
- Building AI products that will eventually process regulated data.
- Evaluating cloud providers for long-term AI infrastructure.
Watch and wait if you're:
- Early-stage and running on your own managed infrastructure.
- Working with open-source models where IP protection is less of a concern.
Wrapping Up – An Honest Take
Using confidential computing to protect proprietary models and IP is real and practical on CPUs today. On GPUs it's close, but tooling and supply are still rough in practice.
The technology closes a real gap that encryption at rest and in transit never could.
Still, it is one element of a larger security and IP strategy, not a substitute for access controls, contracts, and compliance processes.
The teams that start with small workloads today, and build familiarity with attestation and key-management workflows now, are the ones positioned to take advantage of confidential GPU capacity as it becomes the standard option.
This isn't a far-off horizon. The hardware is already shipping.
I'm a technology writer with a passion for AI and digital marketing. I create engaging, useful content that bridges the gap between complex technology concepts and everyday readers, and I keep researching innovation and technology. Let's connect and talk technology!



