Post-Quantum Readiness in AI Infrastructure
For organizations operating AI infrastructure with intellectual property at scale, post-quantum (PQ) readiness is not a compliance checkbox. It is an architectural transition that touches control planes, GPU data planes, and artifact integrity across the entire AI platform.
This is no longer theoretical. Post-quantum readiness is becoming a core architecture decision for modern AI infrastructure.
Harvest Now, Decrypt Later Is a Veiled AI Risk
A founder I was advising was building a VR input solution alongside a backend to create a corpus from anonymized user data. When asked how long they expected that training data to remain sensitive, the answer was essentially 'forever.' That conversation made PQ readiness feel present tense, not a future migration. The assets accumulating inside AI platforms — training corpora, model weights, embeddings — don't expire. Neither does the exposure.
Practically every AI platform relies on Transport Layer Security (TLS) for artifact storage, control planes, and cluster trust. Traffic and artifacts recorded today can be stored and decrypted once a sufficiently capable quantum computer exists; that is the harvest-now, decrypt-later threat. Without a migration path from RSA and ECC to the NIST-standard post-quantum primitives, CRYSTALS-Kyber (ML-KEM) for key exchange and CRYSTALS-Dilithium (ML-DSA) for signatures, your confidentiality window is time-bounded. Most production deployments will need to traverse a hybrid period before full PQ adoption is viable.
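To make the hybrid period concrete, here is a minimal sketch of the idea behind hybrid groups such as X25519MLKEM768: run a classical X25519 exchange and an ML-KEM-768 encapsulation, then derive the session key from both secrets, so confidentiality holds unless both primitives fail. It assumes the `cryptography` package and the liboqs-python bindings (`oqs`); the algorithm string "ML-KEM-768" depends on your liboqs version (older builds call it "Kyber768").

```python
# Minimal sketch of hybrid key establishment: classical X25519 plus
# ML-KEM-768, combined through a KDF. Assumes `cryptography` and
# liboqs-python; not a TLS implementation, just the key-schedule idea.
import oqs
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

# Classical half: ephemeral X25519 exchange.
client_ecdh = X25519PrivateKey.generate()
server_ecdh = X25519PrivateKey.generate()
classical_secret = client_ecdh.exchange(server_ecdh.public_key())

# Post-quantum half: ML-KEM-768 encapsulation against the server's KEM key.
with oqs.KeyEncapsulation("ML-KEM-768") as server_kem:
    kem_public_key = server_kem.generate_keypair()
    with oqs.KeyEncapsulation("ML-KEM-768") as client_kem:
        ciphertext, pq_secret = client_kem.encap_secret(kem_public_key)
    assert server_kem.decap_secret(ciphertext) == pq_secret

# The session key binds both halves; breaking it requires breaking both.
session_key = HKDF(
    algorithm=hashes.SHA256(), length=32, salt=None, info=b"hybrid-demo"
).derive(classical_secret + pq_secret)
print(f"derived hybrid session key: {session_key.hex()}")
```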
If your models are strategic assets with a 5- or even 10-year lifecycle, PQ readiness is a present architecture decision, not a future migration project.
Three Planes Where PQ Stress Shows Up First in AI Systems
Control planes: service meshes, mutual TLS, certificate issuance, and autoscaling bursts introduce measurable performance stress in the form of handshake latency, CPU overhead, and memory pressure. This is where the storm originates. Autoscaling events trigger thousands of simultaneous connections — new GPU workers joining training clusters, job schedulers spinning up, service mesh certificates rotating. The resulting handshake amplification is a control plane event; the throughput cost shows up on the data plane.
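A rough worked example helps size the storm. The sketch below models one autoscaling burst using the published ML-KEM-768 encoding sizes (1,184-byte public key, 1,088-byte ciphertext); the node and connection counts are illustrative assumptions, not measurements.

```python
# Back-of-envelope model of handshake amplification in one autoscaling
# burst. ML-KEM-768 sizes are the FIPS 203 encodings; NEW_NODES and
# CONNS_PER_NODE are illustrative assumptions for a GPU training cluster.
X25519_SHARE = 32            # bytes per key share, each direction
MLKEM768_PUBLIC_KEY = 1184   # bytes, client -> server
MLKEM768_CIPHERTEXT = 1088   # bytes, server -> client

classical = 2 * X25519_SHARE                 # X25519 alone
hybrid = (X25519_SHARE + MLKEM768_PUBLIC_KEY
          + X25519_SHARE + MLKEM768_CIPHERTEXT)  # X25519MLKEM768-style

NEW_NODES = 500       # GPU workers joining after a scale-up event
CONNS_PER_NODE = 40   # mesh peers, schedulers, registries, telemetry

handshakes = NEW_NODES * CONNS_PER_NODE
print(f"{handshakes} near-simultaneous handshakes")
print(f"key-exchange bytes: {handshakes * classical / 1e6:.1f} MB classical "
      f"vs {handshakes * hybrid / 1e6:.1f} MB hybrid")
```

The absolute bytes are modest; what matters is that they arrive simultaneously, together with the added per-handshake CPU work, exactly when the control plane is busiest.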
Data planes (high-throughput GPU fabrics): AI clusters push encrypted traffic across east-west VPC paths, multi-cloud links, cross-region replication, and enterprise ingress, and these paths have shown performance issues with PQ-safe cryptography since the first implementations. When PQ-safe TLS or hybrid key exchange is introduced, packet sizes grow, CPU cycles per handshake increase, and hardware acceleration may not fully support PQ primitives. At scale, even a 3–5% regression in effective throughput can translate into meaningful GPU underutilization. The tradeoff becomes real very quickly: do you sacrifice throughput or defer PQ?
In practical deployments, the difference is not subtle. Hybrid TLS handshakes that combine classical ECDHE with ML-KEM routinely expand handshake payloads by several kilobytes and add CPU cost per handshake. The effect is rarely visible in small benchmarks but becomes obvious at production scale. In GPU-dense clusters where nodes represent tens of thousands of dollars of hardware, even short-lived connection storms can translate into measurable idle GPU time.
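The CPU side is easy to measure directly. Here is a minimal micro-benchmark, assuming liboqs-python, that times the two KEM operations a hybrid handshake adds (single-threaded, so read it as per-core cost rather than capacity):

```python
# Times the ML-KEM-768 operations a hybrid handshake adds on the CPU path.
# Assumes liboqs-python; results vary with liboqs build flags and hardware,
# and the .details keys follow liboqs-python naming.
import time
import oqs

N = 2000
with oqs.KeyEncapsulation("ML-KEM-768") as server, \
     oqs.KeyEncapsulation("ML-KEM-768") as client:
    public_key = server.generate_keypair()
    wire_bytes = (client.details["length_public_key"]
                  + client.details["length_ciphertext"])

    t0 = time.perf_counter()
    results = [client.encap_secret(public_key) for _ in range(N)]
    t1 = time.perf_counter()
    for ciphertext, _shared_secret in results:
        server.decap_secret(ciphertext)
    t2 = time.perf_counter()

print(f"encap: {(t1 - t0) / N * 1e6:.1f} us/op")
print(f"decap: {(t2 - t1) / N * 1e6:.1f} us/op")
print(f"extra key-exchange bytes per hybrid handshake: {wire_bytes}")
```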
Artifact planes (artifact authenticity and cryptographic supply-chain integrity): model weights are signed, replicated, cached, and redistributed across registries and regions, and the move toward ML-KEM key establishment and ML-DSA signatures introduces real overhead there. In hybrid deployments, signature sizes increase and verification paths grow more complex. In high-churn CI/CD and model distribution pipelines, that overhead becomes a measurable systems characteristic, not a theoretical concern.
This surface expands in federated learning, multi-party training, and partner model exchange. Key rotation, enrollment, revocation, and long-term provenance guarantees all carry additional weight under PQ-safe schemes. At AI scale, cryptography is not just transport security; it becomes part of the lifecycle management layer for strategic assets.
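Concretely, the artifact-plane change is the signing path. A minimal sketch, again assuming liboqs-python, that signs a model-weight digest with ML-DSA-65 and prints the sizes that compound across registries (the algorithm string depends on the liboqs version; older builds name it "Dilithium3"):

```python
# Sketch of PQ-signing a model artifact with ML-DSA-65. Assumes
# liboqs-python; a real pipeline would attach the digest, signature, and
# signer public key to the artifact's provenance metadata.
import hashlib
import oqs

weights = b"\x00" * (16 * 1024 * 1024)   # stand-in for a weight shard
digest = hashlib.sha256(weights).digest()

with oqs.Signature("ML-DSA-65") as signer:
    public_key = signer.generate_keypair()
    signature = signer.sign(digest)

with oqs.Signature("ML-DSA-65") as verifier:
    assert verifier.verify(digest, signature, public_key)

# An ECDSA P-256 signature is ~64-72 bytes; the delta below is the
# overhead replicated across every registry, cache, and region.
print(f"signature: {len(signature)} bytes, public key: {len(public_key)} bytes")
```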
GPU Acceleration: PQ Enters the AI Compute Stack
NVIDIA recently introduced cuPQC, a GPU-accelerated SDK for NIST-standard post-quantum algorithms. For years, one of the objections to PQC adoption at scale was: "The math is too expensive computationally." GPU acceleration reframes PQ operations, especially lattice-based key encapsulation, as parallelizable workloads.
If you're already GPU-dense, it's worth running your own benchmarks, particularly around high-volume key encapsulation during training cluster churn. If you're not, wait for someone else to do that work and publish it. The practical value at smaller scale isn't there yet.
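If you do run that benchmark, the useful first number is a CPU baseline on the same hardware, since GPU-accelerated results (cuPQC or otherwise) only matter relative to it. A minimal sketch assuming liboqs-python, with illustrative thread and batch counts:

```python
# CPU baseline for batched ML-KEM-768 encapsulation throughput, the
# number to compare GPU-accelerated runs against. Assumes liboqs-python;
# WORKERS and BATCH are illustrative. ctypes calls release the GIL, so
# threads can run the KEM operations in parallel here.
import time
from concurrent.futures import ThreadPoolExecutor
import oqs

with oqs.KeyEncapsulation("ML-KEM-768") as server:
    PUBLIC_KEY = server.generate_keypair()

def encap_batch(n: int) -> int:
    # One KEM instance per worker; encapsulation needs only the peer
    # public key, so no secret-key state is shared across threads.
    with oqs.KeyEncapsulation("ML-KEM-768") as kem:
        for _ in range(n):
            kem.encap_secret(PUBLIC_KEY)
    return n

WORKERS, BATCH = 8, 5000
t0 = time.perf_counter()
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    total = sum(pool.map(encap_batch, [BATCH] * WORKERS))
elapsed = time.perf_counter() - t0
print(f"{total / elapsed:,.0f} encapsulations/sec with {WORKERS} threads")
```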
The more interesting signal is that NVIDIA is shipping this at all. Most of the industry is still figuring out what to do with AI. NVIDIA is already thinking about what comes after it. That's worth paying attention to regardless of whether you adopt cuPQC today.
CPU & Network Offload: The Other Half of the Story
Meanwhile, PQ pressure on the network edge has not waned. ATS, Envoy, and NGINX ingress tiers, mTLS sidecars, API gateways, and cross-region encrypted backbones were showing measurable cryptographic overhead well before LLMs began to dominate.
The acceleration paths for the edge are well defined: OpenSSL 3.5+ native post-quantum support, Open Quantum Safe (OQS) provider integration, Intel AVX-512 vectorization, and QuickAssist Technology (QAT) hardware offload for high connection-per-second environments.
The compute placement decision isn't actually that complicated once you've benchmarked it: GPUs for batched PQ workloads, AVX-512 for CPU-side KEM and signature efficiency, offload engines for edge handshake volume. The hard part is getting there intentionally rather than discovering it under production load.
The Strategic Mistake: Treating PQ as a Library Swap
The most common anti-pattern I’ve heard is “When standards finalize, we’ll just upgrade OpenSSL.” The standards did finalize (FIPS 203 and FIPS 204 were published in August 2024), and the swap still falls short: it ignores the massive protocol surface area (TLS, QUIC, RPC, IPsec, firmware validation), the compounded performance regressions at scale, and hardware roadmap misalignment.
PQ readiness is an initiative requiring cryptographic inventory, capacity modeling, benchmarking under hybrid load, hardware vendor alignment, and executive sponsorship tied to long-term data risk.
Why This Matters Specifically for AI
The AI industry is spending billions on model development while largely treating cryptographic exposure as an edge problem — something you bolt on at the perimeter like DDoS mitigation. That framing doesn't hold for AI infrastructure. The artifacts that matter most — training corpora, model weights, embeddings — move continuously across networks, regions, and partners. There's no single edge to protect.
Most of the IP protection conversation in AI has focused on distillation attacks against already-released models. That's a real threat, but a narrow one. The bigger exposure sits upstream: the training data and intermediate artifacts that represent years of investment and haven't been released to anyone. Those assets are long-lived, they're valuable, and right now most of them are protected by cryptography that has a known expiration date.
A Minimal Readiness Path for AI Platforms
- Inventory — Map every cryptographic dependency across control, data, and artifact planes. Identify long-lived assets and hybrid exposure points. (A minimal scanning sketch appears after this list.)
- Benchmark — Measure handshake latency, throughput regression, signature verification cost, and GPU underutilization under hybrid PQ loads.
- Placement Strategy — Decide intentionally where PQ executes across CPUs, GPUs, and hardware offload engines rather than defaulting to library-level swaps.
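As a starting point for the inventory step, here is a minimal sketch using the `cryptography` package that walks a directory of PEM certificates and flags quantum-vulnerable public keys. The path is illustrative, and a real inventory also has to cover TLS configs, SSH keys, KMS material, and firmware signing.

```python
# Minimal cryptographic-inventory sketch: flag certificates whose public
# keys rely on quantum-vulnerable math (RSA / elliptic curves). Assumes
# the `cryptography` package (>= 42 for not_valid_after_utc); the cert
# directory is a hypothetical location.
from pathlib import Path
from cryptography import x509
from cryptography.hazmat.primitives.asymmetric import ec, ed25519, rsa

CERT_DIR = Path("/etc/certs")  # illustrative path

for pem in sorted(CERT_DIR.glob("**/*.pem")):
    cert = x509.load_pem_x509_certificate(pem.read_bytes())
    key = cert.public_key()
    if isinstance(key, rsa.RSAPublicKey):
        kind = f"RSA-{key.key_size} (quantum-vulnerable)"
    elif isinstance(key, (ec.EllipticCurvePublicKey, ed25519.Ed25519PublicKey)):
        kind = "elliptic-curve (quantum-vulnerable)"
    else:
        kind = type(key).__name__
    print(f"{pem}: {kind}, expires {cert.not_valid_after_utc:%Y-%m-%d}")
```

Long-lived certificates on vulnerable keys are the intersection to prioritize: that is where the harvest-now window and the rotation cost are both largest.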
Closing
AI infrastructure discussions rarely treat post-quantum readiness as a first-class architectural topic. Given where the investment is going and how long those assets will need to remain protected, that's a gap worth closing sooner than most teams plan for.
If you're not sure where your cryptographic exposure sits today, that's usually the right place to start.
Want to talk about PQ readiness for your platform? Start a conversation.