
For years, the public cloud has been the default destination for enterprise workloads. It offered virtually unlimited scalability, reduced the burden of infrastructure management, and allowed organizations to innovate faster. However, the rise of AI is causing many organizations to rethink whether every workload truly belongs in the public cloud.
AI workloads are fundamentally different from traditional applications. They require massive amounts of compute power, consume enormous datasets, and in many cases are highly sensitive to latency. These characteristics are exposing some of the trade-offs of cloud-first strategies.
Compute demand
One of the biggest challenges is compute demand. Training or running large AI models requires powerful GPU infrastructure that can operate continuously. While public cloud providers offer access to the latest GPUs, the costs can become substantial when workloads run around the clock. What initially appears cost-effective can become expensive at scale. Many organizations are beginning to realize that if they have a predictable and sustained demand for AI processing, investing in their own GPU infrastructure may provide better long-term economics.
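To make the economics concrete, here is a minimal break-even sketch in Python. Every figure in it (purchase cost, monthly running cost, hourly cloud rate) is a hypothetical placeholder rather than a real price, and the function name `breakeven_months` is our own. The model is deliberately simple: it ignores depreciation, discounts, staffing, and hardware refresh cycles.

```python
def breakeven_months(purchase_cost, monthly_opex, cloud_hourly_rate, hours_per_month):
    """Months of use after which owning hardware becomes cheaper than renting.

    Returns None when renting costs no more per month than running
    the owned hardware, so the purchase never pays for itself.
    """
    monthly_cloud = cloud_hourly_rate * hours_per_month
    monthly_saving = monthly_cloud - monthly_opex
    if monthly_saving <= 0:
        return None
    return purchase_cost / monthly_saving


# Hypothetical figures for illustration only, not real prices.
HOURS_IN_MONTH = 730  # approximate hours in a month of round-the-clock use

always_on = breakeven_months(
    purchase_cost=200_000, monthly_opex=3_000,
    cloud_hourly_rate=30.0, hours_per_month=HOURS_IN_MONTH,
)
occasional = breakeven_months(
    purchase_cost=200_000, monthly_opex=3_000,
    cloud_hourly_rate=30.0, hours_per_month=80,
)

print("always-on break-even (months):", always_on)
print("occasional-use break-even (months):", occasional)
```

With these made-up numbers, the always-on workload pays back the hardware in under a year, while the occasional workload never does. The same reasoning explains why sustained, predictable demand is the case where owning GPUs tends to look attractive.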
Data gravity
Data gravity is another important factor. AI systems are only as valuable as the data they can access. Large enterprises often possess petabytes of information spread across data warehouses, document repositories, transactional systems, and data lakes. Moving all of this data into the cloud is not always practical. It introduces transfer costs, synchronization challenges, and in some industries, regulatory concerns. As a result, many organizations are choosing to bring AI closer to where their data already resides rather than moving the data to the AI.
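A rough calculation shows why moving petabytes is not always practical. The sketch below estimates how long a bulk transfer would take and what it might cost. The link speed, efficiency factor, and per-GB price are assumptions chosen for illustration, and the helper names `transfer_days` and `transfer_cost` are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(dataset_tb, link_gbps, efficiency=0.8):
    """Days needed to move dataset_tb (decimal terabytes) over a link.

    efficiency is an assumed fraction of the nominal link speed that
    is actually usable for the transfer.
    """
    bits = dataset_tb * 1e12 * 8            # terabytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency
    return bits / effective_bps / SECONDS_PER_DAY


def transfer_cost(dataset_tb, price_per_gb):
    """Transfer charge for dataset_tb at a flat, assumed per-GB price."""
    return dataset_tb * 1_000 * price_per_gb


# Hypothetical example: 2 PB over a 10 Gbps link at 80% efficiency.
days = transfer_days(dataset_tb=2_000, link_gbps=10, efficiency=0.8)
cost = transfer_cost(dataset_tb=2_000, price_per_gb=0.05)  # assumed price

print("estimated transfer time (days):", round(days, 1))
print("estimated transfer cost:", round(cost, 2))
```

Even under these optimistic assumptions, the one-off copy takes weeks, and the data keeps changing in the meantime, which is where the synchronization challenges come from.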
Latency
Latency also plays a significant role. Certain AI use cases can tolerate delays of a few seconds, but others cannot. Fraud detection, real-time payment processing, customer service assistants, and manufacturing automation often require near-instant responses. In these scenarios, every network hop matters. Running inference engines closer to business applications and data sources can deliver faster and more predictable performance than relying on a remote cloud environment.
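One way to reason about "every network hop matters" is a simple latency budget: add the model's inference time to the delay of each hop and compare the total with what the use case can tolerate. The following sketch does exactly that. The millisecond values and the 50 ms budget are illustrative assumptions, not measurements, and the function names are our own.

```python
def end_to_end_ms(inference_ms, hops_ms):
    """Total response time: model inference plus every network hop."""
    return inference_ms + sum(hops_ms)


def within_budget(inference_ms, hops_ms, budget_ms):
    """True when the total response time fits the latency budget."""
    return end_to_end_ms(inference_ms, hops_ms) <= budget_ms


# Hypothetical numbers for illustration only.
INFERENCE_MS = 20
BUDGET_MS = 50                      # e.g. an assumed fraud-check deadline

local_hops = [1, 1]                 # request and response inside the data center
remote_hops = [2, 25, 25, 2]        # same path plus a round trip to a remote region

local_total = end_to_end_ms(INFERENCE_MS, local_hops)
remote_total = end_to_end_ms(INFERENCE_MS, remote_hops)

print("local:", local_total, "ms, ok =", within_budget(INFERENCE_MS, local_hops, BUDGET_MS))
print("remote:", remote_total, "ms, ok =", within_budget(INFERENCE_MS, remote_hops, BUDGET_MS))
```

The model is identical in both cases. Only the network path differs, and in this made-up scenario that alone decides whether the budget is met.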
This naturally leads to the question: does running AI on-premises or in a private cloud provide better performance?
In many cases, the answer is yes.
A dedicated on-premises AI environment can be optimized specifically for the organization’s needs. GPUs are not shared with other tenants, data is accessed locally, and networking can be tuned for high-performance AI workloads. The result is often lower latency, higher throughput, and more predictable performance. This is one reason why many large banks, telecommunications providers, and technology companies continue to invest heavily in their own AI infrastructure.
That said, public cloud still offers significant advantages. It provides rapid access to new technologies, allows organizations to scale quickly, and removes the burden of managing hardware. If a company suddenly needs hundreds of GPUs for a training exercise, the cloud can often provide them immediately without waiting months for procurement and deployment.
Because of these competing factors, the industry is increasingly moving toward a hybrid approach rather than choosing one side over the other.
Strategic Perspective
A common pattern is to use public cloud resources for experimentation, model development, and burst training workloads while keeping production inference and sensitive data within private or on-premises environments. This allows organizations to benefit from cloud flexibility while maintaining control over performance, cost, and governance.
For industries such as banking, this model is particularly attractive. Customer data, core banking systems, and payment platforms often remain within the bank’s infrastructure, while cloud services are used selectively for AI training, model experimentation, or temporary capacity expansion.
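The hybrid pattern described above can be written down as a small placement rule. This is only a sketch of the idea: the `Workload` type, the workload kinds, and the `place` function are hypothetical names invented for this example, and a real policy would weigh many more factors such as cost, compliance rules, and capacity.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    kind: str                     # e.g. "experiment", "production_inference"
    sensitive_data: bool = False  # touches regulated or customer data


# Kinds the pattern sends to the public cloud (assumed labels).
CLOUD_KINDS = {"experiment", "model_development", "burst_training"}


def place(workload):
    """Return "private" or "public_cloud" for a workload."""
    # Sensitive data and production inference stay in-house.
    if workload.sensitive_data or workload.kind == "production_inference":
        return "private"
    if workload.kind in CLOUD_KINDS:
        return "public_cloud"
    # Unknown kinds default to private; this conservative default is an assumption.
    return "private"


workloads = [
    Workload("prototype-chatbot", "experiment"),
    Workload("fraud-scoring", "production_inference"),
    Workload("kyc-model-training", "burst_training", sensitive_data=True),
]
for w in workloads:
    print(w.name, "->", place(w))
```

Note that the sensitive-data check comes first, so a burst training job that touches customer data stays private even though burst training would otherwise go to the cloud.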
When AI Challenges the Cloud-First Strategy
Ironically, AI may become one of the biggest reasons some enterprises slow down their migration to the public cloud. This is not because the cloud has failed, but because AI makes factors such as data location, latency, compliance, and GPU economics far more important than they were for traditional business applications.
Forward-Looking
The future is unlikely to be purely public cloud or purely on-premises. Instead, organizations will increasingly place AI workloads wherever they make the most sense—running compute close to data, keeping latency-sensitive workloads local, and using the cloud when flexibility and scale are needed. In many ways, AI is pushing enterprises toward a more balanced and pragmatic hybrid architecture.
Production inference is the use of a trained machine learning model, deployed in a live environment, to make predictions on new, unseen data, often in real time.
Data gravity is the phenomenon where a growing dataset becomes so massive that it “pulls” applications, processing power, and other services toward it.
Latency is the delay or time interval between a user’s request/action and the system’s response.
TechE2E
A diverse group of technologists—ranging from beginners to experienced professionals—sharing insights, simplifying complex tech topics, and fostering meaningful discussions for readers at all stages of their journey.

