As AI enters the era of agentic inference, datacenter operators must rethink infrastructure design, embracing energy-efficient architectures, liquid cooling, and modular deployment models to maximize intelligence per watt and enable sustainable, large-scale AI adoption.
The AI industry is deep into a transition, and the infrastructure supporting it needs to keep pace. As the inference era accelerates, one imperative is becoming impossible to ignore. AI systems must extract more value from every watt they consume, and they must do it now.
The scale of the problem is reflected in the data. AI-driven datacenter electricity consumption has grown at roughly 12% per year since 2017, more than four times the rate of overall global electricity demand growth. Regionally, UAE datacenter electricity consumption is projected to double from around 3 TWh in 2025 to over 6 TWh, driven by AI and cloud computing investments.
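To put the 12% figure in perspective, the short sketch below works through the compounding arithmetic. It assumes the rate holds constant and compounds annually, and the eight-year window (2017 to 2025) is an illustrative choice rather than a figure from the data. The function names are hypothetical.

```python
import math


def growth_factor(annual_rate: float, years: int) -> float:
    """Cumulative multiplier after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years needed for a quantity to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


# Assumption: a steady 12% per year over an illustrative 8-year window.
print(f"{growth_factor(0.12, 8):.2f}")      # 2.48
print(f"{doubling_time_years(0.12):.2f}")   # 6.12
```

On those assumptions, consumption growing at 12% per year roughly doubles every six years and reaches about two and a half times its starting level after eight.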
Redefining what intelligence per watt means
The environmental argument for efficient AI infrastructure is well-worn. But treating this purely as a sustainability conversation understates what is actually at stake. The real issue is infrastructure efficiency at system scale, specifically maximizing intelligence per watt (IPW) across the full operational lifecycle.
AI is moving into an agentic inference phase, where models no longer simply respond to queries. They interpret context, make decisions, and act continuously in real time. That shift places entirely new demands on infrastructure, driving sustained pressure on compute, energy, latency, and system efficiency across the stack. NVIDIA’s AI infrastructure model identifies five layers, with energy at the foundation, a recognition that a system’s intelligence ceiling is ultimately set by the power available to run it.
“The future of AI will not be determined solely by model sophistication, but by the efficiency of the infrastructure that powers it. Maximizing intelligence per watt is emerging as the defining metric for sustainable AI growth, enabling organizations to scale innovation while controlling energy consumption, costs, and environmental impact.”
— Sami Alfaraj, MEA Head of Technology, Submer
That positions energy as the cornerstone of IPW. When IPW is higher, AI models deliver equivalent or improved performance at lower electricity draw. The framing shifts accordingly. AI is no longer an energy liability but a potential driver of efficiency at scale, provided the infrastructure beneath it is built with that outcome as a design condition. The applications are concrete. Higher-IPW AI is better positioned to manage smart grids, cut industrial waste, and optimize resource-intensive operations.
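The article does not give a formula for IPW. One simple reading, assumed here purely for illustration, is useful output divided by power drawn, for example task accuracy per watt of average power. The sketch below uses that assumed definition with hypothetical numbers to show what "equivalent performance at lower electricity draw" looks like in the metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    name: str
    accuracy: float     # fraction of tasks completed correctly, 0..1 (hypothetical)
    avg_power_w: float  # average power draw while serving, in watts (hypothetical)


def intelligence_per_watt(d: Deployment) -> float:
    """Assumed definition: task accuracy divided by average power draw."""
    if d.avg_power_w <= 0:
        raise ValueError("average power must be positive")
    return d.accuracy / d.avg_power_w


# Same accuracy, lower power draw: the ratio rises even though output is unchanged.
baseline = Deployment("baseline", accuracy=0.80, avg_power_w=400.0)
optimized = Deployment("optimized", accuracy=0.80, avg_power_w=250.0)

gain = intelligence_per_watt(optimized) / intelligence_per_watt(baseline)
print(f"IPW gain: {gain:.2f}x")  # IPW gain: 1.60x
```

Other numerators, such as tokens served or tasks completed per second, would fit the same ratio. The point is only that the metric improves when either output rises or power falls.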
The implications run deeper than operational metrics. In the inference era, infrastructure efficiency directly shapes capital allocation, deployment velocity, and whether systems can scale without compounding cost.
Edge deployment and the efficiency case
Research suggests that running smaller, specialized AI models locally at the edge can reduce energy consumption by 60 to 80% compared to large general-purpose models operating from centralized cloud datacenters. That decentralization produces AI applications that are leaner, lower-latency, and higher in IPW, which strengthens the case for designing datacenters around efficient model architectures and purpose-fit hardware, rather than simply scaling what already exists.
That said, the efficiency question is not reducible to a choice between centralization and edge deployment. Energy is one dimension of a larger picture. True infrastructure efficiency also spans materials sourcing, capacity planning, and lifecycle decision-making over time. A genuinely efficient datacenter is one where operational gains compound, with each efficiency improvement reducing energy draw and lifting IPW across the system.
Translating IPW into infrastructure design
The inference era exposes a structural mismatch at the heart of datacenter design. Air-cooled facilities were built for batch compute workloads, not for the continuous, high-density demands of agentic AI. The inefficiencies climb along with utilization and rack density: higher energy consumption, increased water usage, and shortened hardware lifecycles all generate additional cost and carbon.
Addressing this requires a holistic rethink of the infrastructure stack, not a series of incremental fixes applied to an architecture built for a different era.
Liquid-cooling technologies and modular architecture are gaining traction as a more coherent answer. Liquid cooling removes the thermal ceiling that constrains air-cooled systems, enabling high compute density at reduced energy cost. Modularity means infrastructure can evolve with hardware without requiring wholesale replacement, eliminating unnecessary capital expenditure and operational disruption.
The impact of these approaches can be quantified. Submer’s existing infrastructure deployments have delivered energy savings of 913.68 GWh, water savings of 3,653.95 million liters, and CO₂-equivalent emissions reductions totaling 323,110 tonnes, figures drawn from full lifecycle analysis rather than point-in-time efficiency measurements. That makes them particularly meaningful for operators weighing the long-term consequences of today’s infrastructure decisions.
Efficiency is an architectural condition that needs to be established from the outset. As AI workloads take on the same operational criticality as power and connectivity, the infrastructure supporting them must be held to the same standard, delivering more intelligence per watt, reliably and at scale.
