Up to 84% of GPU power wasted in growing multimodal AI sector
New data from NeuReality has revealed that as much as 84% of GPU computing power is being wasted in multimodal AI environments.
The findings underscore a growing inefficiency, with significant economic and operational consequences, as enterprises ramp up their use of artificial intelligence that processes images, video, text and voice.
Resource underuse
The surge in multimodal AI workloads is visible in platforms like Google Lens, which processes over 20 billion visual queries each month, and Alibaba, with more than 50 million daily image-based requests. However, legacy infrastructure was not engineered for these workloads, leaving a vast proportion of computing capability idle.
"We're at an inflection point where the infrastructure needs to be optimised for running models as efficiently as possible," said Gaurav Shah, VP of Business Development, NeuReality. "Companies are investing millions in GPU capacity, but our research shows only 16% of that compute power is being properly utilised in multimodal inference workloads. That's not just inefficient, it's economically unsustainable."
Bottleneck challenges
Traditional x86 server architectures remain a core source of the inefficiency. Inference pipelines that process video frames and images require frequent, asynchronous communication between the services responsible for vision processing, embedding, vector search, and language inference, yet the central processing unit (CPU) orchestrates all data flow in these systems.
This setup forces every data packet - from decoded video frames to language embeddings - through the CPU, introducing serial delays. GPUs, designed for parallel workloads, are frequently left idle while they await new tasks, undermining performance and return on infrastructure investment. A toy simulation of this hand-off pattern is sketched below.
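To make the bottleneck concrete, here is a minimal Python sketch of a serial, CPU-orchestrated pipeline. All stage names and latencies are illustrative assumptions, not figures from NeuReality's research; the point is only that when the CPU sits on the critical path of every hand-off, the GPU's busy fraction collapses.

```python
# Hypothetical per-frame stage latencies in seconds (assumed, for illustration).
CPU_DECODE = 0.008       # CPU: decode and preprocess one video frame
CPU_ORCHESTRATE = 0.004  # CPU: routing, serialisation, vector-search dispatch
GPU_INFER = 0.003        # GPU: model forward pass

def gpu_idle_fraction(n_frames: int) -> float:
    """In a serial pipeline the CPU handles every hand-off, so the GPU
    waits between frames. Returns the fraction of wall time the GPU is idle."""
    gpu_busy = wall = 0.0
    for _ in range(n_frames):
        wall += CPU_DECODE + CPU_ORCHESTRATE  # GPU idles while the CPU works
        wall += GPU_INFER                     # GPU finally does useful work
        gpu_busy += GPU_INFER
    return 1.0 - gpu_busy / wall

print(f"GPU idle fraction: {gpu_idle_fraction(1000):.0%}")  # 80% at these numbers
```

At these assumed latencies the GPU sits idle 80% of the time - in the same ballpark as, though not a derivation of, the 84% figure reported above.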
Financial burden
The infrastructure inefficiency translates into considerable financial loss. Large AI-driven platforms such as those above incur hundreds of millions of US dollars a year in excess capital and operational expenditure, acquiring and maintaining GPU resources that remain largely unproductive. Idle GPU capacity not only wastes the upfront investment but also drives unnecessary energy consumption, pushing up power and cooling costs.
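The order of magnitude is easy to sanity-check with a back-of-envelope calculation. The fleet size and hourly cost below are hypothetical assumptions, not figures from the article; only the 16% utilisation rate comes from NeuReality's research.

```python
# Back-of-envelope estimate with assumed numbers (only the 16% is from the research).
gpu_count = 10_000                # hypothetical fleet size
cost_per_gpu_hour = 2.50          # assumed blended capex, power and cooling, USD
hours_per_year = 24 * 365
utilisation = 0.16                # NeuReality's reported utilisation rate

annual_spend = gpu_count * cost_per_gpu_hour * hours_per_year
idle_spend = annual_spend * (1 - utilisation)
print(f"Annual GPU spend:       ${annual_spend:,.0f}")  # $219,000,000
print(f"Spend on idle capacity: ${idle_spend:,.0f}")    # $183,960,000
```

Even at this modest fleet size, the waste lands in the hundreds of millions, consistent with the scale the research describes.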
Sectors beyond big tech are also feeling the pressure. Healthcare organisations relying on AI for medical imaging, security firms deploying scene analysis, and media companies automating content indexing all experience similar bottlenecks and inefficiencies.
Architectural overhaul
NeuReality proposes an architectural rework of AI inference systems. Its NR1 AI-CPU moves orchestration, preprocessing, and vector tasks off the general-purpose CPU and onto dedicated hardware engines built for these operations, while the NR AI Hypervisor distributes workload and data management across many GPUs, aiming to close the utilisation gap.
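The underlying idea - take orchestration off the GPU's critical path so the accelerator is never starved - can be sketched in plain Python with a producer-consumer queue. This is a conceptual stand-in only, not NeuReality's API or hardware behaviour, and all timings are assumptions.

```python
import queue
import threading
import time

READY = queue.Queue(maxsize=8)   # bounded buffer of preprocessed batches
N_BATCHES = 32
N_WORKERS = 4                    # stand-ins for dedicated offload engines

def preprocess_worker(batch_ids):
    """Decode, embed and dispatch batches off the GPU's critical path."""
    for b in batch_ids:
        time.sleep(0.012)        # assumed decode + embed + dispatch cost
        READY.put(b)

def gpu_consumer():
    """Pull ready batches; blocks only if the buffer ever runs dry."""
    busy, start = 0.0, time.perf_counter()
    for _ in range(N_BATCHES):
        READY.get()
        t0 = time.perf_counter()
        time.sleep(0.003)        # assumed GPU inference time per batch
        busy += time.perf_counter() - t0
    print(f"GPU busy fraction: {busy / (time.perf_counter() - start):.0%}")

ids = list(range(N_BATCHES))
workers = [threading.Thread(target=preprocess_worker, args=(ids[i::N_WORKERS],))
           for i in range(N_WORKERS)]
for w in workers:
    w.start()
gpu_consumer()
for w in workers:
    w.join()
```

Because preprocessing now overlaps with inference instead of preceding it serially, the consumer spends most of its wall time busy rather than waiting on a hand-off.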
"We're seeing up to 85% performance improvement and near-linear scaling across multi-GPU configurations in our benchmarks," said Shah. "In production environments, this means achieving 100% GPU utilisation - getting the full value from infrastructure investments while dramatically reducing power consumption."
Competitive pressures
Industry analysts predict that infrastructure efficiency will be a central factor determining which organisations maintain a profitable edge as multimodal AI adoption widens. The cost of inefficient scaling may become unsustainable for firms unable to overhaul their underlying systems.
"The next competitive battleground isn't who has the most GPUs," Shah added. "It's who can extract the most value from every GPU they own."