Nvidia Unveils Groq 3 LPU Chip to Boost AI Inference Performance

March 17, 2026

At this year’s GPU Technology Conference (GTC), Nvidia announced that it is expanding its Vera Rubin platform with a new chip, the Groq 3 LPU. The chip builds on intellectual property acquired from the AI chipmaker Groq and is intended to improve the performance of AI inference tasks.

Enhancing AI Inference at the Token Level

Nvidia positions the Groq 3 LPU as an inference accelerator designed to deliver high token throughput at low latency. Adding it to the Vera Rubin platform fits Nvidia’s goal of generating tokens faster and more efficiently, a critical operation in many natural language processing and AI model inference workloads.

Token-level processing is fundamental to models that generate and interpret text. By providing a dedicated chip tuned for this purpose, Nvidia aims to meet the growing demand for inference acceleration in machine learning applications. The Groq 3 LPU is expected to work alongside other components within Vera Rubin to strengthen the platform’s overall capabilities.

The announcement at GTC highlighted Nvidia’s continued strategy of strengthening its AI infrastructure with specialized hardware. The Groq 3 LPU reflects this approach by focusing on specific computational tasks that traditional GPUs might not handle as efficiently.

Nvidia did not disclose technical details such as performance metrics, power consumption, or deployment timelines. Even so, the move underscores how the company is expanding its AI hardware portfolio through acquisitions and platform improvements that target inference workloads.

With the Vera Rubin platform now augmented by the Groq 3 LPU, Nvidia is further positioning itself to support advanced AI model deployment across various industries that rely on rapid and large-scale token generation.
