Cerebras Unveils World’s Fastest AI Chip, Beating Nvidia in Inference Speed

In a move that could redefine AI infrastructure, Cerebras Systems showcased its record-breaking Wafer Scale Engine (WSE) chip at Web Summit Vancouver, claiming it now holds the title of the world’s fastest AI inference engine. 
Roughly the size of a dinner plate, the latest WSE chip spans 8.5 inches (22 cm) per side and packs an astonishing 4 trillion transistors — a monumental leap from traditional processors like Intel’s Core i9 (33.5 billion transistors) or Apple’s M2 Max (67 billion). 
The result: a groundbreaking 2,500 tokens per second on Meta’s Llama 4 model, 2.5 times Nvidia’s recently announced benchmark of 1,000 tokens per second.

“Inference is where speed matters the most,” said Naor Penso, Chief Information Security Officer at Cerebras. “Last week Nvidia hit 1,000 tokens per second — which is impressive — but today, we’ve surpassed that with 2,500 tokens per second.” 

Inference refers to how AI processes information to generate outputs like text, images, or decisions. Tokens, which can be whole words, word fragments, or characters, are the basic units a model uses to interpret input and produce a response. As AI agents take on more complex, multi-step tasks, inference speed becomes increasingly critical.
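
To make the throughput figures concrete, here is a minimal sketch in plain Python. The whitespace tokenizer is a hypothetical stand-in (real models use subword tokenizers such as BPE), and the 500-token reply length is an illustrative assumption, not a figure from Cerebras or Nvidia:

```python
def tokenize(text: str) -> list[str]:
    # Hypothetical stand-in: production models use subword tokenizers
    # (e.g. BPE), so a token may be a whole word, a fragment, or a character.
    return text.split()

print(tokenize("Inference turns a prompt into output tokens"))

# Time to generate a 500-token reply at each rate quoted above.
for rate in (2_500, 1_000):
    print(f"{rate:,} tokens/s -> {500 / rate:.2f} s for a 500-token reply")
```

At the reported rates, the same 500-token answer takes 0.2 seconds on Cerebras’s benchmark versus 0.5 seconds on Nvidia’s.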

“Agents need to break large tasks into dozens of sub-tasks and communicate between them quickly,” Penso explained. “Slow inference disrupts that entire flow.” 
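
A rough sketch of why this compounds for agents (the step count and token budget below are illustrative assumptions, not Cerebras figures): when a task fans out into sequential sub-tasks, each step waits on the previous model output, so per-step inference times add up.

```python
def end_to_end_latency(sub_tasks: int, tokens_per_step: int,
                       tokens_per_second: float) -> float:
    # Sequential sub-tasks cannot overlap: each step waits on the
    # previous model output, so per-step latencies accumulate.
    return sub_tasks * (tokens_per_step / tokens_per_second)

# Illustrative: an agent chaining 30 sub-tasks of ~400 tokens each.
for rate in (2_500.0, 1_000.0):
    print(f"{rate:>7,.0f} tokens/s -> "
          f"{end_to_end_latency(30, 400, rate):.1f} s end to end")
```

Under these assumptions, the same agent workflow finishes in 4.8 seconds at 2,500 tokens per second but takes 12 seconds at 1,000, which is the gap Penso is pointing to.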
