Cerebras Systems, a startup dedicated to accelerating artificial intelligence (AI) computing, recently unveiled what they call “the largest chip ever built.” Optimized for AI work, the Cerebras Wafer Scale Engine (WSE) is a single chip that contains more than 1.2 trillion transistors and is 46,225 square millimeters, the company says.
The WSE is 56.7 times larger than the largest graphics processing unit, which measures 815 square millimeters and contains 21.1 billion transistors.
The WSE also contains 3,000 times more high-speed on-chip memory and has 10,000 times more memory bandwidth, according to Cerebras.
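The 56.7 figure can be checked directly from the die areas quoted above. The short Python sketch below does that arithmetic and, for comparison, also computes the ratio of transistor counts. The variable names are illustrative, and the WSE transistor count is treated as exactly 1.2 trillion even though Cerebras says "more than," so that ratio is a lower bound.

```python
# Figures as quoted by Cerebras in the text above; names are illustrative.
WSE_AREA_MM2 = 46_225
GPU_AREA_MM2 = 815
WSE_TRANSISTORS = 1.2e12   # "more than 1.2 trillion", so treat as a lower bound
GPU_TRANSISTORS = 21.1e9

area_ratio = WSE_AREA_MM2 / GPU_AREA_MM2
transistor_ratio = WSE_TRANSISTORS / GPU_TRANSISTORS

print(f"area ratio:       {area_ratio:.1f}x")
print(f"transistor ratio: {transistor_ratio:.1f}x")
```

The area ratio comes out at about 56.7 and the transistor ratio at about 56.9, so the two chips are roughly comparable in transistor density and the size advantage comes almost entirely from silicon area.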
In AI, chip size is profoundly important. Big chips process information more quickly, producing answers in less time. Reducing the time-to-insight, or “training time,” allows researchers to test more ideas, use more data, and solve new problems. Google, Facebook, OpenAI, Tencent, Baidu, and many others argue that the fundamental limitation to today’s AI is that it takes too long to train models. Reducing training time removes a major bottleneck to industry-wide progress. — Cerebras press release
How it works
With an exclusive focus on AI, the Cerebras Wafer Scale Engine accelerates calculation and communication and thereby reduces training time. The approach is straightforward and is a function of the size of the Cerebras Wafer Scale Engine: With 56.7 times more silicon area than the largest graphics processing unit, the WSE provides more cores to do calculations and more memory closer to the cores so the cores can operate efficiently.
Because this vast array of cores and memory is on a single chip, all communication is kept on-silicon. This means the WSE’s low-latency communication bandwidth is immense, so groups of cores can collaborate with maximum efficiency, and memory bandwidth is no longer a bottleneck.
“While AI is used in a general sense, no two data sets or AI tasks are the same. New AI workloads continue to emerge and the data sets continue to grow larger,” said Jim McGregor, principal analyst and founder at TIRIAS Research.
“As AI has evolved, so too have the silicon and platform solutions. The Cerebras Wafer Scale Engine is an amazing engineering achievement in semiconductor and platform design that offers the compute, high-performance memory, and bandwidth of a supercomputer in a single wafer-scale solution.”
The WSE contains 400,000 AI-optimized compute cores. Called SLAC for Sparse Linear Algebra Cores, the compute cores are flexible, programmable, and optimized for the sparse linear algebra that underpins all neural network computation, according to the company.
SLAC’s programmability ensures cores can run all neural network algorithms in the constantly changing machine learning field.
Because the Sparse Linear Algebra Cores are optimized for neural network compute primitives, they achieve industry-best utilization—often triple or quadruple that of a graphics processing unit.
In addition, the WSE cores include sparsity-harvesting technology, invented by Cerebras Systems, to accelerate computational performance on sparse workloads (workloads that contain zeros) like deep learning.
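The article does not describe how Cerebras implements sparsity harvesting in hardware. As a generic illustration of the underlying idea only, the Python sketch below computes a dot product while skipping any multiply-accumulate in which one operand is zero. The function name `sparse_dot` and the sample data are hypothetical and are not taken from Cerebras.

```python
def sparse_dot(weights, activations):
    """Dot product that skips multiply-accumulates when either operand is zero.

    Returns the result and the number of multiplications actually performed.
    This is an illustration of zero-skipping, not Cerebras's implementation.
    """
    total = 0.0
    performed = 0
    for w, a in zip(weights, activations):
        if w == 0 or a == 0:
            continue  # a zero operand contributes nothing, so skip the work
        total += w * a
        performed += 1
    return total, performed


weights = [0.5, -1.0, 2.0, 0.25]
activations = [0.0, 3.0, 0.0, 4.0]  # zeros like these are common after a ReLU layer

result, performed = sparse_dot(weights, activations)
print(result, performed, "of", len(weights), "multiplications performed")
```

In this example half of the activations are zero, so only two of the four multiplications are carried out and the result is unchanged. The more zeros a workload contains, the more work this kind of skipping can avoid.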
High-performance deep learning requires massive compute with frequent access to data. This requires close proximity between the compute cores and memory.
“This is not the case in graphics processing units where the vast majority of the memory is slow and far away (off-chip),” says Cerebras.
The Cerebras Wafer Scale Engine has 18 gigabytes of on-chip memory accessible by its cores in one clock cycle. The collection of core-local memory aboard the WSE delivers an aggregate of 9 petabytes per second of memory bandwidth. That is 3,000 times more on-chip memory and 10,000 times more memory bandwidth than the leading graphics processing unit has, according to Cerebras.
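To put the aggregate numbers in perspective, the Python sketch below divides them across the 400,000 cores and backs out the GPU figures implied by the quoted multipliers. It assumes decimal units (1 gigabyte = 10^9 bytes), an even split of memory and bandwidth across cores, and that "3,000 times more" means a factor of 3,000. None of these assumptions is confirmed by Cerebras, so the results are rough estimates.

```python
# Figures as quoted by Cerebras; decimal units are an assumption.
CORES = 400_000
ON_CHIP_MEMORY_BYTES = 18 * 10**9    # 18 gigabytes
MEMORY_BANDWIDTH_BPS = 9 * 10**15    # 9 petabytes per second

# Per-core share, assuming memory and bandwidth are spread evenly.
memory_per_core_kb = ON_CHIP_MEMORY_BYTES / CORES / 10**3
bandwidth_per_core_gbps = MEMORY_BANDWIDTH_BPS / CORES / 10**9

# GPU figures implied by the quoted 3,000x and 10,000x multipliers.
implied_gpu_on_chip_mb = ON_CHIP_MEMORY_BYTES / 3_000 / 10**6
implied_gpu_bandwidth_gbps = MEMORY_BANDWIDTH_BPS / 10_000 / 10**9

print(f"memory per core:            {memory_per_core_kb:.0f} KB")
print(f"bandwidth per core:         {bandwidth_per_core_gbps:.1f} GB/s")
print(f"implied GPU on-chip memory: {implied_gpu_on_chip_mb:.0f} MB")
print(f"implied GPU bandwidth:      {implied_gpu_bandwidth_gbps:.0f} GB/s")
```

Under these assumptions each core has about 45 kilobytes of local memory and 22.5 gigabytes per second of bandwidth, and the comparison GPU would have roughly 6 megabytes of on-chip memory and 900 gigabytes per second of memory bandwidth.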