Thread Rating:
  • 1 Vote(s) - 5 Average
  • 1
  • 2
  • 3
  • 4
  • 5
New Algorithm Makes CPUs 15 Times Faster Than GPUs in Some AI Work
#1
Information 
Quote:
[Image: o6R5GnN6o45NnzVBuhB2DJ-1024-80.jpg.webp]

CPUs can beat GPUs in some AI workloads

GPUs are known for being significantly better than most CPUs when it comes to AI deep neural networks (DNNs) training simply because they have more execution units (or cores). But a new algorithm proposed by computer scientists from Rice University is claimed to actually flip the tables and make CPUs a whopping 15 times faster than some leading-edge GPUs. 

The most complex compute challenges are usually solved using brute force methods, like either throwing more hardware at them or inventing special-purpose hardware that can solve the task. DNN training is without any doubt among the most compute-intensive workloads nowadays, so if programmers want maximum training performance, they use GPUs for their workloads. This happens to a large degree because it is easier to achieve high performance using compute GPUs as most algorithms are based on matrix multiplications.  

Anshumali Shrivastava, an assistant professor of computer science at Rice's Brown School of Engineering, and his colleagues have presented an algorithm that can greatly speed up DNN training on modern AVX512 and AVX512_BF16-enabled CPUs. 

"Companies are spending millions of dollars a week just to train and fine-tune their AI workloads," said Shrivastava in a conversation with TechXplore. "The whole industry is fixated on one kind of improvement — faster matrix multiplications. Everyone is looking at specialized hardware and architectures to push matrix multiplication. People are now even talking about having specialized hardware-software stacks for specific kinds of deep learning. Instead of taking a [computationally] expensive algorithm and throwing the whole world of system optimization at it, I'm saying, 'Let's revisit the algorithm.'" 

To prove their point, the scientists took SLIDE (Sub-LInear Deep Learning Engine), a  C++ OpenMP-based engine that combines smart hashing randomized algorithms with modest multi-core parallelism on CPU, and optimized it heavily for Intel's AVX512 and AVX512-bfloat16-supporting processors. 

The engine employs Locality Sensitive Hashing (LSH) to identify neurons during each update adaptively, which optimizes compute performance requirements. Even without modifications, it can be faster in training a 200-million-parameter neural network, in terms of wall clock time, than the optimized TensorFlow implementation on an Nvidia V100 GPU, according to the paper. 

"Hash table-based acceleration already outperforms GPU, but CPUs are also evolving," said study co-author Shabnam Daghaghi. 

To make hashing faster, the researchers vectorized and quantized the algorithm so that it could be better handled by Intel's AVX512 and AVX512_BF16 engines. They also implemented some memory optimizations.  

"We leveraged [AVX512 and AVX512_BF16] CPU innovations to take SLIDE even further, showing that if you aren't fixated on matrix multiplications, you can leverage the power in modern CPUs and train AI models four to 15 times faster than the best specialized hardware alternative."

The results they obtained with Amazon-670K, WikiLSHTC-325K, and Text8 datasets are indeed very promising with the optimized SLIDE engine. Intel's Cooper Lake (CPX) processor can outperform Nvidia's Tesla V100 by about 7.8 times with Amazon-670K, by approximately 5.2 times with WikiLSHTC-325K, and by roughly 15.5 times with Text8. In fact, even an optimized Cascade Lake (CLX) processor can be 2.55–11.6 times faster than Nvidia's Tesla V100.
...
Continue Reading
[-] The following 1 user says Thank You to harlan4096 for this post:
  • silversurfer
Reply


Forum Jump:


Users browsing this thread: 1 Guest(s)
[-]
Welcome
You have to register before you can post on our site.

Username/Email:


Password:





[-]
Recent Posts
Website X5 Go 2024.1
Website X5 Go 2024.1...zevish — 09:32
Apple's rules to allow third-party app ...
Apple has announ...alison30 — 09:28
Intel: Microsoft AI PCs need a Copilot K...
Microsoft hopes th...harlan4096 — 08:55
Synchredible 8 Professional Edition v8.2...
          Synchredib...zevish — 08:54
GFYI [Official] EaseUS Data Recovery Wi...
I utilize EaseUS Par...zevish — 08:10

[-]
Birthdays
Today's Birthdays
No birthdays today.
Upcoming Birthdays
No upcoming birthdays.

[-]
Online Staff
harlan4096's profile harlan4096
Administrator

>