Nvidia says large GPGPU speed-up claims were due to bad original code
Nvidia has said that most of the outlandish performance increase figures touted by GPGPU vendors were down to poor original code rather than the sheer brute-force computing power provided by GPUs.
Both AMD and Nvidia have been using real-world code examples and projects to promote the performance of their respective GPGPU accelerators for years, but it now seems that some of the eye-popping figures, including speed-ups of 100x or 200x, were not down to the computing power of GPGPUs alone. Sumit Gupta, GM of Nvidia’s Tesla business, told The INQUIRER that such figures were generally down to starting with un-optimized CPU code.
During Intel’s Xeon Phi pre-launch press conference call, the firm cast doubt on some of the orders-of-magnitude speed-up claims that had been bandied about for years. Gupta has now told The INQUIRER that while those large speed-ups did happen, they were possible because the code was poorly optimized to begin with, which set the bar very low.
Gupta said, “Most of the time when you saw the 100x, 200x and larger numbers those came from universities. Nvidia may have taken university work and shown it and it has an 100x on it, but really most of those gains came from academic work. Typically we find when you investigate why someone got 100x [speed up] is because they didn’t have good CPU code to begin with. When you investigate why they didn’t have good CPU code you find that typically they are domain scientists not computer science guys – biologists, chemists, physics – and they wrote some C code and it wasn’t good on the CPU. It turns out most of those people find it easier to code in CUDA C or CUDA Fortran than they do to use MPI or Pthreads to go to multi-core CPUs, so CUDA programming for a GPU is easier than multi-core CPU programming.”
According to Gupta, users who have already optimised their code to squeeze most of the performance out of the CPU see somewhat more sedate performance gains. “Most people we find who have optimised CPU code, and really you’ll only find optimised CPU code in the HPC world, get between 5x to 10x speed up, that’s the average speed up that people get. In some cases it’s even less, we’ve seen people getting speed ups of 2X but they are delighted with 2x because there is no way for them to get a sustainable 2X speed up from where they are today,” said Gupta.
Gupta’s comments about code optimisation will resonate with many researchers who work to tight paper deadlines intertwined with writing funding proposals. For them it is far more prudent to spend time working on the theory, modelling a solution and evaluating the results than to spend it optimising code, especially now that computing power is far easier to come by. Nevertheless, Gupta said that when it comes to picking whether to optimise code for CPU or GPGPU, researchers do a simple cost analysis.
Gupta said, “Even if you assume it is the same effort to do multi-core CPU versus GPU, let’s say multi-core CPU gives you 10x speedup but CUDA gives you 100X over where you are today, you’ll obviously go for the bigger speed-up and work on that platform first, and that’s why you end up with these guys getting these phenomenal speed ups. It’s only because they have really bad original CPU code and don’t have either the interest or energy or time to make it into good CPU code.”
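The arithmetic behind Gupta’s hypothetical can be made explicit. The short Python sketch below is an illustration only: the function and variable names are made up for this example, and it assumes that both speed-ups are measured against the same original, unoptimised CPU code, as in his example. Under that assumption, dividing one figure by the other gives the GPU’s gain over the tuned multi-core version.

```python
def relative_speedup(gpu_over_baseline, cpu_over_baseline):
    """Speed-up of the GPU port measured against the tuned CPU version.

    Both arguments are speed-ups over the same unoptimised baseline
    (an assumption of this sketch).
    """
    if gpu_over_baseline <= 0 or cpu_over_baseline <= 0:
        raise ValueError("speed-ups must be positive")
    return gpu_over_baseline / cpu_over_baseline


# Figures taken from Gupta's hypothetical, not from any benchmark.
cuda_vs_original = 100.0       # CUDA port vs. original code
multicore_vs_original = 10.0   # tuned multi-core CPU vs. original code

implied = relative_speedup(cuda_vs_original, multicore_vs_original)
print(f"GPU vs. tuned multi-core CPU: {implied:.0f}x")
```

With those numbers the headline 100x shrinks to 10x once the comparison is made against tuned CPU code, which sits at the top of the 5x to 10x range Gupta quotes for optimised HPC code.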
While Gupta’s candor over the source of the speed-ups touted by Nvidia and AMD in the past is refreshing and should bring GPGPU accelerators back down to Earth for some people, it should be noted that 2x speed-ups are still very desirable for many researchers. However, as always with benchmarks, it is good to apply a healthy dose of skepticism.
via The Inquirer