The first benchmark results of the NVIDIA GP100 GPU accelerator have been revealed (via Exxact Corp). Featured on the Tesla P100 graphics board, the GP100 GPU is aimed at hyperscale servers and high-performance computing (HPC) in general. The Tesla P100 is already shipping to NVIDIA's priority customers, which include supercomputing companies, and one of those organizations has decided to show us the first performance numbers of the GP100 GPU. (Source: PCGamesHardware via Videocardz)
At GTC 2016, NVIDIA announced the Tesla P100, their most advanced hyperscale GPU to date.
NVIDIA Tesla P100 Accelerator Benched in HPC Workloads - GP100 GPU First Tests Unveiled
The benchmarks we will be looking at come from a tool known as AMBER, which stands for Assisted Model Building with Energy Refinement. The tool was co-developed by Ross Walker from the San Diego Supercomputer Center and Scott Le Grand from Amazon Web Services. The name Amber refers to two things: a set of molecular mechanical force fields for simulating biomolecules, and a package of molecular simulation programs that includes source code and demos.
"Amber" refers to two things: a set of molecular mechanical force fields for the simulation of biomolecules (which are in the public domain, and are used in a variety of simulation programs); and a package of molecular simulation programswhich includes source code and demos. Amber is distributed in two parts: AmberTools16 and Amber16. via Ambermd.org

All of these benchmarks are part of HPC simulations and have nothing to do with general performance in gaming applications. They do, however, give us an overview of how well the GP100 GPU performs in such tasks against a range of other NVIDIA GPUs such as GP104, GM200 and GK110. The following configuration was used in the benchmark run (a small environment-check sketch follows the list):
Exxact AMBER Certified 2U GPU Workstation:
- CPU = Dual 8-Core Intel E5-2650v3 (2.3GHz), 64 GB DDR4 RAM
- (note the cheaper 6 Core E5-2620v3 and v4 CPUs would also give the same performance for GPU runs)
- MPICH v3.1.4 - GNU v4.8.5 - Centos 7.2
- CUDA Toolkit NVCC v7.5 (8.0RC1 for GTX-1080 and P100)
- NVIDIA Driver Linux 64 - 361.43
- Precision Model = SPFP (GPU), Double Precision (CPU)
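For anyone trying to reproduce a comparable setup, here is a minimal sketch that reports the same kind of version information listed above. It assumes a Linux box where the NVIDIA driver, the CUDA toolkit (nvcc), GCC and MPICH are already installed and on the PATH; adjust the commands to your own toolchain.

```python
# Minimal environment report for reproducing a comparable AMBER benchmark setup.
# Assumes a Linux system with the NVIDIA driver, CUDA toolkit (nvcc), GCC and
# MPICH installed and on PATH; this only prints versions, it runs no benchmark.
import subprocess

def run(cmd):
    """Run a command and return its stdout, or a note if the tool is missing."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()
    except (OSError, subprocess.CalledProcessError) as exc:
        return f"not available ({exc})"

print("GPU / driver :", run(["nvidia-smi", "--query-gpu=name,driver_version",
                             "--format=csv,noheader"]))
print("CUDA (nvcc)  :", run(["nvcc", "--version"]).splitlines()[-1])
print("Compiler     :", run(["gcc", "--version"]).splitlines()[0])
print("MPI          :", run(["mpichversion"]).splitlines()[0])
```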
Now there are a few things to note before we look at the benchmarks. These tests were conducted before either the Tesla P100 or the GTX 1080 had publicly launched. They also used the SPFP precision model on the GPUs, which means the GPUs relied on their single precision throughput for the bulk of the math, while the CPU runs used the double precision (FP64) model.
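To give a rough idea of what a hybrid precision scheme in the spirit of SPFP means in practice, here is an illustrative sketch of the general single-precision-compute / fixed-point-accumulation idea. This is a toy example, not AMBER's actual implementation, and the scaling factor is an arbitrary choice for illustration.

```python
# Toy sketch of a hybrid precision scheme in the spirit of SPFP:
# individual contributions are computed in single precision (FP32),
# but accumulated into a 64-bit fixed-point integer so the summation
# doesn't lose precision the way a plain FP32 accumulator can.
# This is NOT AMBER's actual code, just an illustration of the idea.
import numpy as np

SCALE = 2 ** 40  # fixed-point scaling factor (illustrative choice)

def accumulate_fixed_point(contributions_fp32):
    acc = np.int64(0)
    for f in contributions_fp32:                # each term computed in FP32
        acc += np.int64(np.float64(f) * SCALE)  # accumulate as a 64-bit integer
    return float(acc) / SCALE

rng = np.random.default_rng(0)
forces = rng.normal(scale=1e-3, size=100_000).astype(np.float32)

print("FP32 accumulator      :", np.sum(forces, dtype=np.float32))
print("Fixed-point (SPFP-ish):", accumulate_fixed_point(forces))
print("FP64 reference        :", np.sum(forces, dtype=np.float64))
```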

So how did the Amber team manage to get these cards before their announcement? Amber's developers work closely with NVIDIA, and since the program was developed and written with NVIDIA's help to accelerate research simulations, the team got first-hand access to these cards. This also means that both cards were engineering samples and should not be compared to the final retail versions, whose performance should be better optimized. The NVIDIA Pascal GPUs also ran a pre-release version of CUDA 8.0.
At the time of writing, the GTX-1080 and P100 (DGX-1) cards had not been publicly released. The benchmarks here are from pre-release hardware, so they represent a lower bound on the performance. It is hoped that, with access to released hardware, optimization of AMBER 16 specific to the Pascal architecture will be possible, resulting in improved performance (the Pascal hardware benchmarks made use of a pre-release version of CUDA 8.0). So without further ado, let's take a look at the benchmarks:
NVIDIA Tesla P100 GP100 GPU Benchmarks:
In the benchmarks provided below, we can see that a single Tesla P100 delivers enough throughput to outperform a quad Titan X configuration. We also note that in some cases the GeForce GTX 1080 is around as fast as the GP100 GPU, which comes down to the fact that GP104 is roughly a 9 TFLOPs graphics chip, not far from the 10.6 TFLOPs output of the Tesla P100 accelerator. That changes when multiple boards are used: the Tesla P100 is the fastest without a doubt, and with a proper NVLink implementation in the final models now shipping to customers, we could see even bigger gains.
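As a rough sanity check on those throughput figures, peak FP32 compute for these chips can be approximated as 2 FLOPs (one fused multiply-add) per CUDA core per clock. A quick back-of-the-envelope calculation using the publicly listed core counts and boost clocks:

```python
# Back-of-the-envelope peak FP32 throughput: 2 FLOPs (one FMA) per CUDA core per clock.
# Core counts and boost clocks are the publicly listed figures; sustained throughput
# in a real AMBER run will of course sit below these theoretical peaks.
def peak_fp32_tflops(cuda_cores, boost_clock_mhz):
    return 2 * cuda_cores * boost_clock_mhz * 1e6 / 1e12

cards = {
    "Tesla P100 (SXM2)": (3584, 1480),   # ~10.6 TFLOPs
    "Tesla P100 (PCIe)": (3584, 1329),   #  ~9.5 TFLOPs
    "GeForce GTX 1080":  (2560, 1733),   #  ~8.9 TFLOPs
}

for name, (cores, clock) in cards.items():
    print(f"{name}: {peak_fp32_tflops(cores, clock):.1f} TFLOPs FP32")
```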

The NVIDIA DGX-1 is a supercomputing rack capable of delivering up to 170 TFLOPs of compute performance.
The NVIDIA DGX-1 system uses up to 8 Tesla P100 boards and costs $129,000 US. The system includes the following specifications (a quick check of the headline FP16 figure follows the list):
- Up to 170 teraflops of half-precision (FP16) peak performance
- Eight Tesla P100 GPU accelerators, 16GB memory per GPU
- NVLink Hybrid Cube Mesh
- 7TB SSD DL Cache
- Dual 10GbE, Quad InfiniBand 100Gb networking
- 3U – 3200W
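That 170 TFLOPs headline figure follows directly from the per-board FP16 rating of the Tesla P100 (SXM2), as listed in the spec table further down:

```python
# Quick sanity check of the DGX-1's headline FP16 figure:
# 8 x Tesla P100 (SXM2) boards at ~21.2 TFLOPs FP16 each.
boards = 8
fp16_per_board_tflops = 21.2   # Tesla P100 SXM2 peak FP16 (see the spec table below)
print(f"DGX-1 aggregate FP16: ~{boards * fp16_per_board_tflops:.0f} TFLOPs")  # ~170 TFLOPs
```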
NVIDIA Pascal GP100 With Tesla P100 Graphics Board Benchmarks (Image Credits: Ambermd)
The following tests are too small to effectively scale to multiple modern GPUs, and since we are looking at pre-release hardware, NVLink isn't yet fine-tuned to make full use of all the Tesla P100 hardware (up to 4 boards in the benchmarks provided below).
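To illustrate why small simulations stop benefiting from extra GPUs, here is a toy strong-scaling model with made-up numbers (an assumption for illustration, not measured AMBER behaviour): per-step compute time shrinks with the number of GPUs, while a fixed per-step communication and launch overhead does not.

```python
# Toy strong-scaling model: step time = compute_time / n_gpus + fixed overhead.
# Purely illustrative numbers, not measured AMBER behaviour; the point is that
# once per-GPU compute time approaches the fixed overhead, extra GPUs stop helping.
def speedup(compute_ms, overhead_ms, n_gpus):
    t1 = compute_ms + overhead_ms
    tn = compute_ms / n_gpus + overhead_ms
    return t1 / tn

for label, compute_ms in [("large simulation", 50.0), ("small simulation", 2.0)]:
    scaling = [f"{speedup(compute_ms, overhead_ms=0.5, n_gpus=n):.2f}x" for n in (1, 2, 4)]
    print(f"{label}: speedup on 1/2/4 GPUs = {', '.join(scaling)}")
```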
NVIDIA Pascal GP100 With Tesla P100 Graphics Board Benchmarks (Image Credits: Ambermd)
For those expecting gaming benchmarks, we already made it clear that these results have nothing to do with general application performance. These workloads are specific to the HPC sector, and that is what the GP100 GPU has been designed to handle. We have heard rumors that NVIDIA is preparing a more cost-effective 16nm FinFET based GP102 GPU, which might launch later this year as a flagship Titan product with specs similar to the Tesla P100's. We don't have any confirmation, but we will update you as more news comes our way.
NVIDIA Tesla Graphics Cards Comparison:
NVIDIA Tesla Graphics Card | Tesla K40 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla P100 (PCI-Express) | Tesla P100 (SXM2) | Tesla V100 (PCI-Express) | Tesla V100 (SXM2) | Tesla V100S (PCIe) |
---|---|---|---|---|---|---|---|
GPU | GK110 (Kepler) | GM200 (Maxwell) | GP100 (Pascal) | GP100 (Pascal) | GV100 (Volta) | GV100 (Volta) | GV100 (Volta) |
Process Node | 28nm | 28nm | 16nm | 16nm | 12nm | 12nm | 12nm |
Transistors | 7.1 Billion | 8 Billion | 15.3 Billion | 15.3 Billion | 21.1 Billion | 21.1 Billion | 21.1 Billion |
GPU Die Size | 551 mm2 | 601 mm2 | 610 mm2 | 610 mm2 | 815mm2 | 815mm2 | 815mm2 |
SMs | 15 | 24 | 56 | 56 | 80 | 80 | 80 |
TPCs | 15 | 24 | 28 | 28 | 40 | 40 | 40 |
CUDA Cores Per SM | 192 | 128 | 64 | 64 | 64 | 64 | 64 |
CUDA Cores (Total) | 2880 | 3072 | 3584 | 3584 | 5120 | 5120 | 5120 |
Texture Units | 240 | 192 | 224 | 224 | 320 | 320 | 320 |
FP64 CUDA Cores / SM | 64 | 4 | 32 | 32 | 32 | 32 | 32 |
FP64 CUDA Cores / GPU | 960 | 96 | 1792 | 1792 | 2560 | 2560 | 2560 |
Base Clock | 745 MHz | 948 MHz | 1190 MHz | 1328 MHz | 1230 MHz | 1297 MHz | TBD |
Boost Clock | 875 MHz | 1114 MHz | 1329MHz | 1480 MHz | 1380 MHz | 1530 MHz | 1601 MHz |
FP16 Compute | N/A | N/A | 18.7 TFLOPs | 21.2 TFLOPs | 28.0 TFLOPs | 30.4 TFLOPs | 32.8 TFLOPs |
FP32 Compute | 5.04 TFLOPs | 6.8 TFLOPs | 10.0 TFLOPs | 10.6 TFLOPs | 14.0 TFLOPs | 15.7 TFLOPs | 16.4 TFLOPs |
FP64 Compute | 1.68 TFLOPs | 0.2 TFLOPs | 4.7 TFLOPs | 5.30 TFLOPs | 7.0 TFLOPs | 7.80 TFLOPs | 8.2 TFLOPs |
Memory Interface | 384-bit GDDR5 | 384-bit GDDR5 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 |
Memory Size | 12 GB GDDR5 @ 288 GB/s | 24 GB GDDR5 @ 288 GB/s | 16 GB HBM2 @ 732 GB/s 12 GB HBM2 @ 549 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 1134 GB/s |
L2 Cache Size | 1536 KB | 3072 KB | 4096 KB | 4096 KB | 6144 KB | 6144 KB | 6144 KB |
TDP | 235W | 250W | 250W | 300W | 250W | 300W | 250W |