NVIDIA H100 80 GB PCIe Accelerator With Hopper GPU Is Priced Over $30,000 US In Japan

Hassan Mujtaba

NVIDIA's recently announced H100 80 GB PCIe accelerator, based on the Hopper GPU architecture, has been listed for sale in Japan. It is the second accelerator to show up with a price in the Japanese market; the first was the AMD MI210 PCIe, which was listed just a few days ago.

NVIDIA H100 80 GB PCIe Accelerator With Hopper GPU Gets Listed In Japan For An Insane Price Exceeding $30,000 US

Unlike the H100 SXM5 configuration, the H100 PCIe offers cut-down specifications, featuring 114 SMs enabled out of the full 144 SMs on the GH100 GPU, versus the 132 SMs enabled on the H100 SXM5. As such, the chip offers 3200 TFLOPs of FP8, 1600 TFLOPs of FP16, 800 TFLOPs of FP32, and 48 TFLOPs of FP64 compute horsepower. It also features 456 tensor cores and 456 texture units.
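For those curious where such ratings come from, the non-tensor figures fall out of a simple cores-times-clock calculation. Below is a minimal sketch (not NVIDIA's official math), assuming one FMA, counted as two FLOPs, per CUDA core per cycle, with the core count and ~1650 MHz boost clock taken from the spec table further down:

```python
# Minimal sketch: peak vector throughput = cores x 2 FLOPs (one FMA) x clock.
# Core count (114 SMs x 128 FP32 cores) and the ~1650 MHz boost clock come
# from the spec table below; both are approximations, not official figures.

def peak_tflops(cores: int, boost_clock_ghz: float) -> float:
    """Peak throughput in TFLOPs, assuming one FMA (2 FLOPs) per core per cycle."""
    return cores * 2 * boost_clock_ghz / 1000

h100_pcie_fp32_cores = 114 * 128  # 14,592 CUDA cores
print(f"{peak_tflops(h100_pcie_fp32_cores, 1.65):.1f} TFLOPs")  # ~48.2 TFLOPs non-tensor FP32
```

That lines up with the 48 TFLOPs non-tensor FP32 figure cited further below; the 800/1600/3200 TFLOPs ratings additionally rely on the tensor cores and sparsity.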


Due to its lower peak compute horsepower, the H100 PCIe should operate at lower clocks and, as such, features a 350W TDP, half the 700W TDP of the SXM5 variant. The PCIe card does, however, retain the full 80 GB of memory across a 5120-bit bus interface, albeit in the HBM2e variation (over 2 TB/s of bandwidth).
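That bandwidth figure is straightforward to sanity-check: aggregate bandwidth is simply the bus width multiplied by the per-pin data rate. A quick sketch, assuming a ~3.2 Gbps HBM2e pin speed (an assumption consistent with a 2 TB/s result, not a confirmed spec):

```python
# Sanity check on the ">2 TB/s" claim: bandwidth (GB/s) =
# bus width (bits) x per-pin data rate (Gbps) / 8 bits per byte.
# The 3.2 Gbps pin rate is an assumption, not a confirmed NVIDIA spec.

def hbm_bandwidth_gbs(bus_width_bits: int, pin_rate_gbps: float) -> float:
    """Aggregate memory bandwidth in GB/s."""
    return bus_width_bits * pin_rate_gbps / 8

print(hbm_bandwidth_gbs(5120, 3.2))  # 2048.0 GB/s, i.e. just over 2 TB/s
```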

According to gdm-or-jp, Japanese distributor gdep-co-jp has listed the NVIDIA H100 80 GB PCIe accelerator at ¥4,313,000 ($33,120 US), or ¥4,745,950 ($36,445 US) including sales tax. The accelerator is expected to ship in the second half of 2022 and will come in the standard dual-slot, passively cooled variant. The distributor also states that it will provide NVLINK bridges free of charge to those who purchase multiple cards, though these might ship at a later date.
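The listing math is easy to reproduce; a small sketch, assuming Japan's 10% consumption tax and the roughly ¥130-per-dollar exchange rate implied by the conversions above:

```python
# Reproducing the listing math. The exchange rate is inferred from the
# article's own conversions; the 10% Japanese consumption tax is assumed.

JPY_PER_USD = 130.2  # implied exchange rate (assumption)
SALES_TAX = 0.10     # Japanese consumption tax (assumption)

base_jpy = 4_313_000
taxed_jpy = base_jpy * (1 + SALES_TAX)

print(f"Base:  ¥{base_jpy:,} ≈ ${base_jpy / JPY_PER_USD:,.0f} US")      # ≈ $33,126 US
print(f"Taxed: ¥{taxed_jpy:,.0f} ≈ ${taxed_jpy / JPY_PER_USD:,.0f} US")  # ≈ $36,438 US
# The listed tax-inclusive price is ¥4,745,950, slightly above a flat 10%.
```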

Compared to the AMD Instinct MI210, which costs around $16,500 US in the same market, the NVIDIA H100 is more than double the price. The NVIDIA offering does boast some very high GPU performance figures versus the AMD HPC accelerator, at a 50W higher TDP. The non-tensor FP32 throughput of the H100 is rated at 48 TFLOPs, while the MI210 has a peak rated FP32 compute of 45.3 TFLOPs. With sparsity and tensor operations, the H100 can output up to 800 TFLOPs of FP32 horsepower. The H100 also offers a higher 80 GB memory capacity versus 64 GB on the MI210. From the looks of it, NVIDIA is charging the premium for its higher AI/ML capabilities.
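For a rough sense of that premium, here is a small performance-per-dollar and performance-per-watt comparison using the figures quoted above (a sketch: street pricing fluctuates, and vector FP32 is only one axis of comparison):

```python
# Rough value comparison from the figures quoted in this article.
# Values: non-tensor FP32 TFLOPs, approximate price (USD), TDP (W).
cards = {
    "NVIDIA H100 PCIe": (48.0, 33_120, 350),
    "AMD Instinct MI210": (45.3, 16_500, 300),
}

for name, (tflops, price_usd, tdp_w) in cards.items():
    print(f"{name}: {tflops / (price_usd / 1000):.2f} TFLOPs per $1,000, "
          f"{tflops / tdp_w * 1000:.0f} GFLOPs per watt")
# NVIDIA H100 PCIe:   1.45 TFLOPs per $1,000, 137 GFLOPs per watt
# AMD Instinct MI210: 2.75 TFLOPs per $1,000, 151 GFLOPs per watt
```

On raw vector FP32 per dollar, the MI210 comes out well ahead, which supports the read that the H100's price is really buying its tensor/AI throughput.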

NVIDIA HPC / AI GPUs

| NVIDIA Tesla Graphics Card | NVIDIA B200 | NVIDIA H200 (SXM5) | NVIDIA H100 (SXM5) | NVIDIA H100 (PCIe) | NVIDIA A100 (SXM4) | NVIDIA A100 (PCIe4) | Tesla V100S (PCIe) | Tesla V100 (SXM2) | Tesla P100 (SXM2) | Tesla P100 (PCI-Express) | Tesla M40 (PCI-Express) | Tesla K40 (PCI-Express) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GPU | B200 | H200 (Hopper) | H100 (Hopper) | H100 (Hopper) | A100 (Ampere) | A100 (Ampere) | GV100 (Volta) | GV100 (Volta) | GP100 (Pascal) | GP100 (Pascal) | GM200 (Maxwell) | GK110 (Kepler) |
| Process Node | 4nm | 4nm | 4nm | 4nm | 7nm | 7nm | 12nm | 12nm | 16nm | 16nm | 28nm | 28nm |
| Transistors | 208 Billion | 80 Billion | 80 Billion | 80 Billion | 54.2 Billion | 54.2 Billion | 21.1 Billion | 21.1 Billion | 15.3 Billion | 15.3 Billion | 8 Billion | 7.1 Billion |
| GPU Die Size | TBD | 814mm2 | 814mm2 | 814mm2 | 826mm2 | 826mm2 | 815mm2 | 815mm2 | 610mm2 | 610mm2 | 601mm2 | 551mm2 |
| SMs | 160 | 132 | 132 | 114 | 108 | 108 | 80 | 80 | 56 | 56 | 24 | 15 |
| TPCs | 80 | 66 | 66 | 57 | 54 | 54 | 40 | 40 | 28 | 28 | 24 | 15 |
| L2 Cache Size | TBD | 51200 KB | 51200 KB | 51200 KB | 40960 KB | 40960 KB | 6144 KB | 6144 KB | 4096 KB | 4096 KB | 3072 KB | 1536 KB |
| FP32 CUDA Cores Per SM | TBD | 128 | 128 | 128 | 64 | 64 | 64 | 64 | 64 | 64 | 128 | 192 |
| FP64 CUDA Cores / SM | TBD | 128 | 128 | 128 | 32 | 32 | 32 | 32 | 32 | 32 | 4 | 64 |
| FP32 CUDA Cores | TBD | 16896 | 16896 | 14592 | 6912 | 6912 | 5120 | 5120 | 3584 | 3584 | 3072 | 2880 |
| FP64 CUDA Cores | TBD | 16896 | 16896 | 14592 | 3456 | 3456 | 2560 | 2560 | 1792 | 1792 | 96 | 960 |
| Tensor Cores | TBD | 528 | 528 | 456 | 432 | 432 | 640 | 640 | N/A | N/A | N/A | N/A |
| Texture Units | TBD | 528 | 528 | 456 | 432 | 432 | 320 | 320 | 224 | 224 | 192 | 240 |
| Boost Clock | TBD | ~1850 MHz | ~1850 MHz | ~1650 MHz | 1410 MHz | 1410 MHz | 1601 MHz | 1530 MHz | 1480 MHz | 1329 MHz | 1114 MHz | 875 MHz |
| TOPs (DNN/AI) | 20,000 TOPs | 3958 TOPs | 3958 TOPs | 3200 TOPs | 2496 TOPs | 2496 TOPs | 130 TOPs | 125 TOPs | N/A | N/A | N/A | N/A |
| FP16 Compute | 10,000 TFLOPs | 1979 TFLOPs | 1979 TFLOPs | 1600 TFLOPs | 624 TFLOPs | 624 TFLOPs | 32.8 TFLOPs | 30.4 TFLOPs | 21.2 TFLOPs | 18.7 TFLOPs | N/A | N/A |
| FP32 Compute | 90 TFLOPs | 67 TFLOPs | 67 TFLOPs | 800 TFLOPs | 156 TFLOPs (19.5 TFLOPs standard) | 156 TFLOPs (19.5 TFLOPs standard) | 16.4 TFLOPs | 15.7 TFLOPs | 10.6 TFLOPs | 10.0 TFLOPs | 6.8 TFLOPs | 5.04 TFLOPs |
| FP64 Compute | 45 TFLOPs | 34 TFLOPs | 34 TFLOPs | 48 TFLOPs | 19.5 TFLOPs (9.7 TFLOPs standard) | 19.5 TFLOPs (9.7 TFLOPs standard) | 8.2 TFLOPs | 7.80 TFLOPs | 5.30 TFLOPs | 4.7 TFLOPs | 0.2 TFLOPs | 1.68 TFLOPs |
| Memory Interface | 8192-bit HBM4 | 5120-bit HBM3e | 5120-bit HBM3 | 5120-bit HBM2e | 6144-bit HBM2e | 6144-bit HBM2e | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 4096-bit HBM2 | 384-bit GDDR5 | 384-bit GDDR5 |
| Memory Size | Up To 192 GB HBM3 @ 8.0 Gbps | Up To 141 GB HBM3e @ 6.5 Gbps | Up To 80 GB HBM3 @ 5.2 Gbps | Up To 94 GB HBM2e @ 5.1 Gbps | Up To 40 GB HBM2 @ 1.6 TB/s, Up To 80 GB HBM2 @ 1.6 TB/s | Up To 40 GB HBM2 @ 1.6 TB/s, Up To 80 GB HBM2 @ 2.0 TB/s | 16 GB HBM2 @ 1134 GB/s | 16 GB HBM2 @ 900 GB/s | 16 GB HBM2 @ 732 GB/s | 16 GB HBM2 @ 732 GB/s, 12 GB HBM2 @ 549 GB/s | 24 GB GDDR5 @ 288 GB/s | 12 GB GDDR5 @ 288 GB/s |
| TDP | 700W | 700W | 700W | 350W | 400W | 250W | 250W | 300W | 300W | 250W | 250W | 235W |