NVIDIA GeForce RTX 4070 Founders Edition & MSRP Model Review Ft. MSI, GALAX & PNY

Hassan Mujtaba • Apr 12, 2023 at 09:00 AM EDT

NVIDIA Ada GPU - Ada Streaming Multiprocessor, Ada GPC & Ada GPUs Deep Dive

Let's take a trip down the road to Ada. In 2016, NVIDIA announced its Pascal GPUs, which would soon be featured in its top-to-bottom GeForce 10 series lineup. With Maxwell before it, NVIDIA had gained a lot of experience in power efficiency, an area it had focused on since its Kepler GPUs.

In 2018, rather than offering another standard leap in rasterization performance, NVIDIA took a different approach and introduced two key technologies in its Turing line of consumer GPUs: AI-assisted acceleration with Tensor Cores and hardware-level acceleration for ray tracing with its brand-new RT Cores.

Then came Ampere with its brand new Samsung 8nm fabrication process, and NVIDIA added even more to its gaming graphics lineup. In the Ampere GPU architecture, NVIDIA provided its latest Ampere SM along with next-gen FP32, INT32, Tensor Cores, and RT cores. The focus was to boost both rasterization and ray tracing capabilities to new heights.

Now enter Ada, a brand-new architecture that aims to take everything from the first two RTX generations and perfect it. The graphics architecture is designed for speed, and speed is where it excels. So let's look at the architecture in detail. The following are the main highlights of the Ada Lovelace GPU architecture:

Revolutionary New Architecture: NVIDIA Ada architecture GPUs deliver outstanding performance for graphics, AI, and compute workloads with exceptional architectural and power efficiency. After the baseline design for the Ada SM was established, the chip was scaled up to shatter records. Manufacturing innovations and materials research enabled NVIDIA engineers to craft a GPU with 76.3 billion transistors and 18,432 CUDA Cores capable of running at clocks over 2.5 GHz while maintaining the same 450W TGP as the prior generation flagship GeForce RTX 3090 Ti GPU. The result is the world’s fastest GPU with the power, acoustics, and temperature characteristics expected of a high-end graphics card.

New Ada RT Core for Faster Ray Tracing: For decades, rendering ray-traced scenes with physically correct lighting in real-time has been considered the holy grail of graphics. At the same time, the geometric complexity of environments and objects continues to increase as 3D games and graphics continually strive to provide the most accurate representations of the real world. The Ada RT Core has been enhanced to deliver 2x faster ray-triangle intersection testing and includes two important new hardware units. An Opacity Micromap Engine speeds up ray tracing of alpha-tested geometry by a factor of 2x, and a Displaced Micro-Mesh Engine generates Displaced Micro-Triangles on the fly to create additional geometry. The Micro-Mesh Engine provides the benefit of increased geometric complexity without the traditional performance and storage costs of complex geometries.

Shader Execution Reordering: NVIDIA Ada GPUs support Shader Execution Reordering, which dynamically organizes and reorders shading workloads to improve RT shading efficiency. This improves performance by up to 44% in Cyberpunk 2077 with Ray Tracing Overdrive Mode.

NVIDIA DLSS 3: The Ada architecture features an all-new Optical Flow Accelerator and AI frame generation that boosts DLSS 3’s frame rates up to 2x over the previous DLSS 2.0 while maintaining or exceeding native image quality. Compared to traditional brute-force graphics rendering, DLSS 3 is ultimately up to 4x faster while providing low system latency.

The NVIDIA Ada Lovelace AD104 GPU features up to 5 GPCs (Graphics Processing Clusters). This is 1 less GPC compared to the Ampere GA104 GPU. Each GPC consists of 6 TPCs with 2 SMs each, which is the same configuration as the existing chip. Each SM (Streaming Multiprocessor) houses four sub-cores, which is also the same as the GA102 GPU. What's changed is the FP32 and INT32 core configuration. Each SM includes 64 FP32 units, but the combined FP32+INT32 units go up to 128. This is because half of the FP32 units don't share the same sub-core as the INT32 units. The 64 FP32 cores are separate from the 64 INT32 cores.

So in total, each sub-core will consist of 16 FP32 plus 16 INT32 units for a total of 32 units. Each SM will have a total of 64 FP32 units plus 64 INT32 units for a total of 128 units. And since there are a total of 60 SM units (12 per GPC), we are looking at a total of 7,680 cores.
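The unit counts above can be checked with a few lines of arithmetic. This is a minimal sketch that only multiplies out the figures quoted in the text, reading the configuration as 6 TPCs per GPC with 2 SMs per TPC; the variable names are illustrative and not NVIDIA terminology.

```python
# Full AD104 configuration as quoted in the text (names are illustrative).
GPCS = 5                  # Graphics Processing Clusters on the full die
TPCS_PER_GPC = 6          # TPCs in each GPC
SMS_PER_TPC = 2           # SMs in each TPC (assumed reading of "6 TPCs and 2 SMs")
SUB_CORES_PER_SM = 4      # sub-cores in each SM
FP32_PER_SUB_CORE = 16    # FP32 units per sub-core
INT32_PER_SUB_CORE = 16   # INT32 units per sub-core

sms_per_gpc = TPCS_PER_GPC * SMS_PER_TPC
total_sms = GPCS * sms_per_gpc
units_per_sub_core = FP32_PER_SUB_CORE + INT32_PER_SUB_CORE
units_per_sm = SUB_CORES_PER_SM * units_per_sub_core
total_cores = total_sms * units_per_sm

print(f"SMs per GPC:   {sms_per_gpc}")
print(f"Total SMs:     {total_sms}")
print(f"Units per SM:  {units_per_sm}")
print(f"Total cores:   {total_cores}")
```

With these inputs the script reproduces the 12 SMs per GPC, 60 SMs, 128 units per SM, and 7,680 cores stated above.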

Moving over to the cache, this is another segment where NVIDIA has given a big boost over the existing Ampere GPUs. The L2 cache will be increased to 48 MB. This is a 12x increase over the Ampere GA104 GPU that hosts just 4 MB of L2 cache. The cache will be shared across the GPU. The GPU will also feature up to 80 ROPs for the full-die.

There are also going to be the latest 4th Generation Tensor and 3rd Generation RT (Raytracing) cores infused on the Ada Lovelace GPUs which will help boost DLSS & Raytracing performance to the next level. The NVIDIA GeForce RTX 4070 Ti makes use of the full AD104 GPU die which means that there's no room for expansion for a future high-end GPU on the AD104 silicon. It is possible that tweaked silicon with faster clocks may appear in the future but the core configuration may not change.

NVIDIA AD104 'RTX 4070' Gaming GPU Block Diagram:

NVIDIA AD104 'Ada Lovelace' Gaming GPU 'SM' Block Diagram:

NVIDIA GeForce RTX 4070

29 TFLOPS of peak single-precision (FP32) performance
58 TFLOPS of peak half-precision (FP16) performance
466 Tensor TFLOPS with sparsity
67 RT-TFLOPS
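The FP32 figure in this list can be roughly reproduced from the usual peak-throughput formula (cores × 2 FLOPs per clock × boost clock). This is a back-of-the-envelope sketch, not NVIDIA's published method. The core count (5,888) and boost clock (2,475 MHz) are not given in this section and are assumed here from the card's public specifications, and the factor of 2 assumes one fused multiply-add per core per clock.

```python
# Assumed inputs (not stated in this section): RTX 4070 public specifications.
CUDA_CORES = 5888          # assumption: enabled CUDA cores on the RTX 4070
BOOST_CLOCK_GHZ = 2.475    # assumption: rated boost clock in GHz
FLOPS_PER_CORE_CLOCK = 2   # assumption: one FMA (2 FLOPs) per core per clock

# cores * FLOPs/clock * GHz gives GFLOPS; divide by 1000 for TFLOPS.
fp32_tflops = CUDA_CORES * FLOPS_PER_CORE_CLOCK * BOOST_CLOCK_GHZ / 1000.0

print(f"Estimated peak FP32: {fp32_tflops:.1f} TFLOPS")
```

Under these assumptions the estimate lands at about 29 TFLOPS, in line with the FP32 figure listed above.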

At the heart of the NVIDIA GeForce RTX 4070 graphics card lies the Ada Lovelace AD104 GPU. The GPU measures 295.4 mm² and uses the TSMC 4N process node, an optimized version of TSMC's 5nm (N5) node designed for the green team. The GPU features 35.8 billion transistors.
