Nvidia’s Ada architecture and GeForce RTX 40-series graphics cards are slated to begin arriving on October 12, starting with the GeForce RTX 4090 and RTX 4080. That’s two years after the Nvidia Ampere architecture and basically right on schedule given the slowing down (or if you prefer, death) of Moore’s ‘Law,’ and it’s good news as the best graphics cards are in need of some new competition.
With the Nvidia hack earlier this year, we had a good amount of information on what to expect, and Nvidia has now confirmed most of the details on the first RTX 40-series cards. We’ve collected everything into this central hub detailing everything we know and expect from Nvidia’s Ada architecture and the RTX 40-series family.
There are still plenty of rumors swirling around, but we now have a much better idea of what to expect from the Ada Lovelace architecture. Nvidia detailed its data center Hopper H100 GPU, and much like with the Volta V100 and Ampere A100, the consumer products will have rather different configurations.
We know when the RTX 4090 will launch. If Nvidia follows a similar release schedule as in the past, we can expect the rest of the RTX 40-series to trickle out over the next year. RTX 4080 16GB and 12GB models will probably arrive in November, or perhaps late October, RTX 4070 will arrive in early 2023, and RTX 4060 and 4050 will come later next year. Let’s start with the high level overview of the specs and rumored specs for the Ada series of GPUs.
| Graphics Card | RTX 4090 | RTX 4080 16GB | RTX 4080 12GB | RTX 4070 | RTX 4060 | RTX 4050 |
|---|---|---|---|---|---|---|
| Process Technology | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N | TSMC 4N |
| Die size (mm^2) | 629? | 380? | 300? | 300? | 225? | 175? |
| SMs / CUs / Xe-Cores | 128 | 76 | 60 | 48? | 32? | 24? |
| GPU Cores (Shaders) | 16384 | 9728 | 7680 | 6144? | 4096? | 3072? |
| Ray Tracing "Cores" | 128 | 76 | 60 | 48? | 32? | 24? |
| Boost Clock (MHz) | 2520 | 2510 | 2610 | 2600? | 2600? | 2600? |
| VRAM Speed (Gbps) | 21 | 23 | 21 | 18? | 18? | 18? |
| VRAM Bus Width | 384 | 256 | 192 | 160? | 128? | 64? |
| TFLOPS FP32 (Boost) | 82.6 | 48.8 | 40.1 | 31.9? | 21.3? | 16.0? |
| TFLOPS FP16 (FP8) | 661 (1321) | 391 (781) | 321 (641) | 256 (511)? | 170 (341)? | 128 (256)? |
| Launch Date | Oct 2022 | Nov 2022? | Nov 2022? | Jan 2023? | Apr 2023? | Aug 2023? |
The first three cards are now official, and the specs are reasonably accurate. There are a few remaining question marks, like the exact ROP counts and VRAM clocks, but they shouldn't be too far off. The last three cards require some generous helpings of salt, as they're more speculation than anything concrete.
We do know that Nvidia is hitting clock speeds of 2.5–2.6 GHz on the 4090 and 4080, and we expect similar clocks on the other GPUs in the RTX 40-series. We’ve put in tentative clock speed estimates of 2.6 GHz for now. Nvidia hasn’t specified precisely which GPUs are used on the various cards, or exact die sizes or transistor counts (except for “76 billion” on the RTX 4090).
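As a sanity check on the table above, the FP32 numbers follow directly from shader count and boost clock: each shader executes two floating-point operations per clock (a fused multiply-add). A quick sketch, using the confirmed figures:

```python
# Theoretical FP32 throughput: 2 FLOPS per shader per clock (FMA).
# TFLOPS = shaders * 2 * boost_clock_MHz / 1,000,000
def fp32_tflops(shaders: int, boost_mhz: int) -> float:
    return shaders * 2 * boost_mhz / 1_000_000

# Confirmed cards from the table above
cards = {
    "RTX 4090":      (16384, 2520),
    "RTX 4080 16GB": (9728,  2510),
    "RTX 4080 12GB": (7680,  2610),
}
for name, (shaders, clock) in cards.items():
    print(f"{name}: {fp32_tflops(shaders, clock):.1f} TFLOPS")
# RTX 4090: 82.6, RTX 4080 16GB: 48.8, RTX 4080 12GB: 40.1
```

The same formula generates the question-marked entries for the lower-tier cards from the rumored shader counts and our tentative 2.6 GHz clock estimate.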
Nvidia will most likely use TSMC's 4N process — "4nm Nvidia" — on all of the Ada GPUs, and definitely on the RTX 4090 and 4080 cards. Hopper H100 also uses TSMC's 4N node, which mostly appears to be a tweaked variation on TSMC's N5 node that's been widely used in other chips and which will also be used for AMD's Zen 4 and RDNA 3. We don't think Samsung will have a compelling alternative that wouldn't require a serious redesign of the core architecture, so the whole family will likely be on the same node.
Nvidia will be “going big” with the AD102 GPU, and it’s closer in size and transistor counts to the H100 than GA102 was to GA100. Based on available information and a few remaining rumors, Ada Lovelace looks to be a monster. It will pack in far more SMs and the associated cores than the current Ampere GPUs, it will have much higher GPU clocks, and it will also contain a number of architectural enhancements to further boost performance. Nvidia claims that the RTX 4090 is 2x–4x faster than the outgoing RTX 3090 Ti, though caveats apply to those benchmarks.
The preview performance from Nvidia is primarily at 4K ultra, which is something to keep in mind. If you're currently running a more modest processor rather than one of the absolute best CPUs for gaming, such as the Core i9-12900K or Ryzen 7 5800X3D, you could very well end up CPU limited even at 1440p ultra. A larger system upgrade will likely be necessary to get the most out of the fastest Ada GPUs.
Ada Will Massively Boost Compute Performance
With the high-level overview out of the way, let’s get into the specifics. The most noticeable change with Ada GPUs will be the number of SMs compared to the current Ampere generation. At the top, AD102 potentially packs 71% more SMs than the GA102. Even if nothing else were to significantly change in the architecture, we would expect that to deliver a huge increase in performance.
That will apply not just to graphics but to other elements as well. It doesn’t seem like most of the calculations have changed from Ampere, though the Tensor cores now support FP8 (with sparsity still) to potentially double the FP16 performance. The RTX 4090 has deep learning/AI compute of up to 661 teraflops in FP16, and 1,321 teraflops of FP8 — and a fully enabled AD102 chip could hit 1.4 petaflops at similar clocks.
The full GA102 in the RTX 3090 Ti by comparison tops out at around 321 TFLOPS FP16 (again, using Nvidia’s sparsity feature). That means RTX 4090 delivers a theoretical 107% increase, based on core counts and clock speeds. The same theoretical boost in performance should apply to shader and ray tracing hardware as well, except those are also changing.
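Working from the rounded table figures, the generational tensor uplift comes out as roughly 2.06x — Nvidia's unrounded TFLOPS numbers are where the 107% figure comes from:

```python
# FP16 tensor throughput with sparsity, from the table above (TFLOPS, rounded)
rtx_4090_fp16 = 661
rtx_3090_ti_fp16 = 321

uplift = rtx_4090_fp16 / rtx_3090_ti_fp16
print(f"Generational uplift: {uplift:.2f}x")  # ~2.06x from the rounded values
```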
The GPU shader cores will have a new Shader Execution Reordering (SER) feature that Nvidia claims will improve general performance by 25%, and can improve ray tracing operations by up to 200%.
The RT cores meanwhile have doubled their ray/triangle intersection throughput, plus they have a couple more new tricks available. The Opacity Micromap (OMM) Engine enables significantly faster ray tracing for transparent surfaces like foliage, particles, and fences. The Displaced Micro-Mesh (DMM) Engine on the other hand optimizes the generation of the Bounding Volume Hierarchy (BVH) structure, and Nvidia claims it can create the BVH up to 10x faster while using 20x less (5%) memory for BVH storage.
Together, these architectural enhancements should enable Ada Lovelace GPUs to offer a massive generational leap in performance.
Ada Lovelace ROPs
We’ve put question marks after the ROPs counts (render outputs) on all of the Ada GPUs, as we don’t know for certain how they’re configured on most of the GPUs. With Ampere, Nvidia tied the ROPs to the GPCs, the Graphics Processing Clusters, but some of these could still be disabled.
The AD102 has up to 144 SMs, and we now know that it uses 12 GPCs of 12 SMs each. That yields 192 ROPs as the maximum, though the final number on the RTX 4090 might be lower (at least 176, though). We don’t have concrete details on the remaining GPUs, unfortunately.
It’s a safe bet that AD103 used in the RTX 4080 16GB will have seven GPCs of 12 SMs, just like GA102. That gives it up to 112 ROPs. AD104 in the RTX 4080 12GB on the other hand seems likely to use five GPCs of 12 SMs, with a maximum of 80 ROPs. Nvidia might have changed the ROPs per GPC ratio, however.
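Assuming Ada keeps Ampere's ratio of 16 ROPs per GPC — an assumption, not something Nvidia has confirmed — the maximum ROP counts above fall out of the GPC counts directly:

```python
# Ampere tied ROPs to GPCs at 16 ROPs per GPC; we assume Ada does the same.
ROPS_PER_GPC = 16

def max_rops(gpcs: int) -> int:
    return gpcs * ROPS_PER_GPC

print(max_rops(12))  # AD102 (RTX 4090): 192 max
print(max_rops(7))   # AD103 (RTX 4080 16GB): 112 max
print(max_rops(5))   # AD104 (RTX 4080 12GB): 80 max
```

As noted above, shipping cards can disable GPCs (or partial GPCs), so the final numbers may land below these maximums.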
For the time being, the remaining three cards should be taken as a best guess. We don’t know for certain what GPUs will be used, and there may be other models (i.e., RTX 4060 Ti) interspersed between cards. We’ll fill in the blanks as more information becomes available in the coming months, once the other Ada GPUs are closer to launching.
Memory Subsystem: GDDR6X Rides Again
Recently, Micron announced it has roadmaps for GDDR6X memory running at speeds of up to 24Gbps. The latest RTX 3090 Ti only uses 21Gbps memory, and Nvidia is currently the only company using GDDR6X for anything. That immediately raises the question of what will be using 24Gbps GDDR6X, and the only reasonable answer seems to be Nvidia Ada. The lower-tier GPUs are more likely to stick with standard GDDR6 rather than GDDR6X as well, which tops out at 18Gbps.
This represents a bit of a problem, as GPUs generally need compute and bandwidth to scale proportionally to realize the promised amount of performance. The RTX 3090 Ti for example has 12% more compute than the 3090, and the higher clocked memory provides 8% more bandwidth. Based on the compute details shown above, there’s a huge disconnect brewing. The RTX 4090 has around twice as much compute as the RTX 3090 Ti, but it may not offer more than 14% more bandwidth.
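The bandwidth math behind that 14% ceiling is straightforward: peak bandwidth is the bus width in bytes times the per-pin data rate.

```python
# Peak memory bandwidth in GB/s = bus width (bits) / 8 * per-pin speed (Gbps)
def bandwidth_gbs(bus_bits: int, speed_gbps: float) -> float:
    return bus_bits / 8 * speed_gbps

rtx_3090_ti = bandwidth_gbs(384, 21)  # 1008 GB/s, the current flagship
ada_max     = bandwidth_gbs(384, 24)  # 1152 GB/s if 24Gbps GDDR6X shows up
print(f"Max uplift: {ada_max / rtx_3090_ti - 1:.0%}")  # 14%
```

The RTX 4090 as announced actually keeps 21Gbps memory, so its raw bandwidth matches the 3090 Ti; the 24Gbps figure is the best case if faster GDDR6X arrives on a refresh.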
There’s far more room for bandwidth to grow on the lower tier GPUs, assuming GDDR6X power consumption can be kept in check. The current RTX 3050 through RTX 3070 all use standard GDDR6 memory, clocked at 14–15Gbps. We already know GDDR6 running at 18Gbps is available, so a hypothetical RTX 4050 with 18Gbps GDDR6 ought to easily keep up with the increase in GPU computational power. If Nvidia still needs more bandwidth, it could tap GDDR6X for the lower tier GPUs as well.
Since we know the core specs for the RTX 4090, it's clear Nvidia didn't chase massive increases in raw memory bandwidth. Instead, it reworked the architecture to need less of it, similar to what AMD did with RDNA 2 compared to the original RDNA architecture.
Ada Looks to Cash in on L2 Cache
One great way of reducing the need for more raw memory bandwidth is something that has been known and used for decades. Slap more cache on a chip and you get more cache hits, and every cache hit means the GPU doesn’t need to pull data from the GDDR6/GDDR6X memory. AMD’s Infinity Cache allowed the RDNA 2 chips to basically do more with less raw bandwidth, and leaked Nvidia Ada L2 cache information suggests Nvidia will take a somewhat similar approach.
AMD uses a massive L3 cache of up to 128MB on the Navi 21 GPU, with 96MB on Navi 22, 32MB on Navi 23, and just 16MB on Navi 24. Surprisingly, even the smaller 16MB cache does wonders for the memory subsystem. We didn’t think the Radeon RX 6500 XT was a great card overall, but it basically keeps up with cards that have almost twice the memory bandwidth.
The Ada architecture appears to pair an 8MB L2 cache with each 32-bit memory controller. That means the cards with a 128-bit memory interface will get 32MB of total L2 cache, and the 384-bit interface RTX 4090 at the top of the stack will have 96MB of L2 cache. While that’s less than AMD’s Infinity Cache in some cases, we don’t know latencies or other aspects of the design yet. L2 cache tends to have lower latencies than L3 cache, so a slightly smaller L2 could definitely keep up with a larger but slower L3 cache.
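Under that rumored 8MB-per-controller rule, the L2 cache size for any Ada card follows from its bus width:

```python
# Rumored Ada L2 sizing: 8MB of L2 per 32-bit memory controller
def l2_cache_mb(bus_bits: int) -> int:
    return bus_bits // 32 * 8

print(l2_cache_mb(384))  # RTX 4090: 96MB
print(l2_cache_mb(256))  # RTX 4080 16GB: 64MB
print(l2_cache_mb(192))  # RTX 4080 12GB: 48MB
print(l2_cache_mb(128))  # hypothetical 128-bit card: 32MB
```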
If we look at AMD’s RX 6700 XT as an example, it has about 35% more compute than the previous generation RX 5700 XT. Performance in our GPU benchmarks hierarchy meanwhile is about 32% higher at 1440p ultra, so performance overall scaled pretty much in line with compute. Except, the 6700 XT has a 192-bit interface and only 384 GB/s of bandwidth, 14% lower than the RX 5700 XT’s 448 GB/s. That means the big Infinity Cache gave AMD a 50% boost to effective bandwidth.
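That effective bandwidth figure is a back-of-envelope estimate: if performance scales with effective bandwidth, Infinity Cache must be stretching each GB/s of raw bandwidth by roughly 50%.

```python
# RX 6700 XT vs RX 5700 XT: ~32% more performance on 14% less raw bandwidth
perf_ratio   = 1.32        # relative 1440p ultra performance
raw_bw_ratio = 384 / 448   # 6700 XT bandwidth vs 5700 XT bandwidth

# Assuming performance tracks effective bandwidth, Infinity Cache makes each
# unit of raw bandwidth go this much further:
effective_boost = perf_ratio / raw_bw_ratio
print(f"Effective bandwidth boost: {effective_boost - 1:.0%}")  # ~54%
```

The exact number depends on how tightly performance actually tracks bandwidth, so treat the ~50% as a rough estimate rather than a measured value.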
Assuming Nvidia can get similar results with Ada, and that appears to be the case, even without wider memory interfaces the Ada GPUs should still have plenty of effective bandwidth. It’s also worth mentioning that Nvidia’s memory compression techniques in past architectures have proven capable.
RTX 40-Series Gets DLSS 3
One of the big announcements with the RTX 4090 and 4080 is that DLSS 3 is coming… and it will only work with RTX 40-series graphics cards. Where DLSS 1 and DLSS 2 work on both RTX 20- and 30-series cards, and will also work on Ada GPUs, DLSS 3 fundamentally changes some things in the algorithm and will require the new architectural updates.
Inputs to the DLSS 3 algorithm are mostly the same as before, but now there's a new Optical Flow Accelerator (OFA), which appears to take the prior frames and generate additional motion vectors that can then feed into the Optical Multi Frame Generation unit. This all sounds a bit like asynchronous time warp from the VR days, except now it's being used with upscaling to generate two (or more?) frames from a single source frame.
We'll have to see how it looks in action, but this does provide for some tantalizing performance boosts. Double your framerate? Maybe not quite that much, due to the additional computational work being done, but Nvidia did show slides depicting 63 fps with DLSS 2 and 101 fps with DLSS 3, a 60% improvement in performance.
We’re not sure if DLSS 3 will require RTX 40-series cards to run at all, or if it will have a fallback mode for developers where it only does DLSS 2 type upscaling on previous generation RTX cards. If it only supports RTX 40-series, that would mean game developers would need to have a separate DLSS 2 implementation, and at that point maybe just add AMD FSR 2.0 and Intel XeSS for good measure.
Ada Gets AV1 Encoding, Times Two
Nvidia announced that the GeForce RTX 4090 and GeForce RTX 4080 graphics cards will feature two of its eighth-generation Nvidia Encoder (NVENC) hardware units. These will also have support for AV1 encoding, similar to Intel Arc — except there are two instead of just one.
AV1 encoding improves efficiency by 40% according to Nvidia. That means any livestreams that support the codec would look as if they had a 40% higher bitrate than the current H.264 streams. Of course, the streaming service will need to support AV1 for this to matter.
Video editors can also benefit from the dual encoders, which can double encoding performance. Nvidia is working with DaVinci Resolve, Voukoder, and Jianying to enable support, and it’s expected to arrive in October.
GeForce Experience and ShadowPlay will also use the new hardware, allowing gamers to capture gameplay at up to 8K and 60 fps in HDR. Perfect for the 0.01% of people that can view native 8K content! (If you build it, they will come…)
Ada Power Consumption
Early reports of 600W and higher TBPs (Total Board Power) for Ada appear to be mostly unfounded, at least on the announced Founders Edition models. The RTX 4090 has the same 450W TBP as the outgoing RTX 3090 Ti, while the RTX 4080 16GB drops that to just 320W and the RTX 4080 12GB has a 285W TBP. Those are for the reference Founders Edition models, however.
As we’ve seen with RTX 3090 Ti and other Ampere GPUs, some AIB (add-in board) partners are more than happy to have substantially higher power draw in pursuit of every last ounce of performance. RTX 4090 custom cards that draw up to 600W certainly aren’t out of the question, and a future RTX 4090 Ti could push that even higher.
It all goes back to the end of Dennard scaling, right along with the death of Moore’s Law. Put simply, Dennard scaling — also called MOSFET scaling — observed that with every generation, dimensions could be scaled down by about 30%. That reduced overall area by 50% (scaling in both length and width), voltage dropped a similar 30%, and circuit delays would decrease by 30% as well. Furthermore, frequencies would increase by around 40% and total power consumption would decrease by 50%.
If that all sounds too good to be true, it’s because Dennard scaling effectively ended around 2007. Like Moore’s Law, it didn’t totally fail, but the gains became far less pronounced. Clock speeds in integrated circuits have only increased from a maximum of around 3.7GHz in 2004 with the Pentium 4 Extreme Edition to today’s maximum of 5.5GHz in the Core i9-12900KS. That’s still almost a 50% increase in frequency, but it’s come over six generations (or more, depending on how you want to count) of process node improvements. Put another way, if Dennard scaling hadn’t died, modern CPUs would clock as high as 28GHz. RIP, Dennard scaling, you’ll be missed.
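The 28GHz figure is simple compounding: take the Pentium 4's 3.7GHz and apply Dennard scaling's ~40% frequency gain across six node generations.

```python
# Hypothetical clock speed if Dennard scaling (~40% frequency gain per node
# generation) had held from the 3.7GHz Pentium 4 era through six node shrinks
base_ghz = 3.7
generations = 6

hypothetical_ghz = base_ghz * 1.4 ** generations
print(f"{hypothetical_ghz:.0f} GHz")  # ~28 GHz

# Versus reality: the 5.5GHz Core i9-12900KS is just under a 50% gain
print(f"Actual gain: {5.5 / base_ghz - 1:.0%}")
```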
It’s not just the frequency scaling that died, but power and voltage scaling as well. Today, a new process node can improve transistor density, but voltages and frequencies need to be balanced. If you want a chip that’s twice as fast, you might need to use nearly twice as much power. Alternatively, you can build a chip that’s more efficient, but it won’t be any faster. Nvidia seems to be going after more performance with Ada, though it hasn’t completely tossed efficiency concerns out the window.
How Much Will RTX 40-Series Cards Cost?
The short answer, and the true answer, is that they will cost as much as Nvidia can get away with charging. Nvidia launched Ampere with one set of financial models, and those proved to be completely wrong for the Covid pandemic era. Real-world prices shot up and scalpers profiteered, and that was before cryptocurrency miners started paying two to three times the official recommended prices.
The good news is that GPU prices are coming down, and Ethereum mining has ended. That in turn has absolutely killed GPU profitability for mining, with most cards now costing more to run than they could make off the endeavor. That’s all good news, but it still doesn’t guarantee reasonable prices.
The problem is that with the Ethereum network now on proof of stake, roughly 20 million GPUs that were mining for the past two years are now looking for work. Many of those will likely end up being resold, which will collapse used GPU prices. While buying a used graphics card has some risk, you can take precautions and it might soon be difficult to pass up the good deals.
We're already feeling the effects, and Nvidia has stated in its earnings call to investors that it expects to be in a consumer GPU oversupply for the next couple of quarters — and that may even prove optimistic. It could take longer, which would mean Nvidia and its partners will be trying to offload RTX 30-series cards until perhaps April 2023. Ouch.
What do you do when you have a bunch of existing cards to sell? You make the new cards cost more. We’re seeing that already with the announced prices on the RTX 4090 and 4080 models. The 4090 is $1,599, $100 more than the 3090 launch price and far out of reach of most gamers. The RTX 4080 16GB isn’t much better at $1,199, and the RTX 4080 12GB costs $899, $200 more than the RTX 3080 10GB launch MSRP — and we’re only just now seeing 3080 cards sell at retail for close to that!
Generational GPU prices are going up with Ada and the RTX 40-series, at least in the near term. However, Nvidia will also have to compete with AMD, and the Radeon RX 7000-series and RDNA 3 GPUs should start arriving in November. Nvidia might try to delay additional GPUs like the RTX 4070 and below until next year, but AMD may also gain some market share if it can provide a decent supply of RDNA 3 cards.
There’s no reason for Nvidia to immediately shift all of its GPU production from Ampere to Ada either. We’ll likely see RTX 30-series GPUs still being produced for quite some time, especially since no other GPUs or CPUs are competing for Samsung Foundry’s 8N manufacturing. Nvidia stands to gain more by introducing high-end Ada cards first, using all of the available capacity it can get from TSMC, and if necessary it can cut prices on the existing RTX 30 cards to plug any holes.
Will Nvidia Change the Founders Edition Design?
Nvidia made a lot of claims about its new Founders Edition card design at the launch of the RTX 3080 and 3090. While the cards generally work fine, what we’ve discovered over the past two years is that traditional axial cooling cards from third party AIC partners tend to cool better and run quieter, even while using more power. The GeForce RTX 3080 Ti Founders Edition was a particularly egregious example of how temperatures and fan speeds couldn’t keep up with hotter running GPUs.
The main culprit seems to be the GDDR6X memory, and Nvidia won't be packing more GDDR6X into Ada than in Ampere, at least in terms of the total number of chips. RTX 4090 will have twelve 2GB chips, just like the 3090 Ti, while the 4080 16GB cuts that to eight chips and the 12GB card only has to cool six chips. Put in better thermal pads and the existing Founders Edition design seems like it will still be adequate — adequate, but not necessarily superior to other designs.
Even the RTX 4080 16GB seems to be getting in on the triple-slot action this round, which is an interesting change of pace. It's going to be a 320W TBP, but then the 3080 FE and 3080 Ti FE always ran more than a little toasty. The 285W TBP on the 4080 12GB will probably get the two-slot treatment.
Ada GPU Release Date
Now that the big reveal is over, we know that the RTX 4090 will arrive on October 12. Beyond that, however, there will be plenty of other Ada graphics cards.
Nvidia launched the RTX 3080 and RTX 3090 in September 2020, the RTX 3070 arrived one month later, then the RTX 3060 Ti arrived just over a month after that. The RTX 3060 didn’t come out until late February 2021, then Nvidia refreshed the series with the RTX 3080 Ti and RTX 3070 Ti in June 2021. The budget-friendly RTX 3050 didn’t arrive until January 2022, and finally the RTX 3090 Ti was just launched at the end of March 2022.
We expect a staggered launch for the Ada cards as well, but based on the oversupply situation Nvidia is currently facing on RTX 30-series parts, it will probably drag on quite a bit longer. Both RTX 4080 models will almost certainly show up by November, but we don’t anticipate more Ada models until 2023. That might change, but that’s our best guess for now.
We still need true budget offerings to take over the GTX 16-series. Could we get a new GTX series, or a true budget RTX card for under $200? It’s possible, but don’t count on it, as Nvidia seems content to let AMD and Intel fight it out in the sub-$200 range. At best, RTX 3050 might drop to $200 in the coming months, but we wouldn’t be surprised to see Nvidia completely abandon the sub-$200 graphics card market.
There will inevitably be a refresh of the Ada offerings about a year after the initial launch as well. Whether those end up being “Ti” models or “Super” models or something else is anyone’s guess, but you can pretty much mark it on your calendar. GeForce RTX 40-series refresh, coming in Summer 2023.
More Competition in the GPU Space
Nvidia has been the dominant player in the graphics card space for a couple of decades now. It controls roughly 80% of the total GPU market, and 90% or more of the professional market, which has largely allowed it to dictate the creation and adoption of new technologies like ray tracing and DLSS. However, with the continuing increase in the importance of AI and compute for scientific research and other computational workloads, and their reliance on GPU-like processors, numerous other companies are looking to break into the industry, chief among them being Intel.
Intel hasn’t made a proper attempt at a dedicated graphics card since the late 90s, unless you count the aborted Larrabee. This time, Intel Arc Alchemist appears to be the real deal — or at least the foot in the door. It looks like Intel has focused more on media capabilities, and the jury is very much still out when it comes to Arc’s gaming or general compute performance. From what we know, the top consumer models will only be in the 18 TFLOPS range at best. Look at our table at the top and that looks like it will only compete with RTX 4060, if that.
But Arc Alchemist is merely the first in a regular cadence of GPU architectures that Intel has planned. Battlemage could easily double down on Alchemist’s capabilities, and if Intel can get that out sooner than later, it could start to eat into Nvidia’s market share, especially in the gaming laptop space. Or Arc could end up being a failure, as oversupply of Nvidia RTX 30-series cards might make them so cheap that Intel can’t compete.
AMD won’t be standing still either, and it has said several times that it’s “on track” to launch its RDNA 3 architecture by the end of the year, with a scheduled November 3 reveal. AMD will move to TSMC’s N5 node for the GPU chiplets, but it will also use the N6 node for the memory chiplets. AMD has so far avoided putting any form of deep learning hardware into its consumer GPUs (unlike its MI200 series), which allows it to focus on delivering performance without worrying as much about upscaling — though FSR 2.0 does cover that as well and works on all GPUs.
There’s also no question that Nvidia currently delivers far superior ray tracing performance than AMD’s RX 6000-series cards, but AMD hasn’t been nearly as vocal about ray tracing hardware or the need for RT effects in games. Intel for its part looks like it may deliver decent RT performance, but only up to the level of the RTX 3070 (give or take). But as long as most games continue to run faster and look good without RT effects, it’s an uphill battle convincing people to upgrade their graphics cards.
Nvidia RTX 40-Series Closing Thoughts
It’s been a long two years of GPU droughts and overpriced cards. 2022 is shaping up to be the first real excitement in the GPU space since 2020. Hopefully this round will see far better availability and pricing. It could hardly be worse than what we’ve seen for the past 24 months.
We anticipate having the first reviews of the GeForce RTX 4090 cards go up on October 11, one day before the retail launch. Check back then for the full rundown on performance, and we’ll be looking at games, professional workloads, and more.