A lot of controversy over the thermal envelope of the GTX-480 spurred us into action. Check out what Fermi thermals look like after extensive gaming and benching.
Introduction
When the Nvidia GTX-480 hit the shelves we saw a lot of sites reporting what we considered high temperatures on the GPU. A lot of those sites were relying on Furmark and chose not to game for hours on end in a variety of games and get thermal readings that way. We can understand the initial rush to publish because in reviewing it’s publish now or suffer a loss of traffic on the review or article.
We can’t understand not going back and studying the thermal envelope of the GTX-480 after the initial reviews. So we fired up the GTX-480 and gamed for days on end (a dirty job but someone had to do it). We played each game for hours and hours, then dropped out and took temperature readings. We ran each benchmark multiple times back to back, then dropped out and took readings. In all we spent 3 weeks gaming, taking thermal readings, and checking ambient temperatures to ensure continuity of testing.
Sometimes you start into a review and it looks, on the surface, to be an easy one. Then later you realize that you just let yourself in for a few weeks of rigorous testing and easy left the room about the time you fired up the rig. This is one of those cases. What we boil down to a screenshot of GPU-Z ended up being 3 gaming sessions lasting hours, then pulling the median thermal reading.
Before we get into all that, let’s rehash a little GTX-480 info.

| GPU | GTX-480 | GTX-470 | GTX-285 | 5850 | 5870 | GTX-295 | 5970 |
|---|---|---|---|---|---|---|---|
| Shader units | 480 | 448 | 240 | 1440 | 1600 | 2x 240 | 2x 1600 |
| ROPs | 48 | 40 | 32 | 32 | 32 | 2x 28 | 2x 32 |
| GPU Chip | GF100 | GF100 | GT200b | Cypress | Cypress | 2x GT200b | 2x Cypress |
| Transistors | 3200M | 3200M | 1400M | 2154M | 2154M | 2x 1400M | 2x 2154M |
| Memory Size | 1536 MB | 1280 MB | 1024 MB | 1024 MB | 1024 MB | 2x 896 MB | 2x 1024 MB |
| Memory Bus Width | 384 bit | 320 bit | 512 bit | 256 bit | 256 bit | 2x 448 bit | 2x 256 bit |
| Core Clock | 700 MHz | 607 MHz | 648 MHz | 725 MHz | 850 MHz | 576 MHz | 725 MHz |
| Memory Clock | 924 MHz | 837 MHz | 1242 MHz | 1000 MHz | 1200 MHz | 999 MHz | 1000 MHz |
| Price | $499 | $349 | $340 | $299 | $399 | $500 | $599 |
By now the specs of the GTX-480 are pretty well known. Putting 3 billion transistors on a GPU die is an amazing feat in itself. Controlling the heat they generate is just as amazing as putting that many transistors on one die in the first place.
Straight from Nvidia’s Mouth
This next section is Nvidia’s eloquent explanation of Fermi and its design. We could reword it, toss in a few comments, and claim it as our own, but frankly that would be a disservice to the fine folks over at Nvidia that came up with it. Let’s give credit where credit is due, and if we have any comments to toss in this section, they will be in bold type.
NVIDIA’s Next Generation CUDA Compute and Graphics Architecture, Code-Named “Fermi”
The Fermi architecture is the most significant leap forward in GPU architecture since the original G80. G80 was our initial vision of what a unified graphics and computing parallel processor should look like. GT200 extended the performance and functionality of G80. With Fermi, we have taken all we have learned from the two prior processors and all the applications that were written for them, and employed a completely new approach to design to create the world’s first computational GPU. When we started laying the groundwork for Fermi, we gathered extensive user feedback on GPU computing since the introduction of G80 and GT200, and focused on the following key areas for improvement:
- Improve Double Precision Performance: while single precision floating point performance was on the order of ten times the performance of desktop CPUs, some GPU computing applications desired more double precision performance as well.
- ECC support: ECC allows GPU computing users to safely deploy large numbers of GPUs in datacenter installations, and also ensure data-sensitive applications like medical imaging and financial options pricing are protected from memory errors.
- True Cache Hierarchy: some parallel algorithms were unable to use the GPU’s shared memory, and users requested a true cache architecture to aid them.
- More Shared Memory: many CUDA programmers requested more than 16 KB of SM shared memory to speed up their applications.
- Faster Context Switching: users requested faster context switches between application programs and faster graphics and compute interoperation.
- Faster Atomic Operations: users requested faster read-modify-write atomic operations for their parallel algorithms.
With these requests in mind, the Fermi team designed a processor that greatly increases raw compute horsepower, and through architectural innovations, also offers dramatically increased programmability and compute efficiency. The key architectural highlights of Fermi are:
- Third Generation Streaming Multiprocessor (SM)
  - 32 CUDA cores per SM, 4x over GT200
  - 8x the peak double precision floating point performance over GT200
  - Dual Warp Scheduler simultaneously schedules and dispatches instructions from two independent warps
  - 64 KB of RAM with a configurable partitioning of shared memory and L1 cache
- Second Generation Parallel Thread Execution ISA
  - Unified Address Space with Full C++ Support
  - Optimized for OpenCL and DirectCompute
  - Full IEEE 754-2008 32-bit and 64-bit precision
  - Full 32-bit integer path with 64-bit extensions
  - Memory access instructions to support transition to 64-bit addressing
  - Improved Performance through Predication
- Improved Memory Subsystem
  - NVIDIA Parallel DataCache™ hierarchy with Configurable L1 and Unified L2 Caches
  - First GPU with ECC memory support
  - Greatly improved atomic memory operation performance
- NVIDIA GigaThread™ Engine
  - 10x faster application context switching
  - Concurrent kernel execution
  - Out of Order thread block execution
  - Dual overlapped memory transfer engines
An Overview of the Fermi Architecture
The first Fermi based GPU, implemented with 3.0 billion transistors, features up to 512 CUDA cores. (The GTX-480 has 480 cores exposed out of a total of 512, so we may see an enthusiast line with all 512 cores exposed at a later date. Notice the “up to 512 CUDA cores” Nvidia let slip in the document.) A CUDA core executes a floating point or integer instruction per clock for a thread. The 512 CUDA cores are organized in 16 SMs of 32 cores each. The GPU has six 64-bit memory partitions, for a 384-bit memory interface, supporting up to a total of 6 GB of GDDR5 DRAM memory. A host interface connects the GPU to the CPU via PCI-Express. The GigaThread global scheduler distributes thread blocks to SM thread schedulers.
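As a quick sanity check, the core count and memory interface width quoted above fall out of simple multiplication. The sketch below is ours, not Nvidia's. The function names are made up for illustration, and the bandwidth figure assumes GDDR5 moves four data transfers per memory clock, using the 924 MHz memory clock from the spec table.

```python
# Back-of-the-envelope check of the Fermi layout described above.
# Names are illustrative only; nothing here is an Nvidia API.

CORES_PER_SM = 32            # CUDA cores in each streaming multiprocessor
FULL_SM_COUNT = 16           # SMs on the full Fermi die
PARTITION_WIDTH_BITS = 64    # width of one memory partition
PARTITION_COUNT = 6          # memory partitions on the full die


def total_cores(sm_count: int) -> int:
    """CUDA cores available when sm_count SMs are enabled."""
    return sm_count * CORES_PER_SM


def bus_width_bits(partitions: int) -> int:
    """Memory interface width for a given number of 64-bit partitions."""
    return partitions * PARTITION_WIDTH_BITS


def bandwidth_gb_per_s(memory_clock_mhz: float, bus_bits: int,
                       transfers_per_clock: int = 4) -> float:
    """Peak memory bandwidth in GB/s.

    Assumption: GDDR5 performs 4 data transfers per memory clock.
    """
    bytes_per_transfer = bus_bits / 8
    return memory_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


gtx480_sms = 480 // CORES_PER_SM  # SMs enabled on the GTX-480

print(total_cores(FULL_SM_COUNT))               # 512
print(gtx480_sms)                               # 15
print(bus_width_bits(PARTITION_COUNT))          # 384
print(round(bandwidth_gb_per_s(924, 384), 1))   # 177.4
```

In other words, the 480 exposed cores on the GTX-480 correspond to 15 of the 16 SMs being enabled.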
Third Generation Streaming Multiprocessor
The third generation SM introduces several architectural innovations that make it not only the most powerful SM yet built, but also the most programmable and efficient.
512 High Performance CUDA cores
Each SM features 32 CUDA processors, a fourfold increase over prior SM designs. Each CUDA processor has a fully pipelined integer arithmetic logic unit (ALU) and floating point unit (FPU). Prior GPUs used IEEE 754-1985 floating point arithmetic. The Fermi architecture implements the new IEEE 754-2008 floating-point standard, providing the fused multiply-add (FMA) instruction for both single and double precision arithmetic. FMA improves over a multiply-add (MAD) instruction by doing the multiplication and addition with a single final rounding step, with no loss of precision in the addition. FMA is more accurate than performing the operations separately. GT200 implemented double precision FMA.
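To see why a single final rounding step matters, here is a minimal sketch that runs on the CPU in ordinary double precision. It is not GPU code and not how Fermi implements FMA. The fused result is emulated with exact rational arithmetic from Python's `fractions` module and rounded once at the end, and both function names are our own.

```python
from fractions import Fraction


def multiply_add_separate(a: float, b: float, c: float) -> float:
    """Multiply, round, then add and round again (two rounding steps)."""
    product = a * b        # the product is rounded to a double here
    return product + c     # the sum is rounded a second time


def multiply_add_fused(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: compute a*b + c exactly, round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)    # the only rounding step


# Inputs chosen so the exact product, 1 - 2**-60, is not representable
# as a double and gets rounded to 1.0 when computed on its own.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

separate = multiply_add_separate(a, b, c)
fused = multiply_add_fused(a, b, c)

print(separate == 0.0)            # True: the small term was rounded away
print(fused == -(2.0 ** -60))     # True: the small term survives
```

The separate version loses the answer entirely, while the single-rounding version keeps it, which is the accuracy gain the whitepaper is describing.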
In GT200, the integer ALU was limited to 24-bit precision for multiply operations; as a result, multi-instruction emulation sequences were required for integer arithmetic. In Fermi, the newly designed integer ALU supports full 32-bit precision for all instructions, consistent with standard programming language requirements. The integer ALU is also optimized to efficiently support 64-bit and extended precision operations. Various instructions are supported, including Boolean, shift, move, compare, convert, bit-field extract, bit-reverse insert, and population count.
16 Load/Store Units
Each SM has 16 load/store units, allowing source and destination addresses to be calculated for sixteen threads per clock. Supporting units load and store the data at each address to cache or DRAM.
Four Special Function Units
Special Function Units (SFUs) execute transcendental instructions such as sin, cosine, reciprocal, and square root. Each SFU executes one instruction per thread, per clock; a warp executes over eight clocks. The SFU pipeline is decoupled from the dispatch unit, allowing the dispatch unit to issue to other execution units while the SFU is occupied.
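The "eight clocks" figure follows from dividing a warp across the four SFUs. A minimal sketch of that arithmetic, assuming a warp is 32 threads (the usual Nvidia warp size, which the text above does not state). The load/store figure is our own inference from the same formula, not a number from the whitepaper.

```python
import math

WARP_SIZE = 32  # assumption: threads per warp, not stated in the text above


def clocks_per_warp(units_per_sm: int, warp_size: int = WARP_SIZE) -> int:
    """Clocks needed to push one warp through a pool of identical units,
    each handling one thread per clock."""
    return math.ceil(warp_size / units_per_sm)


sfu_clocks = clocks_per_warp(4)     # four Special Function Units per SM
ldst_clocks = clocks_per_warp(16)   # sixteen load/store units per SM (our inference)

print(sfu_clocks)    # 8
print(ldst_clocks)   # 2
```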
Fermi’s 16 SMs are positioned around a common L2 cache. Each SM is a vertical rectangular strip that contains an orange portion (scheduler and dispatch), a green portion (execution units), and light blue portions (register file and L1 cache).
The Test Chassis
Normally we run tests on an open Top Deck Testing station, but an open chassis design like that won’t work for this article. So we went with a Silverstone Raven 2 chassis that provides plenty of room. The entire review of the Raven 2 is available separately if you want to peruse it.
To make things a little easier, let’s insert a page from that review here to give you an idea of the chassis. No, this isn’t a chassis article, but the chassis is probably the single most important component in overall cooling in any system.
The Raven RV-02 is a little different from what we are used to. Removing the side panels on the Raven RV-02 requires removing the top portion of the chassis first.
Then there are four thumb screws that secure the side panels onto the chassis. Remove the screws, swing the panels out a couple of inches, then lift straight up.
The design is based solely on an inverted 90° rotation of the motherboard, which means the PCI expansion cards go in through the rear and point upwards.
This chassis uses three 180mm fans on the bottom, with one 120mm fan up top. I see the 5.25″ drive bay is in the right spot, and this chassis uses a 3.5″ to 5.25″ HDD cage. Now I wonder where the PSU gets mounted.
The chassis has a total of eight PCI expansion openings.
The included top mounted 120mm fan.
Better picture of one of the bottom 180mm fans. SilverStone uses a hex style grill to keep your fingers and small screws from entering.
This looks like where the PSU gets mounted at, it too is on a 90° angle.
As I suspected, the PSU gets mounted the same way the motherboard does. Right in front of the PSU mounting area are three little switches. These switches control the bottom three 180mm fans, a low and high speed setting. Now this baffled me, why use a fan controller on already quiet fans?
Moving towards the front of the chassis, you can make out the vented PCI expansion covers.
Each bottom 180mm fan has a removable filter. To remove the filters, all I had to do was grab the tab, lift up, and slide it right out.
The fresh air intake for the PSU.
The 3.5″ to 5.25″ HDD cage. This cage can only house three 3.5″ HDDs.
Since our components are mounted pointing upwards, SilverStone made this chassis step down a bit so we can hide all of our cables.
Here is the Raven RV-02 chassis top cover. This is vented so the computer components can breathe. Since heat rises, the design of this chassis makes sense.
A quick snapshot of the SilverStone logo on the Raven RV-02 chassis.
The PSU intake has a removable filter on it as well.
Looking at the rear external portion of the chassis.
Here we put the Raven RV-02 chassis on its side to look at the bottom. The entire bottom is well ventilated to allow airflow to the bottom mounted 180mm fans. If you look closely at the bottom of the chassis you can make out two small holes.
These are for those with an external water cooling setup. What I thought was a bit odd was their location. Running our hoses from the bottom of the chassis is begging for kinks when trying to make that sharp radius.
The front bottom of the chassis.
Time to turn our attention to the 180mm fans once again. We can remove the bottom 180mm fans to clean them, replace them, or, if we wanted, install a water cooling radiator. Remove the two screws that are in front of the fans, then slide them out, much like removing the filter.
NOTE: You also have to remove the top three fan control switches to fully remove the bottom 180mm fans.
Here is what the fans look like that were used in this chassis.
The 5.25″ drive bays are a screw-less design.
SilverStone includes 2 front I/O USB ports with front audio headers, these are located up top in the front.
On the outer edges of the top portion are the ON/OFF and reset buttons.
The HDD cage is held in by eight thumbscrews, there are 4 on each side.
The HDD cage has rubber isolation mounts mounted directly to the cage to reduce vibrations from the HDDs.
The Test Rig
Now that we have had a look at the chassis, we need to look at what we have installed in the test rig. We didn’t take it easy on this article; we packed the chassis with everything but the kitchen sink. This is a full fledged enthusiast system with a Core i7 980X, 3x SAS hard drives, 2x SSDs, a hardware RAID card, and about as much extremely expensive hardware as we could cram into it.

We didn’t go skimpy in this chassis. At the end of the day we have to live with the build on a day to day basis, and you don’t strip out a chassis just to keep things tidy. We did make every effort to wire the chassis tightly and keep wires out of the way as much as possible. We do that anyway, since a bunch of wires hanging around looking ugly can ruin the look of a rig as well as the airflow.

Over toward the drive bays you can see the three Seagate Constellation SAS 6Gb/s hard drives, one of the two SSDs, and the Asus Blu-Ray combo drive. You also get a peek at the 3Ware hardware RAID controller.

That’s a full sized Silverstone Strider 1500W PSU, and we used the Noctua NH-D14 dual tower CPU cooler with three 120mm Noctua fans. In other words we didn’t strip out the chassis; we loaded it heavier than most chassis can ever dream of being loaded. How heavy? Figure the drives, CPU, GPU, and RAID card run close to $4k, not to mention the 1500W PSU, the Raven 2, the Rampage 3, the CPU cooler, or the 12 GB of RAM. It’s a dream machine and the chassis is packed with goodies, so to the first e-mailer that says “Oh they stripped the chassis” we will send some secret Bjorn3D Ninjas, and they will steal away into the night with your gaming shrine.
Test Rig Specifics
| Test Rig “Quadzilla” | |
|---|---|
| Case Type | Silverstone Raven 2 |
| CPU | Intel Core i7 980X Extreme |
| Motherboard | Asus Rampage 3 |
| Ram | Kingston HyperX 12GB 9-9-9-24 |
| CPU Cooler | Noctua NH-D14 (3x 120mm fans) |
| Hard Drives | 3x Seagate Constellation ES 2TB 7200 RPM 16MB Cache |
| Optical | Asus BD-Combo |
| GPU | Nvidia GTX-480 |
| Case Fans | 120mm Fan cooling the mosfet CPU area |
| Docking Stations | None |
| Testing PSU | Silverstone Strider 1500 Watt |
| Legacy | None |
| Mouse | Razer Lachesis |
| Keyboard | Razer Lycosa |
| Gaming Ear Buds | Razer Moray |
| Speakers | None |
| Any Attempt to Copy This System Configuration May Lead to Bankruptcy | |
Please note that all tests were performed with the chassis sides on and absolutely no extra cooling, just the Raven 2 with its built-in cooling. The only other fans in the chassis were on the power supply, the GPU, and the CPU cooler. No extra fans were stuffed in there, and no fans were sitting outside the chassis providing extra airflow. Just the Raven 2 with its low RPM fans.
Methodology and Software Used
Initially we were going to just run games. Then the article got more grandiose in scale and we included Furmark, ATITool, 3DMark Vantage, several standalone game benchmarks, and of course games.
We ran Furmark for 5 minutes; for an explanation of that, see the next page. We also ran Furmark for 30 minutes on a lot of the modern GPUs and built a table for it. We ran ATITool for 10 minutes. For standalone benchmarks we ran 5 passes, then recorded the temperature. In Crysis we ran 10 complete passes of the Frame buffer Benchmark tool. We ran the games on the absolute highest settings you can use. We played each game for 2 hours, then recorded the temperatures, and did 3 gaming sessions for each game, totaling 6 hours. The sessions were back to back with nothing but a pause to take a screenshot of GPU-Z, so the GPU didn’t get to cool down at all between sessions. We report the highest temperature we got during the sessions.
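For anyone who wants to log their own sessions the same way, here is a minimal sketch of the bookkeeping: three back-to-back 2 hour sessions per game, with the highest reading reported. The function name and the sample readings are placeholders invented for illustration; they are not measurements from this article. The median is included only because the introduction mentions it.

```python
from statistics import median

SESSION_HOURS = 2        # length of one gaming session
SESSIONS_PER_GAME = 3    # back-to-back sessions per game


def summarize_game(session_temps_c):
    """Summarize the GPU temperature (°C) read at the end of each session."""
    if len(session_temps_c) != SESSIONS_PER_GAME:
        raise ValueError(f"expected {SESSIONS_PER_GAME} readings")
    return {
        "hours_played": SESSION_HOURS * len(session_temps_c),
        "reported_c": max(session_temps_c),   # highest reading is what gets reported
        "median_c": median(session_temps_c),
    }


# Placeholder readings, NOT results from this article.
placeholder_readings = [80, 82, 81]
summary = summarize_game(placeholder_readings)
print(summary)
```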
Of course we used a fresh load of Windows 7 64 bit Ultimate and updated it and all the drivers to the latest versions. All the games were patched to the latest patch, all the benchmarks were the latest releases.
Keep in mind we aren’t looking at performance; we are looking at the temperatures the GTX-480 generates. We went with screenshots because we want you to see the GPU-Z screen superimposed on the game or benchmark we were running, so there is no doubt what temperature the GPU was running at.
We went to great pains to ensure the ambient temperature was at 72°F and that no breezes or fans were blowing on the chassis at any time. The chassis itself was on a wooden surface clear of dust and debris. We didn’t disable anything in Windows, or tweak the rig or GPU in any way. We ran at stock CPU speed, stock RAM speed, and stock GPU speed.
Lastly, before we get to the software used, we’d like to say we get inundated with e-mail on a daily basis, and e-mails ranting about the temperatures will be deleted en masse. These are screenshots of the GTX-480 in the Raven 2 and as such leave little doubt as to the resulting temperature. If you have specific concerns, address them politely and if we get time we will respond.
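Since the ambient temperature is quoted in Fahrenheit and GPU temperatures are in Celsius, a small conversion helps put them on the same scale. This is a minimal sketch with helper names of our own choosing. The 96°C input is the GTX-480 Furmark reading from the table later in this article.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Convert a Fahrenheit temperature to Celsius."""
    return (deg_f - 32) * 5 / 9


def rise_over_ambient_c(gpu_temp_c: float, ambient_f: float) -> float:
    """How far a GPU reading (°C) sits above the room temperature (°F)."""
    return gpu_temp_c - fahrenheit_to_celsius(ambient_f)


ambient_c = round(fahrenheit_to_celsius(72), 1)
furmark_rise = round(rise_over_ambient_c(96, 72), 1)

print(ambient_c)      # 22.2
print(furmark_rise)   # 73.8
```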
Synthetic Benchmarks & Games
| Synthetic Benchmarks & Games | |
|---|---|
| 3DMark Vantage | |
| World In Conflict Benchmark | |
| Metro 2033 | |
| Call of Duty Modern Warfare 4 | |
| FarCry 2 | |
| Stalker COP | |
| Crysis Warhead | |
| Unigine Heaven v.2.0 | |
| Intel DX11 SDK | |
| ATITool | |
| Furmark | |
| Batman Arkham Asylum | |
| Battlefield Bad Company 2 | |
| Dirt 2 | |
| Dual Monitor | |
| Left 4 Dead 1 | |
| Nvidia Rocket Sled Demo | |
| Stone Giant | |
We ran 18 total games and benchmarks to get a good representative sample of what you can expect, thermally speaking, from the GTX-480.
Furmark
We ran Furmark for 5 minutes. Why 5 minutes? Furmark is unrealistic as a thermal measurement for GPUs: it bypasses normal GPU hardware and drives up temperatures. It’s designed like Intel Burn for CPUs, heating them up as hot as they will go. You will never see a game or application that heats up your GPU like Furmark does. We don’t like the program because it misleads a lot of people about the real temperatures you can expect from your GPU. As long as we are here, we went ahead and ran Furmark on several GPUs. This is the only time we will be looking at multiple GPUs in this article, at least until later when we toss this table back up.
| GPU | Furmark Temperature (°C) |
|---|---|
| Geforce GTX-480 | 96 |
| Geforce GTX-470 | 95 |
| Radeon HD 5970 | 95 |
| Radeon HD 5870 | 92 |
| Geforce GTX-295 | 92 |
| Geforce GTX-285 | 91 |
| Radeon HD 4870×2 | 86 |
| Radeon HD 5850 | 81 |
Every GPU listed here is a stock, reference speed GPU. None are overclocked, and every one was tested in the same exact rig, at the same ambient temperature, with the same version of Furmark and the same 30 minute run. So for what it is worth we could actually stop here and say stop whining about the GTX-480 temperatures. The HD 5970 was 1°C lower than the GTX-480 and the same temperature as the GTX-470. The HD 5870 was 4°C lower than the GTX-480. The GTX-285 was 5°C lower than the GTX-480. The HD 5870 will throttle at 100°C so it has 8°C left, and the GTX-480 throttles at 105°C so it has 9°C left, yet it was the thermals on the GTX-480 that got ripped apart. I don’t remember anyone screaming about the 5870 or the 5970, yet they have less headroom before throttling than the GTX-480 and far fewer transistors to service per core. Each core on the HD 5870 and HD 5970 has 2.154 billion transistors and comes within 4°C of the GTX-480, which carries 0.846 billion (846 million) more transistors.
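The comparisons above are simple subtraction, and a minimal sketch makes them easy to re-run. The temperatures come from the Furmark table, the throttle points are the two quoted in the text, and the transistor counts are the 3.0 billion and 2.154 billion figures used in this article. The dictionary and function names are our own.

```python
# Furmark readings (°C) from the table above.
FURMARK_C = {
    "Geforce GTX-480": 96,
    "Geforce GTX-470": 95,
    "Radeon HD 5970": 95,
    "Radeon HD 5870": 92,
    "Geforce GTX-295": 92,
    "Geforce GTX-285": 91,
    "Radeon HD 4870x2": 86,
    "Radeon HD 5850": 81,
}

# Throttle points (°C) quoted in the text; only these two are given.
THROTTLE_C = {"Geforce GTX-480": 105, "Radeon HD 5870": 100}

# Transistor counts in millions, as used in this article.
TRANSISTORS_M = {"Geforce GTX-480": 3000, "Radeon HD 5870": 2154}


def degrees_below(gpu: str, reference: str = "Geforce GTX-480") -> int:
    """How many °C cooler a GPU ran than the reference card."""
    return FURMARK_C[reference] - FURMARK_C[gpu]


def throttle_headroom(gpu: str) -> int:
    """How many °C remain before the GPU starts to throttle."""
    return THROTTLE_C[gpu] - FURMARK_C[gpu]


print(degrees_below("Radeon HD 5970"))       # 1
print(degrees_below("Radeon HD 5870"))       # 4
print(degrees_below("Geforce GTX-285"))      # 5
print(throttle_headroom("Radeon HD 5870"))   # 8
print(throttle_headroom("Geforce GTX-480"))  # 9
print(TRANSISTORS_M["Geforce GTX-480"] - TRANSISTORS_M["Radeon HD 5870"])  # 846
```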
It’s a given that the die on the HD 5870 is smaller than Fermi’s, so in all fairness the HD 5870 is dissipating its heat across a smaller die. But what does that mean to the end user as far as heat goes? Nothing at all. What we care about is how hot it got, not how hot it gets per square millimeter. (GTX-480 528mm², HD 5870




































