Nvidia GeForce GTX 580 (GF110)

Peter Kapas November 9, 2010 Hardware, NVIDIA, Reviews & Articles, Video Cards/Graphics Cards

The Nvidia GeForce GTX 580 is finally here! With the improved and optimized Fermi architecture (GF110), the GeForce GTX 580 delivers higher performance relative to its power consumption, temperature, and noise.

Introduction

For many, it felt like Nvidia took forever to launch its GeForce GTX 480 Fermi cards, with 3.2 billion transistors and 512 CUDA cores (480 exposed). And while the launch was great and the performance was excellent, many expected a bit more from the 480. Several early buyers were quite disappointed with the 480’s power consumption, temperatures, and noise. Seven months after the GeForce GTX 480 was launched, Nvidia is releasing its new flagship GPU and the successor to the GTX 480: the GeForce GTX 580.

When the GeForce GTX 480 launched, it was designed to excel at tessellation. With 15 PolyMorph Engines, the GTX 480 was able to outperform ATI’s single-GPU flagship card, the HD 5870, in tessellation-heavy applications, because the 5870 has only a single dedicated tessellation unit. The GeForce GTX 580 has a total of 16 PolyMorph Engines, compared to the 480’s 15. This allows for higher performance in games and applications that rely heavily on geometry processing and tessellation.

The Nvidia GeForce GTX 580 also comes with 512 CUDA cores, but this time all 512 cores are exposed and fully functional. There are more cores available for rendering data, and there is an extra PolyMorph Engine for better tessellation. The new improved and optimized chip is now known as the GF110.

The GeForce GTX 580 comes with lots of improvements over the 480, as we have previously mentioned. However, there are some other factors that also add to the performance of the 580. Firstly, the GTX 580 has full-speed FP16 texture filtering support, which allows the users to gain performance in certain texture-heavy applications. It also comes with new tile formats that could improve Z-cull efficiency. While all CUDA cores have been unlocked, and new texture and SM units have been added, the 580 also has a higher clock speed. We’ll take a look at that in more detail on the following pages.

The Nvidia GeForce GTX 580 is priced at $499 SEP as of 11/09/10, but as time goes by and vendors start coming out with their own designs, we will start seeing different price points for the GTX 580. Stock clocked cards should be around the $499 price range, while OC cards may be a bit more.

Nvidia’s New Demos

Nvidia also released some exciting demos that showcase the tessellation capabilities of the GeForce GTX 580. While these demos will also run on 400 series video cards, users will need a GeForce GTX 580 for optimal single-GPU performance. The first picture is from Endless City, a tessellated city landscape generated by the GPU. All the fine detail on the buildings is tessellated geometry rather than the familiar bump maps. Because the detail is real, higher-polygon-count geometry instead of bump maps, all the lights in the scene can cast proper shadows.

The second picture shows the Aliens vs. Triangles tessellation demo, in which users can modify the aliens in very fine detail. Once again, instead of relying on bump maps, the alien is modeled with highly detailed geometry. This allows extra features to be integrated, such as having the skin deform accordingly when something interacts with the alien. It also allows for much higher quality rendering than was possible with bump maps in the past.

The GF110 Architecture – Improved / Optimized FERMI

As we mentioned on the previous page, the GTX 580 has gone through a lot of architectural enhancements. The two major changes are full-speed FP16 texture filtering, which helps with texture-intensive applications, and the new tile formats that improve Z-cull efficiency. The chart below from Nvidia shows how much performance the GTX 580 gains over the GTX 480 from the architectural enhancements, from the faster core and memory clocks, and from the extra 32 cores that were unlocked, making it a true 512 CUDA core GPU.

There were earlier GTX 480 video cards with all 512 cores enabled, but their power consumption was also much higher. With the optimized GF110 chip, the GTX 580 can stay at roughly the same power consumption as the GTX 480 and still gain performance.

With the new GF110 chip, PolyMorph and Raster Engines have been added to help with tessellation. While the new PolyMorph Engine helps with tessellation performance in games, the extra Raster Engine helps with the conversion of polygons to pixel fragments. Now with 16 PolyMorph Engines and 512 CUDA cores, the 580 is able to achieve a stunning 2 billion triangles per second. That is a tremendous number of polygons, something we would otherwise only see in Hollywood blockbuster movies. Now all of this can be rendered in real time on the GTX 580 GPU. Nvidia’s new Endless City demo shows this off, rendering and playing back everything in real time.
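To put the 2 billion triangles per second figure in perspective, here is a minimal sketch that converts it into a per-frame triangle budget. It treats the quoted number as a theoretical peak (an assumption; sustained rates in real games will be lower), and the function name is ours, not Nvidia’s.

```python
def triangles_per_frame(triangles_per_second: float, fps: float) -> float:
    """Return the triangle budget available for each frame at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return triangles_per_second / fps


# Peak rate quoted for the GTX 580 in this review (theoretical, not measured).
PEAK_TRIANGLES_PER_SECOND = 2_000_000_000

if __name__ == "__main__":
    for fps in (30, 60, 120):
        budget = triangles_per_frame(PEAK_TRIANGLES_PER_SECOND, fps)
        print(f"{fps:>3} FPS: about {budget / 1e6:.1f} million triangles per frame")
```

At 60 FPS, the peak rate works out to roughly 33 million triangles per frame.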

The Radeon HD series video cards still have a much harder time with tessellation-based benchmarks, which means that when games start incorporating extensive tessellation into their geometry, the Nvidia cards will have an advantage over their AMD counterparts. There are some games that already take advantage of tessellation, like H.A.W.X. 2 (coming out on 11/12/10). The Unigine Heaven 2.1 benchmark also tests tessellation capabilities. While the visual improvement from tessellation is very limited at the moment, we believe that tessellation will be taken much further in the future, making it possible to make characters, terrain, and objects much more believable than they are now.

For the GF110 design, Nvidia completely re-engineered the previous GF100, down to the transistor level. Every block of the previous chip had to be evaluated. To achieve higher performance with lower power consumption, Nvidia modified a very large percentage of the transistors on the chip. They used lower-leakage transistors on less timing-sensitive processing paths, and higher-speed transistors on more critical processing paths. This is why Nvidia was able to add the extra 32 cores to the final Fermi architecture, while also adding another SM to the chip.

To compare the power consumption of a GTX 480 to that of a GTX 580, we overclocked our Galaxy GTX 480 as far as we could, trying to match the performance of the GTX 580. It was difficult to reach the GTX 580’s performance, but we got within 2 FPS of it. While the performance was very close, the shocking part was that the overclocked GTX 480 was drawing well over 100W more power. The performance-to-power ratio is definitely improved on the GTX 580’s GF110 chip. The following chart compares the GTX 480 with the GTX 580 for overall performance per watt, showing that the GTX 580 can perform over 35% better than the GTX 480 in 3DMark Vantage.

On many of Nvidia’s previous video cards, the GPU’s thermal protection features downclocked the GPU at extreme temperatures, protecting the card from damage. However, with the release of stress-testing applications such as FurMark, MSI Kombustor, and OCCT, the latest video cards can draw dangerously high currents, potentially damaging components on the card. Nvidia integrated a new power monitoring feature into the GTX 580, which dynamically adjusts performance in certain stress applications if the power levels exceed the card’s specifications. This dedicated hardware circuitry runs in real time, monitoring the current and voltage on each of the 12V rails: the 6-pin connector, the 8-pin connector, and the PCI-Express edge connector.
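The idea behind this kind of power monitoring can be sketched in a few lines. This is purely illustrative and not Nvidia’s implementation: the `RailSample` type, the 50MHz step, the 400MHz floor, and the use of the 244W TDP as the power limit are all our own assumptions. Only the three 12V rails and the 772MHz reference graphics clock come from this review.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RailSample:
    """One voltage/current reading from a 12V supply rail (hypothetical type)."""
    name: str
    volts: float
    amps: float

    @property
    def watts(self) -> float:
        return self.volts * self.amps


def board_power(samples: Iterable[RailSample]) -> float:
    """Total board power: the sum of volts * amps over all monitored rails."""
    return sum(sample.watts for sample in samples)


def next_clock_mhz(
    samples: Iterable[RailSample],
    current_clock_mhz: float,
    power_limit_w: float = 244.0,   # assumption: TDP used as the limit
    base_clock_mhz: float = 772.0,  # GTX 580 reference graphics clock
    step_mhz: float = 50.0,         # assumption: illustrative throttle step
    floor_mhz: float = 400.0,       # assumption: illustrative minimum clock
) -> float:
    """Step the clock down while over the limit, and back up once under it."""
    if board_power(samples) > power_limit_w:
        return max(floor_mhz, current_clock_mhz - step_mhz)
    return min(base_clock_mhz, current_clock_mhz + step_mhz)


if __name__ == "__main__":
    stress = [
        RailSample("6-pin", 12.0, 6.0),
        RailSample("8-pin", 12.0, 13.0),
        RailSample("PCI-Express slot", 12.0, 6.0),
    ]
    print(board_power(stress), next_clock_mhz(stress, 772.0))
```

In this sketch a stress load that pulls 300W across the three rails causes the clock to step down, while a load under the limit leaves the card at its reference clock.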

Cooler Design

Nvidia made improvements when developing the GTX 580 based on what consumers said about the GTX 480. The thermal characteristics of the GF110 chip are also much better than those of the GF100. We’ll go into more detail about the GTX 580 reference card’s cooling solution on the following pages. What we see on this chart is that the GTX 480 is roughly 9-10 dBA louder than the GTX 580. Generally, a human perceives each 10 dBA increase as being twice as loud as the previous noise level. The GTX 580 should run much quieter than any high-end card Nvidia has released in the past few years.

Based on the tests we did in our labs, the GTX 580 does indeed run very quietly under high loads. We tested the thermal and acoustic improvements on the card, and with our Silverstone TJ-10 chassis and some acoustic dampening on each side panel, the GTX 580 was totally inaudible during gaming. The other fans in the system were a bit louder than the GTX 580. When we ran FurMark, the fan speed started to climb. However, FurMark is not representative of real-life use, because it makes the video card draw more power and produce more heat than a real-life application would. Also, if we push the fan speed on the GTX 580 to 100%, we can definitely hear the fan loud and clear. During our testing period, we played Metro 2033 for about an hour in a closed chassis with no side ventilation, and the fan speed only reached 66%, which kept the gaming environment very quiet.

The new cooling solution on the GTX 580 uses a special heatsink design, including what is called a vapor chamber. Think of the vapor chamber as a heatpipe solution, but instead of just contacting the heatsink fins in certain areas, the vapor chamber makes contact with every fin of the heatsink. This helps tremendously by spreading the heat out over a large block of heatsink.
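The rule of thumb above (each 10 dBA increase is perceived as twice as loud) can be turned into a quick calculation. This is a minimal sketch of that rule only; the function name is ours, and perceived loudness varies from listener to listener.

```python
def perceived_loudness_ratio(delta_dba: float) -> float:
    """Approximate perceived loudness ratio for a level difference in dBA.

    Uses the rule of thumb that every +10 dBA is perceived as twice as loud.
    """
    return 2.0 ** (delta_dba / 10.0)


if __name__ == "__main__":
    for delta in (9.0, 10.0):
        ratio = perceived_loudness_ratio(delta)
        print(f"+{delta:.0f} dBA is perceived as about {ratio:.2f}x as loud")
```

By this rule, a 9-10 dBA gap means the GTX 480 sounds roughly 1.9 to 2 times as loud as the GTX 580.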

The GTX 580 also has a new adaptive GPU fan control, and the card is designed for great cooling potential in SLI setups. The fan has been redesigned to generate a lower pitch and tone, which allows for lower acoustic noise. The back of the cover is designed to route the air towards the rear bracket, improving SLI temperature performance.

The vapor chamber is a sealed, fluid-filled chamber with thin copper walls. When the heatsink is placed on the GPU, the heat from the GPU quickly boils the liquid inside the vapor chamber, turning it into vapor. The hot vapor spreads throughout the top of the chamber, transferring its heat to the heatsink fins. As it cools, the vapor condenses back into liquid and returns to the bottom of the vapor chamber, where the process starts again. The hot heatsink fins are in turn cooled by the air the fan pushes through them.

Continue onto the next page, where we examine the Nvidia GeForce GTX 580 in more detail.

Specifications & Features

The difference between the older GTX 480 (GF100) and the new GTX 580 (GF110) becomes visible once we take a look at the specs for the GTX 580. We can clearly see that the number of CUDA cores has been raised from 480 to 512. The Graphics Clock, Processor Clock, and Memory Clock frequencies have also been increased, allowing the GTX 580 to get ahead of the GTX 480 by quite a bit. The Graphics Clock has jumped from 700MHz to 772MHz, and the Processor Clock from 1401MHz to 1544MHz. The memory frequency has gone from 1848MHz to 2004MHz, which gives an effective memory data rate of 4008MHz. Of course, these clocks can be overclocked, and there will be even more headroom once overclocking tools let users change the voltages on the card.
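As a rough illustration of how much these spec changes add up to, the sketch below computes the percentage increase for each figure quoted above. The combined "cores times processor clock" ratio is our own simplification of theoretical shader throughput; it is not a benchmark result and ignores the architectural changes.

```python
# Reference specs as quoted in this review: (GTX 480, GTX 580).
SPECS = {
    "CUDA cores": (480, 512),
    "Graphics clock (MHz)": (700, 772),
    "Processor clock (MHz)": (1401, 1544),
    "Memory clock (MHz)": (1848, 2004),
}


def percent_increase(old: float, new: float) -> float:
    """Percentage increase going from old to new."""
    return (new - old) / old * 100.0


def shader_throughput_ratio() -> float:
    """Theoretical scaling from core count and processor clock only (assumption)."""
    cores_480, cores_580 = SPECS["CUDA cores"]
    clock_480, clock_580 = SPECS["Processor clock (MHz)"]
    return (cores_580 * clock_580) / (cores_480 * clock_480)


if __name__ == "__main__":
    for name, (old, new) in SPECS.items():
        print(f"{name}: {old} -> {new} (+{percent_increase(old, new):.1f}%)")
    print(f"Theoretical shader throughput: x{shader_throughput_ratio():.3f}")
```

On paper, the extra cores and higher processor clock together are worth a little under 18% more shader throughput, before any of the architectural enhancements are counted.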

As a result of the GPU redesign, the GeForce GTX 580 also has a lower temperature threshold than its predecessor. Whereas the 480’s GF100 chip had a temperature threshold of 105C, the 580’s GF110 has a threshold of 97C.

While the TDP of the GTX 580 has also dropped to 244W (compared to the GF100’s 250W), the actual power consumption that we measured in Metro 2033 has not changed at all. Also, we still see the standard 6-pin and 8-pin power connectors on the GTX 580, so the main power design has not changed by much. It is important to understand that the following measurements were taken in real-life applications, rather than benchmarking applications such as FurMark or OCCT. Usually, FurMark and OCCT push the cards way past their standard specs, and depending on the settings, could report values that the card could not achieve in real-life situations.

The length of the card is also the same as the GTX 480, and while the main specs for SLI state that the card is designed for up to 3-way SLI, actual 4-Way SLI motherboards are capable of running 4 x Nvidia GeForce GTX 580s in 4-way SLI.

When the GTX 480 was released, it was designed for gamers who wanted to enjoy their games at the maximum graphics settings. The GTX 580 follows the same basic idea, ensuring that users can run high resolutions and high AA while still maintaining excellent performance. This becomes even more enjoyable when two GTX 580s are used in an SLI configuration. The chart below shows the advantages of Nvidia graphics over the AMD HD 5870 in a CrossFire configuration.

Newer driver updates also help with SLI scalability issues in games, so these numbers tend to improve over time. The chart also shows how Nvidia’s single-GPU GTX 580 delivers more performance than AMD’s single-GPU HD 5870. However, while we do not have a comparison between the HD 5970 and the GTX 580, our sources tell us that the performance is still better on the dual-GPU HD 5970.

What is CUDA?

CUDA is NVIDIA’s parallel computing architecture that enables dramatic increases in computing performance by harnessing the power of the GPU (graphics processing unit).

With millions of CUDA-enabled GPUs sold to date, software developers, scientists and researchers are finding broad-ranging uses for CUDA, including image and video processing, computational biology and chemistry, fluid dynamics simulation, CT image reconstruction, seismic analysis, ray tracing and much more.

Background

Computing is evolving from “central processing” on the CPU to “co-processing” on the CPU and GPU. To enable this new computing paradigm, NVIDIA invented the CUDA parallel computing architecture that is now shipping in GeForce, ION, Quadro, and Tesla GPUs, representing a significant installed base for application developers.

In the consumer market, nearly every major consumer video application has been, or will soon be, accelerated by CUDA, including products from Elemental Technologies, MotionDSP and LoiLo, Inc.

CUDA has been enthusiastically received in the area of scientific research. For example, CUDA now accelerates AMBER, a molecular dynamics simulation program used by more than 60,000 researchers in academia and pharmaceutical companies worldwide to accelerate new drug discovery.

In the financial market, Numerix and CompatibL announced CUDA support for a new counterparty risk application and achieved an 18X speedup. Numerix is used by nearly 400 financial institutions.

An indicator of CUDA adoption is the ramp of the Tesla GPU for GPU computing. There are now more than 700 GPU clusters installed around the world at Fortune 500 companies ranging from Schlumberger and Chevron in the energy sector to BNP Paribas in banking.

And with the recent launches of Microsoft Windows 7 and Apple Snow Leopard, GPU computing is going mainstream. In these new operating systems, the GPU will not only be the graphics processor, but also a general purpose parallel processor accessible to any application.

PhysX

Some Games that use PhysX (Not all inclusive)

Batman: Arkham Asylum
Watch Arkham Asylum come to life with NVIDIA® PhysX™ technology! You’ll experience ultra-realistic effects such as pillars, tile, and statues that dynamically destruct with visual explosiveness. Debris and paper react to the environment and the force created as characters battle each other; smoke and fog will react and flow naturally to character movement. Immerse yourself in the realism of Batman Arkham Asylum with NVIDIA PhysX technology.

Darkest of Days
Darkest of Days is a historically based FPS where gamers will travel back and forth through time to experience history’s “darkest days”. The player uses period and future weapons as they fight their way through some of the epic battles in history. The time travel aspects of the game lead the player on missions where they at times need to fight on both sides of a war.

Sacred 2 – Fallen Angel
In Sacred 2 – Fallen Angel, you assume the role of a character and delve into a thrilling story full of side quests and secrets that you will have to unravel. Breathtaking combat arts and sophisticated spells are waiting to be learned. A multitude of weapons and items will be available, and you will choose which of your character’s attributes you will enhance with these items in order to create a unique and distinct hero.

Dark Void
Dark Void is a sci-fi action-adventure game that combines an adrenaline-fuelled blend of aerial and ground-pounding combat. Set in a parallel universe called “The Void,” players take on the role of Will, a pilot dropped into incredible circumstances within the mysterious Void. This unlikely hero soon finds himself swept into a desperate struggle for survival.

Cryostasis
Cryostasis puts you in 1968 at the Arctic Circle, Russian North Pole. The main character, Alexander Nesterov, is a meteorologist incidentally caught inside an old nuclear ice-breaker, North Wind, frozen in the ice desert for decades. Nesterov’s mission is to investigate the mystery of the ship’s captain’s death – or, as it may well be, a murder.

Mirror’s Edge
In a city where information is heavily monitored, agile couriers called Runners transport sensitive data away from prying eyes. In this seemingly utopian paradise of Mirror’s Edge, a crime has been committed and now you are being hunted.

What is NVIDIA PhysX Technology?
NVIDIA® PhysX® is a powerful physics engine enabling real-time physics in leading edge PC games. PhysX software is widely adopted by over 150 games and is used by more than 10,000 developers. PhysX is optimized for hardware acceleration by massively parallel processors. GeForce GPUs with PhysX provide an exponential increase in physics processing power taking gaming physics to the next level.

What is physics for gaming and why is it important?
Physics is the next big thing in gaming. It’s all about how objects in your game move, interact, and react to the environment around them. Without physics in many of today’s games, objects just don’t seem to act the way you’d want or expect them to in real life. Currently, most of the action is limited to pre-scripted or ‘canned’ animations triggered by in-game events like a gunshot striking a wall. Even the most powerful weapons can leave little more than a smudge on the thinnest of walls; and every opponent you take out, falls in the same pre-determined fashion. Players are left with a game that looks fine, but is missing the sense of realism necessary to make the experience truly immersive.

With NVIDIA PhysX technology, game worlds literally come to life: walls can be torn down, glass can be shattered, trees bend in the wind, and water flows with body and force. NVIDIA GeForce GPUs with PhysX deliver the computing horsepower necessary to enable true, advanced physics in the next generation of game titles making canned animation effects a thing of the past.

Which NVIDIA GeForce GPUs support PhysX?
The minimum requirement to support GPU-accelerated PhysX is a GeForce 8-series or later GPU with a minimum of 32 cores and a minimum of 256MB dedicated graphics memory. However, each PhysX application has its own GPU and memory recommendations. In general, 512MB of graphics memory is recommended unless you have a GPU that is dedicated to PhysX.
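The requirements above can be expressed as a simple check. This is a hedged sketch of the stated rules only: the function name and the three result labels are our own, and individual PhysX titles may recommend more than this.

```python
def physx_support_level(
    is_geforce_8_or_later: bool,
    cuda_cores: int,
    memory_mb: int,
    dedicated_to_physx: bool = False,
) -> str:
    """Classify a GPU against the stated GPU-accelerated PhysX requirements.

    Returns "unsupported", "minimum", or "recommended" (labels are ours).
    """
    meets_minimum = is_geforce_8_or_later and cuda_cores >= 32 and memory_mb >= 256
    if not meets_minimum:
        return "unsupported"
    # 512MB is recommended unless the GPU does nothing but PhysX.
    if memory_mb >= 512 or dedicated_to_physx:
        return "recommended"
    return "minimum"


if __name__ == "__main__":
    # GTX 580: 512 CUDA cores and 1536MB of memory, per this review.
    print(physx_support_level(True, 512, 1536))
```

A GTX 580, with 512 cores and 1536MB of memory, clears both the minimum and the recommended thresholds comfortably.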

How does PhysX work with SLI and multi-GPU configurations?
When two, three, or four matched GPUs are working in SLI, PhysX runs on one GPU, while graphics rendering runs on all GPUs. The NVIDIA drivers optimize the available resources across all GPUs to balance PhysX computation and graphics rendering. Therefore users can expect much higher frame rates and a better overall experience with SLI.

A new configuration that’s now possible with PhysX is 2 non-matched (heterogeneous) GPUs. In this configuration, one GPU renders graphics (typically the more powerful GPU) while the second GPU is completely dedicated to PhysX. By offloading PhysX to a dedicated GPU, users will experience smoother gaming.

Finally we can put the above two configurations all into 1 PC! This would be SLI plus a dedicated PhysX GPU. Similarly to the 2 heterogeneous GPU case, graphics rendering takes place in the GPUs now connected in SLI while the non-matched GPU is dedicated to PhysX computation.

Why is a GPU good for physics processing?
The multithreaded PhysX engine was designed specifically for hardware acceleration in massively parallel environments. GPUs are the natural place to compute physics calculations because, like graphics, physics processing is driven by thousands of parallel computations. Today, NVIDIA’s GPUs have as many as 480 cores, so they are well-suited to take advantage of PhysX software. NVIDIA is committed to making the gaming experience exciting, dynamic, and vivid. The combination of graphics and physics impacts the way a virtual world looks and behaves.

DirectCompute

DirectCompute Support on NVIDIA’s CUDA Architecture GPUs

Microsoft’s DirectCompute is a new GPU Computing API that runs on NVIDIA’s current CUDA architecture under both Windows Vista and Windows 7. DirectCompute is supported on current DX10 class GPUs and DX11 GPUs. It allows developers to harness the massive parallel computing power of NVIDIA GPUs to create compelling computing applications in consumer and professional markets.

As part of the DirectCompute presentation at the Game Developer Conference (GDC) in March 2009 in San Francisco, CA, NVIDIA showed three demos running on a currently available NVIDIA GeForce GTX 280 GPU.

As a processor company, NVIDIA enthusiastically supports all languages and APIs that enable developers to access the parallel processing power of the GPU. In addition to DirectCompute and NVIDIA’s CUDA C extensions, there are other programming models available including OpenCL™. A Fortran language solution is also in development and is available in early access from The Portland Group.

NVIDIA has a long history of embracing and supporting standards, since a wider choice of languages increases the number and scope of applications that can exploit parallel computing on the GPU. With C and Fortran language support here today and OpenCL and DirectCompute available this year, GPU Computing is now mainstream. NVIDIA is the only processor company to offer this breadth of development environments for the GPU.

OpenCL

OpenCL (Open Computing Language) is a new cross-vendor standard for heterogeneous computing that runs on the CUDA architecture. Using OpenCL, developers will be able to harness the massive parallel computing power of NVIDIA GPUs to create compelling computing applications. As the OpenCL standard matures and is supported on processors from other vendors, NVIDIA will continue to provide the drivers, tools and training resources developers need to create GPU accelerated applications.

In partnership with NVIDIA, OpenCL was submitted to the Khronos Group by Apple in the summer of 2008 with the goal of forging a cross platform environment for general purpose computing on GPUs. NVIDIA has chaired the industry working group that defines the OpenCL standard since its inception and shipped the world’s first conformant GPU implementation for both Windows and Linux in June 2009.

NVIDIA has been delivering OpenCL support in end-user production drivers since October 2009, supporting OpenCL on all 180,000,000+ CUDA architecture GPUs shipped since 2006.

NVIDIA’s Industry-leading support for OpenCL:

2010

March – NVIDIA releases updated R195 drivers with the Khronos-approved ICD, enabling applications to use OpenCL on NVIDIA GPUs and other processors at the same time

January – NVIDIA releases updated R195 drivers, supporting developer-requested OpenCL extensions for Direct3D9/10/11 buffer sharing and loop unrolling

January – Khronos Group ratifies the ICD specification contributed by NVIDIA, enabling applications to use multiple OpenCL implementations concurrently

2009

November – NVIDIA releases R195 drivers with support for optional features in the OpenCL v1.0 specification such as double precision math operations and OpenGL buffer sharing

October – NVIDIA hosts the GPU Technology Conference, providing OpenCL training for an additional 500+ developers

September – NVIDIA completes OpenCL training for over 1000 developers via free webinars

September – NVIDIA begins shipping OpenCL 1.0 conformant support in all end user (public) driver packages for Windows and Linux

September – NVIDIA releases the OpenCL Visual Profiler, the industry’s first hardware performance profiling tool for OpenCL applications

July – NVIDIA hosts first “Introduction to GPU Computing and OpenCL” and “Best Practices for OpenCL Programming, Advanced” webinars for developers

July – NVIDIA releases the NVIDIA OpenCL Best Practices Guide, packed with optimization techniques and guidelines for achieving fast, accurate results with OpenCL

July – NVIDIA contributes source code and specification for an Installable Client Driver (ICD) to the Khronos OpenCL Working Group, with the goal of enabling applications to use multiple OpenCL implementations concurrently on GPUs, CPUs and other types of processors

June – NVIDIA releases the industry’s first OpenCL 1.0 conformant drivers and developer SDK

April – NVIDIA releases industry first OpenCL 1.0 GPU drivers for Windows and Linux, accompanied by the 100+ page NVIDIA OpenCL Programming Guide, an OpenCL JumpStart Guide showing developers how to port existing code from CUDA C to OpenCL, and OpenCL developer forums

2008

December – NVIDIA shows off the world’s first OpenCL GPU demonstration, running on an NVIDIA laptop GPU at SIGGRAPH Asia

June – Apple submits OpenCL proposal to Khronos Group; NVIDIA volunteers to chair the OpenCL Working Group as it is formed

2007

December – NVIDIA Tesla product wins PC Magazine Technical Excellence Award

June – NVIDIA launches the Tesla C870, the first GPU designed for High Performance Computing

May – NVIDIA releases first CUDA architecture GPUs capable of running OpenCL in laptops & workstations

2006

November – NVIDIA releases first CUDA architecture GPU capable of running OpenCL

The GTX 580

In order to show all the angles of the GeForce GTX 580, we took many pictures. At first glance, it looks very similar to the GTX 480, but it is missing the heatpipes that we saw on the GTX 480. As we mentioned earlier, Nvidia totally redesigned the cooling on its new GF110 lineup, and the GTX 580 now comes with a vapor chamber cooler. This means that heatpipes are no longer required. With the vapor chamber heatsink design, heat is transferred more efficiently to the heatsink fins, so the fan can easily blow it out the back of the system. While the card has a very nice, clean design overall, we are quite disappointed that the rear bracket still uses the dense ventilation hole pattern. With a less dense pattern like the one on the EVGA High Flow Bracket, heat could be pushed out of the card with ease, and there would be less turbulence from air being pushed against the dense grille of the ventilation holes. Our previous lab tests have shown that with the EVGA High Flow design, we were able to drop the temperatures on the GTX 480 by around 3 degrees Celsius. Thankfully, because the GeForce GTX 580 has the same connectors as the GTX 480, the EVGA High Flow bracket could easily be used on GTX 580 cards as well.

We also noticed that the fan is larger on the GTX 580 than on the GTX 480. We believe this is one of the changes to the fan design that Nvidia was talking about. Usually with larger fans, it is easier to push more air through the card without using higher RPMs that could cause motor noise. The actual full card length with the back expansion slot bracket is roughly 11 inches, but the PCB is exactly 10.5 inches long. This means that to fit the GTX 580 into a case, users will need at least 10.6 inches of free space, but we always recommend having a case that has a bit more room for better air circulation and an easier fit. The height of the card matches standard video card height specifications, and since there are no heatpipes on the GTX 580, the overall card size with the cooler does not exceed the standard 4.5 inch height. The cover for the GTX 580 has also been redesigned to allow for better air circulation through the heatsink area of the card, further cooling the GPU.

The PCB design of the GTX 580 is essentially the same as the reference GTX 480 design. Of course we can see new components incorporated on the PCB, and also components removed. One of the most noticeable changes to the board is that the ventilation hole on the PCB is absent. The newly redesigned fan and heatsink design is supposed to take into consideration the fact that some users will use SLI systems, so the GTX 580 has been fine tuned to make sure there won’t be any ventilation problems even without the ventilation hole that we saw on the GTX 480.

Also, as expected, the GTX 580 has 6-pin and 8-pin power connectors, which supply a maximum of 75W + 150W = 225W of power, and two SLI connectors that let the user run the GTX 580 in up to a 4-way SLI setup with the appropriate motherboard. We noticed that instead of using latches to secure the plastic cover, as was done on the GTX 480, Nvidia decided to use screws to fasten the top cover of the GTX 580.
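The connector arithmetic above can be extended into a simple power budget. In this sketch the 75W figure for the PCI-Express x16 slot is taken from the PCI Express specification rather than from this review, and the constant and function names are our own.

```python
SIX_PIN_W = 75        # 6-pin auxiliary connector, as quoted in this review
EIGHT_PIN_W = 150     # 8-pin auxiliary connector, as quoted in this review
PCIE_SLOT_W = 75      # assumption: x16 slot limit from the PCI Express spec
GTX_580_TDP_W = 244   # TDP quoted in this review


def connector_budget_w(include_slot: bool = True) -> int:
    """Maximum power available from the auxiliary connectors, optionally plus the slot."""
    budget = SIX_PIN_W + EIGHT_PIN_W
    if include_slot:
        budget += PCIE_SLOT_W
    return budget


def headroom_w(tdp_w: int, budget_w: int) -> int:
    """Power left over (positive) or missing (negative) relative to the TDP."""
    return budget_w - tdp_w


if __name__ == "__main__":
    for include_slot in (False, True):
        budget = connector_budget_w(include_slot)
        label = "connectors + slot" if include_slot else "connectors only"
        print(f"{label}: {budget}W, headroom vs TDP: {headroom_w(GTX_580_TDP_W, budget)}W")
```

The two auxiliary connectors alone fall 19W short of the 244W TDP, so the card also relies on power from the slot, which brings the total budget to 300W.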

This is Nvidia’s best cooling design to date. The new vapor chamber cooling is designed to provide extra cooling to make sure the GPU does not overheat. The new GF110 GPU is designed to withstand temperatures of up to 97C before it is throttled down, whereas the GF100 was designed to withstand up to 105C before throttling started. Just from looking at the reference board, we can also see that the GTX 580 does not have the capacitors in the middle of the card that we saw on the Galaxy GTX 480. Instead, they have been moved to the far edge of the PCB. Nvidia made improvements to the decoupling on the board to achieve higher clocks at a given voltage, which helps increase performance in a fixed power envelope.

The vapor chamber is finally revealed. As we can see, it is very thin, but it is enough to allow the liquid to evaporate and cool down. It seems as though the thermal paste on the GPU is also better quality, to ensure that there is excellent contact between the GPU and the cooler’s base. The vapor chamber’s base is a copper base, with a smooth surface. While the base of the cooler is not mirror-finished, it is smooth enough to provide excellent contact with the GPU, especially with high-quality thermal paste in between.

The PCB cover can also be removed from the video card, which shows us that the memory, MOSFETs, and other components are also cooled through the cover of the video card. The cover is made of aluminum alloy, coated with black electrodeposit (this is designed to be electrically nonconductive). While the card was in operation, we could feel the cover transferring a fair amount of heat.

This is the GF110 GPU. From first impressions, it looks like other aftermarket coolers should easily be compatible with the GTX 580 video card. The PCB design is very similar. We can also see that Nvidia once again uses Samsung memory chips on the PCB. The GTX 580 consists of six 64-bit memory controllers (384-bit) and 1536MB of GDDR5 memory.

The GTX 580 is essentially the same size as the GTX 480. The GTX 480 is actually slightly larger towards the end, as we can see in the pictures. Compared to the older Radeon HD 4870, which used to be ATI’s top-of-the-line single GPU video card, the Nvidia GTX 580 is a bit longer overall. The Palit GTX 460 Sonic Platinum video card is about 7.4 inches in length, so this should give a good comparison between card sizes.

When we take a look at the GTX 480 and the GTX 580’s GPU cooler, we can definitely see the difference. It also seems as though the GTX 580 has some anti-vibration silicone padding around the cooler to prevent any vibration noise being generated by the cooler hitting against the video card’s cover.

Finally, in these last 3 pictures, we can see some experimentation with how the video cards could be used in a system. Users should note that it is not possible to use a GTX 580 and a GTX 480 in SLI. We experimented with this (as shown in the first picture), but were only able to use the GTX 580 as the main dedicated video card, and the GTX 480 as a PhysX card. The second picture shows another option, where the GTX 580 could be used as a dedicated powerful PhysX card, with two other cards being used in SLI. Unfortunately we forgot to add the SLI bridge between the two GTX 460s in the second picture, but during the actual testing, we had everything set up correctly. The final picture shows the Nvidia GeForce GTX 580 running by itself in a single GPU setup.

Testing Methodology

The OS we use is Windows 7 Pro 64bit with all patches and updates applied. We also use the latest drivers available for the motherboard and any devices attached to the computer. We do not disable background tasks or tweak the OS or system in any way. We turn off drive indexing and daily defragging. We also turn off Prefetch and Superfetch. This is not an attempt to produce bigger benchmark numbers. Drive indexing and defragging can interfere with testing and produce confusing numbers. If a test were to be run while a drive was being indexed or defragged, and then the same test was later run when these processes were off, the two results would be contradictory and erroneous. As we cannot control when defragging and indexing occur precisely enough to guarantee that they won’t interfere with testing, we opt to disable the features entirely.

Prefetch tries to predict what users will load the next time they boot the machine by caching the relevant files and storing them for later use. We want to learn how the program runs without any of the files being cached, and we disable it so that we do not have to clear the prefetch data before each test run to get accurate numbers. Lastly, we disable Superfetch. Superfetch loads often-used programs into memory. It is one of the reasons that Windows Vista occupies so much memory. Vista fills the memory in an attempt to predict what users will load. Having one test run with files cached, and another test run with the files un-cached, would result in inaccurate numbers. Again, since we can’t control its timing precisely, we turn it off. Because these four features can potentially interfere with benchmarking, and are out of our control, we disable them. We do not disable anything else.

We ran each test a total of 3 times, and reported the average score from all three scores. Benchmark screenshots are of the median result. Anomalous results were discounted and the benchmarks were rerun.
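As a minimal sketch of that reporting rule, the snippet below takes three scores, computes the average that goes into the charts, and picks the median run that the screenshot would come from. The function name and the sample scores are hypothetical and are not taken from our actual results.

```python
from statistics import mean, median


def summarize_runs(scores):
    """Return the reported average and the median run for one benchmark."""
    if len(scores) != 3:
        raise ValueError("expected exactly three runs")
    return {"average": mean(scores), "median": median(scores)}


# Hypothetical FPS results from three runs of the same test.
runs = [44.1, 44.5, 44.9]
summary = summarize_runs(runs)
print(f"Reported average: {summary['average']:.1f} FPS")
print(f"Screenshot taken from the run scoring: {summary['median']} FPS")
```

With three runs the median is always one of the actual runs, which is why it is a natural choice for the screenshot.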

Please note that due to new driver releases with performance improvements, we rebenched every card shown in the results section. The results here will be different than previous reviews due to the performance increases in drivers.

Test Rig

Case	Silverstone Temjin TJ10
CPU	Intel Core i7 930 @ 3.8GHz
Motherboard	ASUS Rampage III Extreme ROG – LGA1366
RAM	OCZ DDR3-12800 1600MHz (8-8-8-24 1.65v) 12GB Triple-Channel Kit
CPU Cooler	Thermalright True Black 120 with 2x Zalman ZM-F3 FDB 120mm Fans
Hard Drives	4x Seagate Cheetah 600GB 10K 6Gb/s Hard Drives; 2x Western Digital RE3 1TB 7200RPM 3Gb/s Hard Drives
Optical	ASUS DVD-Burner
GPU	Nvidia GeForce GTX 580 1536MB; Galaxy GeForce GTX 480 1536MB; Palit GeForce GTX 460 Sonic Platinum 1GB in SLI; ASUS Radeon HD6870; AMD Radeon HD5870
Case Fans	2x Zalman ZM-F3 FDB 120mm Fans – Top; 1x Zalman Shark’s Fin ZM-SF3 120mm Fan – Back; 1x Silverstone 120mm Fan – Front; 1x Zalman ZM-F3 FDB 120mm Fan – Hard Drive Compartment; 1x Zalman ZM-F3 FDB 120mm Fan – Side Ventilation for Video Cards and RAID Card SAS Controller
Additional Cards	LSI 3ware SATA + SAS 9750-8i 6Gb/s RAID Card
PSU	Sapphire PURE 1250W Modular Power Supply
Mouse	Logitech G5
Keyboard	Logitech G15

Synthetic Benchmarks & Games

We will use the following applications to benchmark the performance of the Nvidia GeForce GTX 580 video card.

3DMark Vantage
Metro 2033
Stone Giant
Unigine Heaven v.2.1
Crysis
Crysis Warhead
Endless City
HAWX 2
Mafia II – PhysX

Crysis v. 1.21

Crysis is the most highly anticipated game to hit the market in the last several years. Crysis is based on the CryENGINE™ 2 developed by Crytek. The CryENGINE™ 2 offers real time editing, bump mapping, dynamic lights, network system, integrated physics system, shaders, shadows, and a dynamic music system, just to name a few of the state-of-the-art features that are incorporated into Crysis. As one might expect with this number of features, the game is extremely demanding of system resources, especially the GPU. We expect Crysis to be a primary gaming benchmark for many years to come.

The Settings we use for benchmarking Crysis

The Nvidia GeForce GTX 580 delivers a phenomenal gaming experience in Crysis at the 1680×1050 resolution. Never before has a single GPU been so powerful or achieved such scores at stock clocks. It is safe to say that the GTX 580 is at a level of performance where Crysis could easily be enjoyed without any lag at even higher resolutions like 1920×1200 with AA.

As we can see in the last 1920×1200 test with 2x AA, the actual game performance is still within an excellent gaming range, and though the user might not get the smoothest gameplay, it should be smooth enough for the game to be easily playable. With the GTX 480 falling behind by roughly 6FPS, users will no doubt have a better gaming experience with the GTX 580. Unfortunately, the GeForce GTX 580 is not able to outperform the Palit GTX 460 Sonic Platinum video cards in SLI, but future overclocking tools that allow users to tweak the voltages on the GTX 580 should put that level of performance within reach of an overclocked (and overvolted) GTX 580.

Crysis Warhead

Crysis Warhead is the much anticipated standalone expansion to Crysis, featuring an updated CryENGINE™ 2 with better optimization. It was one of the most anticipated titles of 2008.

The Settings we use for benchmarking Crysis Warhead

Crysis Warhead has been optimized to run smoother on even slower video cards. We can also see that SLI performance jumps quite high compared to the single GTX 580, which we have not seen very much in the Crysis benchmarks.

Even at 1920×1200 with 2x AA, the performance is still around 40FPS for the GTX 580, which will guarantee very smooth gameplay, considering that the minimum FPS on the GTX 580 is above 30FPS. With the GTX 480, the FPS drops to 25.26FPS, which will make gameplay a bit difficult in action-filled scenes. Based on testing so far, we notice that the AMD counterparts fall behind considerably. This makes the Nvidia GTX 580 the world’s fastest single GPU video card.

Unigine Heaven 2.1

Unigine Heaven is a benchmark program based on Unigine Corp’s latest engine, Unigine. The engine features DirectX 11, Hardware tessellation, DirectCompute, and Shader Model 5.0. All of these new technologies combined with the ability to run each card through the same exact test means this benchmark should be in our arsenal for a long time.

Unigine Heaven shows us exactly what the GTX 580 is capable of. Since tessellation has been improved on the 580, it is not a surprise that it has a great advantage over the other video cards. The extra clock frequency and CUDA cores also add to the overall performance of the GTX 580. If we push all the video cards with extreme tessellation, we can easily see that the Fermi cards have a great advantage over the AMD cards. With more games being developed with high amounts of tessellation, the Nvidia cards will have an advantage over the competition.

Stone Giant

We used a 60 second Fraps run and recorded the Min/Avg/Max FPS rather than rely on the built-in utility for determining FPS. We started the benchmark, triggered Fraps, and let it run on stock settings for 60 seconds without making any adjustments or changing camera angles. We just let it run at default and had Fraps record the FPS and log them to a file for us.
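For readers who want to reproduce the Min/Avg/Max figures from their own logs, here is a minimal sketch. It assumes the logged values have already been read into a list of per-second FPS samples. The log file layout is not described here, so the parsing step is left out, and the function name and sample values are hypothetical.

```python
def fps_summary(samples):
    """Return (min, avg, max) for a list of per-second FPS samples."""
    if not samples:
        raise ValueError("no samples recorded")
    return min(samples), sum(samples) / len(samples), max(samples)


# Hypothetical per-second readings; a real 60 second run would have 60 values.
samples = [38, 41, 45, 43, 40, 47]
low, avg, high = fps_summary(samples)
print(f"Min {low} / Avg {avg:.1f} / Max {high}")
```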

Key features of the BitSquid Tech (PC version) include:

Highly parallel, data oriented design
Support for all new DX11 GPUs, including the NVIDIA GeForce GTX 400 Series and AMD Radeon 5000 series
Compute Shader 5 based depth of field effects
Dynamic level of detail through displacement map tessellation
Stereoscopic 3D support for NVIDIA 3dVision

“With advanced tessellation scenes, and high levels of geometry, Stone Giant will allow consumers to test the DX11-credentials of their new graphics cards,” said Tobias Persson, Founder and Senior Graphics Architect at BitSquid. “We believe that the great image fidelity seen in Stone Giant, made possible by the advanced features of DirectX 11, is something that we will come to expect in future games.”

“At Fatshark, we have been creating the art content seen in Stone Giant,” said Martin Wahlund, CEO of Fatshark. “It has been amazing to work with a bleeding edge engine, without the usual geometric limitations seen in current games”.

In the Stone Giant benchmark, the GTX 580 challenged the GeForce GTX 460s in SLI at both resolutions. The two results are very close.

Endless City

We used a 60 second Fraps run and recorded the Min/Avg/Max FPS rather than rely on the built in utility for determining FPS. We started the benchmark, triggered Fraps and let it run on stock settings for 60 seconds with the AutoPilot ON. We just let it run at default (1920×1200) and had Fraps record the FPS and log them to a file for us.

Endless City is one of the demos Nvidia is releasing today to show off their tessellation performance. According to Nvidia, the GTX 580 is capable of rendering up to 2 billion triangles created by tessellation in real time. Endless City is supposed to test this by rendering the whole 2 billion triangles in real time, and according to the benchmark, the claim holds up. The GTX 580 passed the 30FPS mark, which is a good threshold for real-time playback.

Metro 2033

Metro 2033 is an action-oriented video game blending survival horror and first-person shooter elements. The game is based on the novel Metro 2033 by Russian author Dmitry Glukhovsky. It was developed by 4A Games in Ukraine and released in March 2010 for the Xbox 360 and Microsoft Windows. In March 2009, 4A Games announced a partnership with Glukhovsky to collaborate on the game. The game was announced a few months later at the 2009 Games Convention in Leipzig; a first trailer came along with the announcement. When the game was announced, it had the subtitle “The Last Refuge,” but this subtitle is no longer being used.

The game is played from the perspective of a character named Artyom. The story takes place in post-apocalyptic Moscow, mostly inside the metro system where the player’s character was raised (he was born before the war, in an unharmed city). The player must occasionally go above ground on certain missions and scavenge for valuables.

The game’s locations reflect the dark atmosphere of real metro tunnels, albeit in a more sinister and bizarre fashion. Strange phenomena and noises are frequent, and mostly the player has to rely on their flashlight and quick thinking to find their way around in total darkness. Even more lethal is the surface, as it is severely irradiated and a gas mask must be worn at all times due to the toxic air. Water can often be contaminated as well, and short contacts can cause heavy damage to the player, or even kill outright.

Often, locations have an intricate layout, and the game lacks any form of map, leaving the player to try and find their objectives only through a compass – weapons cannot be used while visualizing it, leaving the player vulnerable to attack during navigation. The game also lacks a health meter, relying on audible heart rate and blood spatters on the screen to show the player how close he or she is to death. There is no on-screen indicator to tell how long the player has until the gas mask’s filters begin to fail, save for a wristwatch that is divided into three zones, signaling how much the filter can endure, so players must continue to check it every time they wish to know how long they have until their oxygen runs out. Players must replace the filters, which are found throughout the game. The gas mask also indicates damage in the form of visible cracks, warning the player a new mask is needed. The game does feature traditional HUD elements, however, such as an ammunition indicator and a list of how many gas mask filters and adrenaline (health) shots remain.

Another important factor is ammunition management. As money lost its value in the game’s setting, cartridges are used as currency. There are two kinds of bullets that can be found: those of poor quality made by the metro-dwellers, and those made before the nuclear war. The ones made by the metro-dwellers are more common, but less effective against the dark denizens of the underground labyrinth. The pre-war ones, which are rare and highly powerful, are also necessary to purchase gear or items such as filters for the gas mask and med kits. Thus, the game involves careful resource management.

We left Metro 2033 on all high settings with Depth of Field on.

Here comes the interesting part. As some of our readers might know already, Metro 2033 doesn’t have exceptional SLI support, certainly not as good as some other games on the market. In all the other benchmarks, the two GTX 460s in SLI were faster than a single GTX 580. With Metro 2033, where SLI support is not the best, the GTX 580 was able to take the crown. This is understandable: with poor SLI support, the overall SLI score will be lower, and as a more powerful single GPU video card with stronger tessellation performance, the GTX 580 performs better. The beautiful thing about the GTX 580 was also the fact that the fan ran very quietly, making it enjoyable to play Metro 2033. With two GTX 460s in the computer, even though Palit designed their cards to be quiet, two of the same video cards seem to make twice the amount of noise, making it more enjoyable to play video games with the GTX 580 than with an SLI setup.

3DMark Vantage

For complete information on 3DMark Vantage, please follow this link:

www.futuremark.com/benchmarks/3dmarkvantage/features/

The newest video benchmark from the gang at Futuremark. This utility is still a synthetic benchmark, but one that more closely reflects real world gaming performance. While it is not a perfect replacement for actual game benchmarks, it has its uses. We tested our cards at the ‘Performance’ setting.

In 3DMark Vantage, we see a fantastic improvement in overall GPU score. The leap from the GTX 480 to the GTX 580 seems to be around 4600 points. That’s an absolutely phenomenal result.

HAWX 2

Tom Clancy’s H.A.W.X. 2 plunges fans into an explosive environment where they can become elite aerial soldiers in control of the world’s most technologically advanced aircraft. The game will appeal to a wide array of gamers as players will have the chance to control exceptional pilots trained to use cutting edge technology in amazing aerial warfare missions.

Developed by Ubisoft, H.A.W.X. 2 challenges you to become an elite aerial soldier in control of the world’s most technologically advanced aircraft. The aerial warfare missions enable you to take to the skies using cutting edge technology.

HAWX 2 did not show much improvement in overall graphics, but tessellation on the hills and terrain showed significant improvement. There were points when the high level of geometry, combined with high-quality textures, made the hills look quite realistic. However, it would have been more interesting to see tessellation being utilized in more places rather than just the terrain.

From the scores on both graphs, it is clear that the Nvidia GPUs, with their extra PolyMorph units, are able to perform better in tessellation-demanding applications than AMD’s GPUs.

PhysX

To test for PhysX capabilities of the video cards, we have benchmarked Mafia II with PhysX set at High.

What we can see in this example is that when PhysX was disabled in Mafia II, the two Palit GTX 460s were able to achieve up to 72.6FPS during the benchmark. Once PhysX was turned on and set to High, and we set the GTX 580 as a dedicated PhysX card, the GTX 580 was able to increase the score by roughly 20FPS just because it was dedicated to calculate only PhysX. Without having a dedicated PhysX card, our scores were around 35FPS. We measured the percentage that the PhysX card was working at, and it was only showing 50%-60%. So the lower score results in Mafia II most likely come from imperfect support for PhysX, and also the extra geometry the video card has to render when extra objects are added to the scene for a better PhysX experience.

3D Performance Compared to Standard

A quick test has been done in Mafia II to determine how the GeForce GTX 580 performs in 3D compared to standard 2D settings.

We expected the 3D performance to drop by exactly half when we ran the benchmarks, but it seems that with Mafia II, the GTX 580 was able to achieve a higher score in 3D than usual. This could be due to the fact that when 3D is applied, the PhysX calculations do not have to be repeated for the second view, but just rendered again on the main GPU (not dedicated to PhysX). This would be a good explanation for why we see such results in 3D performance in Mafia II.

Overclocking

Overclocking the GTX 580 was a snap, but its full overclocking potential cannot yet be reached, as the current overclocking tools still do not have the capability to change voltages. Because of this, we tried to get as high as possible without changing any of the voltages on the card. The maximum we could get was 836MHz on the GPU clock, 1112MHz on the memory, and 1672MHz on the shader clock. This is a small improvement, and it got us closer to beating the GTX 460s in SLI.

Here are some performance tests after the GTX 580 was overclocked:

Video Card – FPS at 1920×1200	Unigine Heaven 2.1 (Extreme Tessellation)	Crysis Warhead
GeForce GTX 580	44.5FPS	39.46FPS
GeForce GTX 580 OC	47.9FPS	42.52FPS
Palit GTX 460 SP in SLI	49.0FPS	46.39FPS

EDIT (11/10/2010)

MSI Afterburner 2.1.0 Beta 4 can now overvolt the GTX 580 and also control the fan speed.

Max stable OC without any artifacts and temperatures within GPU specs:

GPU: 933MHz

Memory: 2275MHz

Voltage: 1235mV

Fan Speed: 85%

Ambient Temp: 23C

GPU Temps: 93C

Performance increase:

Stock GTX 580 in Unigine Heaven at 1920×1200 with Extreme Tessellation: 44.5FPS

Overclocked GTX 580 at 1235mV, 933MHz/2275MHz, at 1920×1200 with Extreme Tessellation: 52.6FPS.

Stock GTX 480 at the same settings: 37.9FPS. That’s a 14.7FPS increase from the stock GTX 480.
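The gains quoted above can be checked with a short sketch that reports both the absolute and the percentage improvement. The three FPS values are the Unigine Heaven results listed in this section. The function and variable names are only illustrative.

```python
def gain(baseline_fps, new_fps):
    """Return the absolute FPS gain and the percentage gain over baseline."""
    delta = new_fps - baseline_fps
    return delta, 100.0 * delta / baseline_fps


# Unigine Heaven results quoted above (1920x1200, Extreme Tessellation).
STOCK_480 = 37.9
STOCK_580 = 44.5
OC_580 = 52.6

for label, base in (("stock GTX 580", STOCK_580), ("stock GTX 480", STOCK_480)):
    delta, pct = gain(base, OC_580)
    print(f"Overclocked GTX 580 vs {label}: +{delta:.1f} FPS ({pct:.1f}%)")
```

In percentage terms, the overvolted card comes out roughly 18% ahead of the stock GTX 580 and roughly 39% ahead of the stock GTX 480 in this one test.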

Temperatures

To measure the temperature of the video card, we used MSI Afterburner and ran Metro 2033 for 10 minutes to find the Load temperatures for the video cards. The highest temperature was recorded. After playing for 10 minutes, Metro 2033 was turned off and we let the computer sit at the desktop for another 10 minutes before we measured the idle temperatures.

Video Cards – Temperatures – Ambient 23C	Idle	Load (Fan Speed)
2x Palit GTX 460 Sonic Platinum 1GB GDDR5 in SLI	31C	65C
Palit GTX 460 Sonic Platinum 1GB GDDR5	29C	60C
Galaxy GTX 480	53C	81C (73%)
Nvidia GeForce GTX 580	39C	73C (66%)

The Nvidia GTX 580 shows a significant temperature decrease at idle, running 14C cooler than the GTX 480. The load temperature is 8C lower than what the GTX 480 reached, and the fan speed on the GTX 580 was also lower than on the GTX 480. Overall, the cooling solution for the GTX 580 is very well designed, and it benefits from the improvements and optimizations of the GF110 chip.

Power Consumption

To get our power consumption numbers, we plugged in our Kill A Watt power measurement device and took the idle reading at the desktop during our temperature readings, after leaving the system at the desktop for about 15 minutes. Then we ran Metro 2033 for a few minutes and recorded the highest power usage.

Video Cards – Power Consumption	Idle	Load
2x Palit GTX 460 Sonic Platinum 1GB GDDR5 in SLI	315W	525W
Palit GTX 460 Sonic Platinum 1GB GDDR5	249W	408W
Nvidia GTX 460 1GB	237W	379W
Galaxy GTX 480	248W	439W
Nvidia GeForce GTX 580	225W	439W
Nvidia GeForce GTX 580 OC (Stock Voltage)	232W	461W
ASUS Radeon HD6870	235W	375W
AMD Radeon HD5870	273W	454W

We can see that the GeForce GTX 580 does not do any better than the GTX 480 in power consumption under full load, with both systems drawing 439W. At idle, however, it draws about 23W less than the Galaxy GTX 480.
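The comparison can be reproduced from the table with a small sketch. The readings are whole-system wall figures copied from the table above, so the differences reflect the complete test rig rather than the card in isolation. The dictionary layout and function name are our own illustration.

```python
# Whole-system wall readings (idle W, load W) copied from the table above.
READINGS = {
    "Galaxy GTX 480": (248, 439),
    "Nvidia GeForce GTX 580": (225, 439),
    "Nvidia GeForce GTX 580 OC (Stock Voltage)": (232, 461),
}


def compare(card, reference, readings=READINGS):
    """Return (idle difference, load difference) in watts, card minus reference."""
    idle, load = readings[card]
    ref_idle, ref_load = readings[reference]
    return idle - ref_idle, load - ref_load


idle_diff, load_diff = compare("Nvidia GeForce GTX 580", "Galaxy GTX 480")
print(f"GTX 580 vs GTX 480: idle {idle_diff:+d}W, load {load_diff:+d}W")
```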

Conclusion

The Nvidia GeForce GTX 580 (GF110) is the Fermi that Nvidia should have launched 7 months ago. The launch of the GeForce GTX 480 showed that while there was great potential in the Fermi lineup, the GF100 was not fine-tuned enough to perform the way most users wanted. Nvidia made many improvements and optimizations to the GF110 core to make sure that the launch of the GTX 580 would win back those gamers who were disappointed with the GTX 480. The Nvidia GeForce GTX 580 provides the user with 32 more CUDA cores, for a total of 512. Together, the 512 CUDA cores and the 16 PolyMorph Engines provide an excellent performance increase in tessellation and other 3D rendering in today’s DX11 games. The GTX 580 also offers higher GPU, shader, and memory clock frequencies, allowing gamers to squeeze even more performance out of the GPU at high resolutions and graphics settings. But that’s not all. The best part of the GTX 580 is that Nvidia was able to pull off all of this while also redesigning the cooler to use a vapor chamber for better cooling potential and a very quiet gaming environment. The video card does not run hotter because of the quieter design. In fact, it ran 8C cooler than the previous GTX 480 under load in our testing. Nvidia has truly managed to bring us the fastest GPU on the planet, while also maintaining reasonable noise levels and temperatures on the card.

On the other hand, the GTX 580 still seems to be on the power-hungry side. While we were able to get extra performance from the GTX 580, the load power consumption that we measured was still the same as the GTX 480’s. We measured power consumption in the same testing environment as the GTX 480, while playing Metro 2033, to get the most realistic power measurements.

Overall, while the Nvidia GeForce GTX 580 is the fastest single GPU video card on the planet, the AMD 5970 still takes the crown as the fastest dual GPU video card. The GeForce GTX 580 will take the GTX 480’s position from now on, and will be available for $499 SEP.
