| Jan 29, 2003
<br /> ------------
<br /> NV30 vs R300, current developments, etc
<br />
<br /> At the moment, the NV30 is slightly faster on most scenes in Doom than the
<br /> R300, but I can still find some scenes where the R300 pulls a little bit
<br /> ahead. The issue is complicated because of the different ways the cards can
<br /> choose to run the game.
<br />
<br /> The R300 can run Doom in three different modes: ARB (minimum extensions, no
<br /> specular highlights, no vertex programs), R200 (full featured, almost always
<br /> single pass interaction rendering), ARB2 (floating point fragment shaders,
<br /> minor quality improvements, always single pass).
<br />
<br /> The NV30 can run Doom in five different modes: ARB, NV10 (full featured, five
<br /> rendering passes, no vertex programs), NV20 (full featured, two or three
<br /> rendering passes), NV30 (full featured, single pass), and ARB2.
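<br />
<br /> The two lists above can be written down as a small lookup table, which makes
<br /> it easy to see which paths allow a same-API comparison between the cards.
<br /> This is only a sketch: the dictionary layout and the shared_paths helper are
<br /> hypothetical, and only the path names and notes come from the text.

```python
# Notes per render path, paraphrased from the text. The data layout is
# hypothetical and is not taken from the Doom codebase.
PATH_NOTES = {
    "ARB": "minimum extensions, no specular highlights, no vertex programs",
    "R200": "full featured, almost always single pass",
    "NV10": "full featured, five passes, no vertex programs",
    "NV20": "full featured, two or three passes",
    "NV30": "full featured, single pass",
    "ARB2": "floating point fragment shaders, always single pass",
}

# Paths each card can run, in the order the text lists them.
CARD_PATHS = {
    "R300": ["ARB", "R200", "ARB2"],
    "NV30": ["ARB", "NV10", "NV20", "NV30", "ARB2"],
}


def shared_paths(card_a, card_b):
    """Return the paths both cards can run, in card_a's listed order."""
    return [p for p in CARD_PATHS[card_a] if p in CARD_PATHS[card_b]]


if __name__ == "__main__":
    for name in shared_paths("R300", "NV30"):
        print(name, "-", PATH_NOTES[name])
```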
<br />
<br /> The R200 path has a slight speed advantage over the ARB2 path on the R300, but
<br /> only by a small margin, so Doom defaults to the ARB2 path there for the quality
<br /> improvements. The NV30 runs the ARB2 path MUCH slower than the NV30 path:
<br /> half the speed at the moment. This is unfortunate, because when you do an
<br /> exact, apples-to-apples comparison using exactly the same API, the R300 looks
<br /> twice as fast, but when you use the vendor-specific paths, the NV30 wins.
<br />
<br /> The reason for this is that ATI does everything at high precision all the
<br /> time, while Nvidia internally supports three different precisions, each with
<br /> different performance. To make it even more complicated, the exact
<br /> precision that ATI uses is in between the floating point precisions offered by
<br /> Nvidia, so when Nvidia runs fragment programs, they are at a higher precision
<br /> than ATI's, which is some justification for the slower speed. Nvidia assures
<br /> me that there is a lot of room for improving the fragment program performance
<br /> with improved driver compiler technology.
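<br />
<br /> To get a feel for what "in between" means numerically, here is a rough sketch
<br /> that rounds a value to different mantissa widths. The bit counts are an
<br /> assumption on my part and are not given in the text: 10 stored mantissa bits
<br /> for a 16-bit float, 16 for a 24-bit float, and 23 for a 32-bit float.
<br /> Exponent range and denormals are ignored, so this only models rounding of
<br /> the significand, not real hardware behavior.

```python
import math

# Assumed stored mantissa bits per format (not stated in the text).
FORMATS = {"fp16": 10, "fp24": 16, "fp32": 23}


def round_to_mantissa(x, mantissa_bits):
    """Round x to the nearest value with the given stored mantissa bits.

    Only the significand is rounded; exponent limits are ignored.
    """
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                    # x == m * 2**e, 0.5 <= |m| < 1
    scale = 2.0 ** (mantissa_bits + 1)      # +1 for the implicit leading bit
    return math.ldexp(round(m * scale) / scale, e)


def rounding_errors(x):
    """Absolute rounding error of x in each assumed format."""
    return {name: abs(round_to_mantissa(x, bits) - x)
            for name, bits in FORMATS.items()}


if __name__ == "__main__":
    for name, err in rounding_errors(1.0 / 3.0).items():
        print(name, err)
```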
<br />
<br /> The current NV30 cards do have some other disadvantages: They take up two
<br /> slots, and when the cooling fan fires up they are VERY LOUD. I'm not usually
<br /> one to care about fan noise, but the NV30 does annoy me.
<br />
<br /> I am using an NV30 in my primary work system now, largely so I can test more
<br /> of the rendering paths on one system, and because I feel Nvidia still has
<br /> somewhat better driver quality (ATI continues to improve, though). For a
<br /> typical consumer, I don't think the decision is at all clear-cut at the
<br /> moment.
<br />
<br /> For developers doing forward-looking work, there is a different tradeoff --
<br /> the NV30 runs fragment programs much slower, but it has a huge maximum
<br /> instruction count. I have bumped into program limits on the R300 already.
<br /> As always, better cards are coming soon.
<br /> -
<br /> Doom has dropped support for vendor-specific vertex programs
<br /> (NV_vertex_program and EXT_vertex_shader), in favor of using
<br /> ARB_vertex_program for all rendering paths. This has been a pleasant thing to
<br /> do, and both ATI and Nvidia supported the move. The standardization process
<br /> for ARB_vertex_program was pretty drawn out and arduous, but in the end, it is
<br /> a just-plain-better API than either of the vendor specific ones that it
<br /> replaced. I fretted for a while over whether I should leave in support for
<br /> the older APIs for broader driver compatibility, but the final decision was
<br /> that we are going to require a modern driver for the game to run in the
<br /> advanced modes. Older drivers can still fall back to either the ARB or NV10
<br /> paths.
<br />
<br /> The newly-ratified ARB_vertex_buffer_object extension will probably let me do
<br /> the same thing for NV_vertex_array_range and ATI_vertex_array_object.
<br />
<br /> Reasonable arguments can be made for and against the OpenGL or DirectX style
<br /> of API evolution. With vendor extensions, you get immediate access to new
<br /> functionality, but then there is often a period of squabbling about exact
<br /> feature support from different vendors before an industry standard settles
<br /> down. With central planning, you can have "phasing problems" between
<br /> hardware and software releases, and there is a real danger of bad decisions
<br /> hampering the entire industry, but enforced commonality does make life easier
<br /> for developers. Trying to keep boneheaded-ideas-that-will-haunt-us-for-years
<br /> out of DirectX is the primary reason I have been attending the Windows
<br /> Graphics Summit for the past three years, even though I still code for OpenGL.
<br />
<br /> The most significant functionality in the new crop of cards is the truly
<br /> flexible fragment programming, as exposed with ARB_fragment_program. Moving
<br /> from the "switches and dials" style of discrete functional graphics
<br /> programming to generally flexible programming with indirection and high
<br /> precision is what is going to enable the next major step in graphics engines.
<br />
<br /> It is going to require fairly deep, non-backwards-compatible modifications to
<br /> an engine to take real advantage of the new features, but working with
<br /> ARB_fragment_program is really a lot of fun, so I have added a few little
<br /> tweaks to the current codebase on the ARB2 path:
<br />
<br /> High dynamic color ranges are supported internally, rather than with
<br /> post-blending. This gives a few more bits of color precision in the final
<br /> image, but it isn't something that you really notice.
<br />
<br /> Per-pixel environment mapping, rather than per-vertex. This fixes a pet peeve
<br /> of mine, which is large panes of environment mapped glass that aren't
<br /> tessellated enough, giving that awful warping-around-the-triangulation effect
<br /> as you move past them.
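<br />
<br /> The warping can be reproduced on paper. The sketch below assumes the
<br /> per-vertex approach computes a normalized reflection direction at each vertex
<br /> and interpolates it, while the per-pixel approach recomputes it at every
<br /> point. The pane, eye position, and helper names are made up for illustration.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lerp(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def reflect(incident, normal):
    """Mirror a unit incident direction about a unit surface normal."""
    d = dot(incident, normal)
    return tuple(i - 2.0 * d * n for i, n in zip(incident, normal))


def reflection_at(point, eye, normal):
    """Reflection direction for the view ray from eye to point."""
    incident = normalize(tuple(p - e for p, e in zip(point, eye)))
    return reflect(incident, normal)


# Hypothetical setup: one long edge of a flat glass pane in the z = 0 plane.
EYE = (-3.0, 0.0, 1.0)
NORMAL = (0.0, 0.0, 1.0)
A, B = (-4.0, 0.0, 0.0), (4.0, 0.0, 0.0)


def compare(t):
    """Return (per_vertex, per_pixel) reflection directions at t along A-B."""
    per_vertex = normalize(lerp(reflection_at(A, EYE, NORMAL),
                                reflection_at(B, EYE, NORMAL), t))
    per_pixel = reflection_at(lerp(A, B, t), EYE, NORMAL)
    return per_vertex, per_pixel


if __name__ == "__main__":
    pv, pp = compare(0.25)
    print("per-vertex:", pv)
    print("per-pixel: ", pp)
```

<br /> The two directions agree at the vertices and drift apart in between, which is
<br /> why a coarsely tessellated pane shows the effect most strongly.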
<br />
<br /> Light and view vectors normalized with math, rather than a cube map. On
<br /> future hardware this will likely be a performance improvement due to the
<br /> decrease in bandwidth, but current hardware has the computation and bandwidth
<br /> balanced such that it is pretty much a wash. What it does (in conjunction
<br /> with floating point math) give you is a perfectly smooth specular highlight,
<br /> instead of the pixelish blob that we get on older generations of cards.
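<br />
<br /> A small sketch of the difference, under simplifying assumptions: the "math"
<br /> version is a dot product, reciprocal square root, and multiply (DP3, RSQ, MUL
<br /> in ARB_fragment_program terms), while the cube map is modeled only by its
<br /> 8-bit storage per component. Face resolution and filtering are ignored, and
<br /> the function names are hypothetical.

```python
import math


def length(v):
    return math.sqrt(sum(c * c for c in v))


def normalize(v):
    """Normalize with arithmetic: v * (1 / sqrt(v . v))."""
    inv_len = 1.0 / math.sqrt(sum(c * c for c in v))
    return tuple(c * inv_len for c in v)


def cube_map_normalize(v):
    """Stand-in for a normalization cube map lookup with 8-bit texels.

    Only the 8-bit encoding of each component is modeled here.
    """
    exact = normalize(v)
    texel = [round((c * 0.5 + 0.5) * 255.0) for c in exact]  # encode 0..255
    return tuple(t / 255.0 * 2.0 - 1.0 for t in texel)       # decode -1..1


if __name__ == "__main__":
    v = (1.0, 2.0, 3.0)
    print("math length:    ", length(normalize(v)))
    print("cube map length:", length(cube_map_normalize(v)))
```

<br /> The quantized result is no longer exactly unit length, and a specular term
<br /> raised to a high power magnifies that kind of small error.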
<br />
<br /> There are some more things I am playing around with that will probably remain
<br /> in the engine as novelties, but not supported features:
<br />
<br /> Per-pixel reflection vector calculations for specular, instead of an
<br /> interpolated half-angle. The only remaining effect that has any visual
<br /> dependency on the underlying geometry is the shape of the specular highlight.
<br /> Ideally, you want the same final image for a surface regardless of whether it is
<br /> two giant triangles, or a mesh of 1024 triangles. This will not be true if
<br /> any calculation done at a vertex involves anything other than linear math
<br /> operations. The specular half-angle calculation involves normalizations, so
<br /> the interpolation across triangles on a surface will be dependent on exactly
<br /> where the vertexes are located. The most visible end result of this is that
<br /> on large, flat, shiny surfaces where you expect a clean highlight circle
<br /> moving across it, you wind up with a highlight that distorts into an L shape
<br /> around the triangulation line.
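<br />
<br /> The dependence on vertex placement is easy to check numerically. This sketch
<br /> compares N.H from a half-angle vector interpolated between two vertices with
<br /> N.H computed exactly at the same surface point. The light, eye, and vertex
<br /> positions are arbitrary values chosen for illustration.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def lerp(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))


# Hypothetical scene: a flat surface in the z = 0 plane.
LIGHT = (-1.0, 0.0, 1.0)
EYE = (1.0, 0.0, 1.0)
NORMAL = (0.0, 0.0, 1.0)
A, B = (-4.0, 0.0, 0.0), (4.0, 0.0, 0.0)


def half_angle(point):
    """Unit half-angle vector between the light and eye directions."""
    to_light = normalize(sub(LIGHT, point))
    to_eye = normalize(sub(EYE, point))
    return normalize(add(to_light, to_eye))


def n_dot_h(t):
    """Return (interpolated, exact) N.H at parameter t along the edge A-B."""
    interpolated = normalize(lerp(half_angle(A), half_angle(B), t))
    exact = half_angle(lerp(A, B, t))
    return dot(NORMAL, interpolated), dot(NORMAL, exact)


if __name__ == "__main__":
    print(n_dot_h(0.25))
```

<br /> Moving A or B changes the interpolated value at a fixed surface point but
<br /> not the exact one, which is the geometry dependence described above.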
<br />
<br /> The extra instructions to implement this did have a noticeable performance
<br /> hit, and I was a little surprised to see that the highlights not only
<br /> stabilized in shape, but also sharpened up quite a bit, changing the scene
<br /> more than I expected. This probably isn't a good tradeoff today for a gamer,
<br /> but it is nice for any kind of high-fidelity rendering.
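<br />
<br /> One plausible reason for the sharper highlights (my reading, not something
<br /> stated above): for the same exponent, a reflection-vector term (R.V)^n falls
<br /> off faster than a half-angle term (N.H)^n, because the angle between R and V
<br /> is roughly twice the angle between N and H. The exponent and vectors below
<br /> are arbitrary illustration values.

```python
import math


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def specular_terms(light_dir, view_dir, normal, exponent):
    """Return (half_angle_term, reflection_term) for one surface point.

    light_dir and view_dir point away from the surface; normal is unit length.
    """
    l = normalize(light_dir)
    v = normalize(view_dir)
    h = normalize(tuple(a + b for a, b in zip(l, v)))
    n_dot_l = dot(normal, l)
    r = tuple(2.0 * n_dot_l * n - c for n, c in zip(normal, l))  # mirror of l
    half_angle_term = max(dot(normal, h), 0.0) ** exponent
    reflection_term = max(dot(r, v), 0.0) ** exponent
    return half_angle_term, reflection_term


if __name__ == "__main__":
    normal = (0.0, 0.0, 1.0)
    # Viewer slightly off the mirror direction.
    print(specular_terms((-1.0, 0.0, 1.0), (1.0, 0.0, 2.0), normal, 32))
```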
<br />
<br /> Renormalization of surface normal map samples makes significant quality
<br /> improvements in magnified textures, turning tight, blurred corners into shiny,
<br /> smooth pockets, but it introduces a huge amount of aliasing on minified
<br /> textures. Blending between the cases is possible with fragment programs, but
<br /> the performance overhead does start piling up, and it may require stashing
<br /> some information in the normal map alpha channel that varies with mip level.
<br /> Doing good filtering of a specularly lit normal map texture is a fairly
<br /> interesting problem, with lots of subtle issues.
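<br />
<br /> A minimal sketch of what renormalization changes: filtering between two unit
<br /> normals yields a shorter vector, and renormalizing stretches it back to unit
<br /> length. The sample normals are invented for illustration. One common way to
<br /> read the aliasing problem is that under minification the shortened vector
<br /> was quietly damping the highlight, and renormalizing removes that damping.

```python
import math


def length(v):
    return math.sqrt(sum(c * c for c in v))


def normalize(v):
    inv_len = 1.0 / length(v)
    return tuple(c * inv_len for c in v)


def lerp(a, b, t):
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def filtered_normal(n0, n1, weight):
    """Linear filter between two normal map samples, as a texture unit would."""
    return lerp(n0, n1, weight)


if __name__ == "__main__":
    # Two hypothetical neighboring texels on either side of a sharp crease.
    n0 = normalize((1.0, 0.0, 1.0))
    n1 = normalize((-1.0, 0.0, 1.0))
    blended = filtered_normal(n0, n1, 0.5)
    print("filtered length:    ", length(blended))
    print("renormalized length:", length(normalize(blended)))
```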
<br />
<br /> Bump mapped ambient lighting will give much better looking outdoor and
<br /> well-lit scenes. This only became possible with dependent texture reads, and
<br /> it requires new designer and tool-chain support to implement well, so it isn't
<br /> easy to test globally with the current Doom datasets, but isolated demos are
<br /> promising.
<br />
<br /> The future is in floating point framebuffers. One of the most noticeable
<br /> things this will get you without fundamental algorithm changes is the ability
<br /> to use a correct display gamma ramp without destroying the dark color
<br /> precision. Unfortunately, using a floating point framebuffer on the current
<br /> generation of cards is pretty difficult, because no blending operations are
<br /> supported, and the primary thing we need to do is add light contributions
<br /> together in the framebuffer. The workaround is to copy the part of the
<br /> framebuffer you are going to reference to a texture, and have your fragment
<br /> program explicitly add that texture, instead of having the separate blend unit
<br /> do it. This is intrusive enough that I probably won't hack up the current
<br /> codebase, instead playing around on a forked version.
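<br />
<br /> The workaround and the dark-precision point can both be shown with a toy
<br /> framebuffer. In this sketch each light pass first copies the framebuffer to
<br /> a "texture" and then writes texture plus light, so no blend unit is needed.
<br /> The store argument models the framebuffer format. The gamma value of 2.2 and
<br /> the light values are assumptions for illustration, not figures from the text.

```python
GAMMA = 2.2  # assumed display gamma; the text does not give a value


def quantize8(x):
    """Clamp to 0..1 and round to the nearest 8-bit level."""
    return round(min(max(x, 0.0), 1.0) * 255.0) / 255.0


def keep_float(x):
    """Floating point framebuffer: store the value unchanged."""
    return x


def add_light_pass(framebuffer, contribution, store):
    """Add one light without a blend unit.

    The framebuffer is copied to a texture first; the 'fragment program'
    then adds its light contribution to the copied value and writes the sum.
    """
    texture = list(framebuffer)  # copy the referenced region to a texture
    return [store(prev + light) for prev, light in zip(texture, contribution)]


def render(lights, store):
    """Accumulate all lights, then apply the gamma ramp for display (0..255)."""
    framebuffer = [0.0] * len(lights[0])
    for contribution in lights:
        framebuffer = add_light_pass(framebuffer, contribution, store)
    return [round(min(v, 1.0) ** (1.0 / GAMMA) * 255.0) for v in framebuffer]


if __name__ == "__main__":
    # Three identical lights; the first pixel is very dimly lit.
    lights = [[0.001, 0.2]] * 3
    print("float framebuffer:", render(lights, keep_float))
    print("8-bit framebuffer:", render(lights, quantize8))
```

<br /> With the 8-bit store, each dim contribution rounds away before the next pass
<br /> can add to it, so the dark pixel never leaves black; the float store keeps
<br /> the sum and the gamma ramp lifts it to a visible level.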
<br />
<br /> Floating point framebuffers and complex fragment shaders will also allow much
<br /> better volumetric effects, like volumetric illumination of fogged areas with
<br /> shadows and additive/subtractive eddy currents.
<br /> John Carmack |