NVIDIA GeForce3 Investigation: What NVIDIA didn't tell us
by Kert Chian on May 15, 2001 1:18 AM EST
Posted in GPUs
The Performance Impact of Vertex Shaders
In the 'Dronez rolling demo', enabling vertex shaders doubles or triples GeForce3's triangle load (Figure 8). With vertex shaders enabled, the triangle load averages 15300 triangles per frame; with them disabled, it averages 7800. It must be emphasized that the degree of tessellation and the quality of animation are similar in either mode. The more likely explanation for the divergent triangle counts is that, in vertex shader mode, streams of vertex data are retrieved in multiple passes. Because the graphics processor has limited space in which to store vertex attributes, it is conceivable that data is retrieved in separate passes for keyframe interpolation (two or more positions), vertex blending (matrices) and lighting (texture space coordinates). This does not necessarily imply that absolute geometry bandwidth increases by a factor of two to three, provided that only components of the vertex data, rather than its entire contents, are retrieved during each pass.
Figure 8: Triangle counts over 9000 plus frames of the 'Dronez rolling demo'.
Bump mapping enabled.
The rate at which vertices are modified depends heavily on the instruction length of the vertex shader, in contrast to the invariant rate of hardwired transformation. GeForce3 executes one instruction per cycle. To put this in perspective, consider that a simple transform with a six-instruction vertex shader processes 3.3 million vertices in one second. On the other hand, GeForce3's fixed transformation pipe is capable of approximately 16.6 million vertices per second (as measured in 3DMark2001). This dependency on instruction length is reflected in the fillrate graphs in Figure 9. A vertex shader that only controls keyframe interpolation and vertex blending is faster than one that adds per-pixel lighting after these animation routines. Having said this, Figure 9 indicates that GeForce3 actually does an excellent job of relieving the central processing unit of vertex operations. At high resolution, the bottleneck shifts to memory bandwidth.
Figure 9: Dronez rolling demo. GeForce3 - vertex shader enabled. GeForce2 - vertex shader disabled.
Resolution (16-bit color) | 640x480 | 800x600 | 1024x768 | 1280x1024 | 1600x1200 |
GeForce2, bump mapping | 66.05 | 64.36 | 63.92 | 61.14 | 56.15 |
GeForce2 | 94.5 | 93.35 | 92.22 | 87.96 | 79.76 |
GeForce3, bump mapping | 112.45 | 104.18 | 98.46 | 82.67 | 69.56 |
GeForce3 | 144.46 | 141.6 | 141.17 | 124.29 | 104.96 |
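To make the resolution dependence in the Figure 9 results easier to see, the following minimal sketch computes two simple ratios from the published numbers: how far GeForce3 leads GeForce2 at each resolution, and how much of its 640x480 score each configuration retains at 1600x1200. The names `RESULTS`, `ratio_by_resolution` and `retained_at_highest` are illustrative and not part of the original article, and the scores are treated as unitless because the table does not state a unit.

```python
# Scores copied from the Figure 9 table (16-bit color), ordered by resolution.
RESOLUTIONS = ["640x480", "800x600", "1024x768", "1280x1024", "1600x1200"]
RESULTS = {
    "GeForce2, bump mapping": [66.05, 64.36, 63.92, 61.14, 56.15],
    "GeForce2": [94.5, 93.35, 92.22, 87.96, 79.76],
    "GeForce3, bump mapping": [112.45, 104.18, 98.46, 82.67, 69.56],
    "GeForce3": [144.46, 141.6, 141.17, 124.29, 104.96],
}


def ratio_by_resolution(numerator, denominator):
    """Return numerator/denominator score ratios keyed by resolution."""
    return {
        res: a / b
        for res, a, b in zip(RESOLUTIONS, RESULTS[numerator], RESULTS[denominator])
    }


def retained_at_highest(config):
    """Fraction of the 640x480 score that remains at 1600x1200."""
    scores = RESULTS[config]
    return scores[-1] / scores[0]


if __name__ == "__main__":
    lead = ratio_by_resolution("GeForce3", "GeForce2")
    for res in RESOLUTIONS:
        print(f"{res}: GeForce3 / GeForce2 = {lead[res]:.2f}")
    for config in RESULTS:
        print(f"{config}: retains {retained_at_highest(config):.0%} at 1600x1200")
```

On these numbers, GeForce3's lead over GeForce2 narrows as resolution rises, and the bump-mapped GeForce3 run loses the largest share of its low-resolution score, which is consistent with the bottleneck moving away from vertex processing at high resolution.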