Hardware-Assisted Occlusion Tests
Some modern video cards (GeForce3 and later, and most recent ATI boards) support specific calls that make occlusion detection easier by computing it in hardware. Although these calls will never completely replace our own occlusion code, they are a nice addition because they greatly reduce code complexity. Let's examine how hardware-assisted occlusion detection works, and then reformulate some of our algorithms to take advantage of it.
Generally speaking, all video cards perform early Z-rejection these days. The Z-buffer is tested as early as possible, so subsequent stages (texturing, color interpolation, and so on) can be skipped for those fragments where the Z test fails. However, we still need to send the data to the video card, thus spending time on the bus, transforming vertices, projecting, and so on. For hardware occlusion detection to really make a difference, we need calls that work at the object level and can reject many triangles at once, thus saving lots of effort. This is exactly what modern cards provide: tests that can be used on full objects.
The idea is quite simple. First, we activate the occlusion query mode. Second, we send whatever geometry we want to test for occlusion. This geometry will not be painted, but only checked against the Z-buffer using speedy routines. Third, the call tells us whether the primitive passed the Z test for any pixels. It even tells us how many pixels passed, so we get a measure of relevance for the object. Now, we could use this approach for each triangle of an object, but then we wouldn't benefit much. In fact, performance would degrade because more vertices would be sent.
The real benefit from occlusion queries comes from using them with bounding objects, such as boxes or spheres. A bounding box is just 12 triangles, and by sending them and testing for occlusions, we can effectively avoid sending thousands. Complete objects can be rejected this way. So, here is the full algorithm:
for each object
    activate occlusion query
    send bounding box
    deactivate occlusion query
    if pixels were modified by the test
        render object
    end if
end for
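The loop above can be tried out without any graphics hardware. The following is a minimal sketch that simulates it on the CPU, under several assumptions that are not part of the original text: `DepthBuffer`, `Rect`, `rasterize`, and `drawScene` are hypothetical names, each object is flattened to a screen-space rectangle at a single depth, and that same rectangle stands in for both the bounding box and the object. The "query" is simply a rasterization pass with depth writes turned off that counts the pixels passing the Z test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU stand-in for the hardware Z-buffer (smaller z = closer).
struct DepthBuffer {
    int width, height;
    std::vector<float> z;
    DepthBuffer(int w, int h)
        : width(w), height(h), z(static_cast<std::size_t>(w) * h, 1.0f) {
        assert(w > 0 && h > 0);
    }
};

// Screen-space footprint of an object, flattened to one depth value.
// x1 and y1 are exclusive.
struct Rect { int x0, y0, x1, y1; float depth; };

// Counts the pixels of r that pass the depth test. With writeDepth == false
// this plays the role of the occlusion query; with true it "renders".
int rasterize(DepthBuffer& db, const Rect& r, bool writeDepth) {
    int passed = 0;
    for (int y = std::max(r.y0, 0); y < std::min(r.y1, db.height); ++y) {
        for (int x = std::max(r.x0, 0); x < std::min(r.x1, db.width); ++x) {
            float& stored = db.z[static_cast<std::size_t>(y) * db.width + x];
            if (r.depth < stored) {
                ++passed;
                if (writeDepth) stored = r.depth;
            }
        }
    }
    return passed;
}

// Per-object loop from the pseudocode; returns how many objects were rendered.
int drawScene(DepthBuffer& db, const std::vector<Rect>& objects) {
    int rendered = 0;
    for (const Rect& box : objects) {
        int visiblePixels = rasterize(db, box, false);  // occlusion query
        if (visiblePixels > 0) {
            rasterize(db, box, true);                   // render object
            ++rendered;
        }
    }
    return rendered;
}
```

With a full-screen wall submitted first, a rectangle behind it yields zero visible pixels and is skipped, whereas a rectangle in front of it is rendered. On real hardware the query runs asynchronously, so this synchronous sketch only illustrates the control flow, not the timing.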
We can further improve this approach by sending data front-to-back, thus maximizing occlusions. Closer objects will be painted earlier, and many of the primitives behind them will be fully covered. Additionally, we can implement our occlusion tests in a hierarchical manner. If the bounding box of a set of objects is rejected, all bounding boxes from descendant nodes (such as in a quadtree) will be rejected as well, and thus we can prune the tree data structure. This last approach can be elegantly integrated with a clipping pass, resulting in a global visibility algorithm that performs all tests efficiently. Here is the pseudocode for such an algorithm:
Paint (node *n)
    Sort the four subnodes using their distance to the viewer
    for each subnode (in distance order)
        if subnode is not empty
            if subnode is not clipped
                activate occlusion query
                paint bounding box
                deactivate occlusion query
                if pixels were modified
                    paint object
                end if
            end if
        end if
    end for
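As an illustration of the traversal above, here is a minimal C++ sketch. It rests on assumptions that go beyond the pseudocode: the `Node` layout and the `BoxQuery` callback are hypothetical, empty subnodes are simply not stored, the clipping result is assumed to be precomputed, interior nodes recurse into their children, and only leaves hold geometry. The occlusion query itself is abstracted behind the callback, which returns the number of pixels of a node's bounding box that pass the Z test.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical quadtree node; only non-empty subnodes are stored.
struct Node {
    int id;                      // identifies the geometry held by a leaf
    float distance;              // distance from the viewer to the node's box
    bool clipped;                // outside the view frustum (assumed precomputed)
    std::vector<Node> children;  // up to four subnodes
};

// Stand-in for the hardware query on a node's bounding box.
using BoxQuery = std::function<int(const Node&)>;

void paint(const Node& n, const BoxQuery& query, std::vector<int>& drawn) {
    if (n.children.empty()) {    // leaf: paint the object itself
        drawn.push_back(n.id);
        return;
    }
    // Sort the subnodes front-to-back so near geometry fills the Z-buffer first.
    std::vector<const Node*> order;
    for (const Node& c : n.children) order.push_back(&c);
    std::sort(order.begin(), order.end(),
              [](const Node* a, const Node* b) { return a->distance < b->distance; });

    for (const Node* c : order) {
        if (c->clipped) continue;        // clipping pass
        if (query(*c) == 0) continue;    // box fully occluded: prune the subtree
        paint(*c, query, drawn);
    }
}
```

Note that a rejected node costs a single query no matter how many descendants it has, which is where the hierarchical version saves work compared to testing every object.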
When using occlusion queries, not all bounding volumes will offer the same performance. Spheres should be avoided because painting the bounding volume while performing the occlusion query pass will be costly. There are a lot of faces and vertices in a sphere. Boxes, on the other hand, offer tighter packing and require much less rendering effort.
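To put rough numbers on the difference, the sketch below compares triangle counts. It assumes the sphere is tessellated as a standard latitude/longitude (UV) mesh, which the text does not specify; `boxTriangles` and `uvSphereTriangles` are hypothetical helper names.

```cpp
#include <cassert>

// A box has 6 quad faces, each split into 2 triangles.
inline int boxTriangles() { return 6 * 2; }

// UV sphere with `slices` divisions around the axis and `stacks` from pole to
// pole: a fan of `slices` triangles at each pole, plus (stacks - 2) bands of
// 2 * slices triangles in between, which totals 2 * slices * (stacks - 1).
inline int uvSphereTriangles(int slices, int stacks) {
    assert(slices >= 3 && stacks >= 2);
    return 2 * slices * (stacks - 1);
}
```

Under these assumptions, even a modest 16 x 16 sphere needs 480 triangles, 40 times as many as the 12 triangles of a box.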
For the sake of completeness, let's look at two working examples of hardware occlusion queries that could be implemented in OpenGL using a proprietary NVIDIA extension and in DirectX 9. Here is an implementation for a GeForce board using OpenGL:
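GLuint queries[N];
GLuint pixelCount;
glGenOcclusionQueriesNV(N, queries);
// Issue one query per object; no results are read back yet
for (int i = 0; i < N; i++) {
    glBeginOcclusionQueryNV(queries[i]);
    // render bounding box for ith geometry
    glEndOcclusionQueryNV();
}
// Read the results and draw only the objects whose boxes were visible
for (int i = 0; i < N; i++) {
    glGetOcclusionQueryuivNV(queries[i], GL_PIXEL_COUNT_NV, &pixelCount);
    if (pixelCount > MAX_COUNT) {
        // render ith geometry
    }
}
glDeleteOcclusionQueriesNV(N, queries);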
The same functionality comes built-in directly in DirectX 9. To access occlusion queries, we must create an IDirect3DQuery9 object with the call
CreateQuery(D3DQUERYTYPE Type,IDirect3DQuery9** ppQuery);
Here is a complete example:
IDirect3DQuery9 *myQuery;
g_pd3dDevice->CreateQuery(D3DQUERYTYPE_OCCLUSION, &myQuery);
myQuery->Issue(D3DISSUE_BEGIN);
// paint the object to test occlusions for
myQuery->Issue(D3DISSUE_END);
DWORD pixels = 0;
// Spin until the asynchronous result becomes available
while (myQuery->GetData((void *)&pixels, sizeof(DWORD), D3DGETDATA_FLUSH) == S_FALSE)
    ;
if (pixels > MAX_COUNT) {
    // render the object
}
myQuery->Release();
The philosophy behind the code is very similar to the OpenGL version. We begin an occlusion query, render the object we want to test visibility for, and end the query. Notice that occlusion queries are asynchronous (as in OpenGL, by the way). This means that GetData might be executed before the occlusion test has produced any results, hence the while loop. In the end, the GetData call returns the number of pixels that passed the test, so we can use that information to decide whether to paint the geometry.
Now, some advice on hardware-assisted occlusion queries to ensure that you get good performance. Although the technique looks very powerful on paper, only careful planning will allow you to get a significant performance boost. Remember that rasterizing the occlusion query objects takes up some fill rate, so use them wisely. Make sure culling is on, and turn off textures, lighting, and so on. You need your occlusion queries to be as fast as possible. You don't want to save rasterization work on the real geometry only to waste that fill rate on the query itself. This issue is less important with hierarchical occlusion queries. Build a bounding box hierarchy, and prune it with occlusion queries so large parts of the scene are culled away. Eliminating several objects with one test will definitely pay for the effort of rendering the occlusion query object.
Another interesting idea is to use occlusion queries with geometry you would actually render anyway. A good example here is multipass rendering. If we need to render the same object two or more times, we can activate the occlusion query for the first pass, and render subsequent passes only if the query reports that the object is not occluded. This way we are not throwing away our precious fill rate because we would be rendering the object anyway. Another good idea is to replace objects with approximations that have a lower triangle count. Convex hulls can be a useful primitive in this situation.
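The multipass idea can be sketched as follows. This is only an outline under stated assumptions: `renderMultipass` and the `drawPass` callback are hypothetical, and the callback is assumed to draw the given pass and return the number of pixels that passed the Z test. On real hardware, only the first pass would be wrapped in a query, and its result would arrive asynchronously.

```cpp
#include <cassert>
#include <functional>

// Draws pass 0 as the occlusion query and issues the remaining passes only
// if at least one pixel was visible. Returns the number of passes drawn.
int renderMultipass(int passCount, const std::function<unsigned(int)>& drawPass) {
    assert(passCount >= 1);
    unsigned visiblePixels = drawPass(0);   // first pass doubles as the query
    if (visiblePixels == 0) {
        return 1;                           // fully occluded: skip later passes
    }
    for (int pass = 1; pass < passCount; ++pass) {
        drawPass(pass);
    }
    return passCount;
}
```

Because the first pass is geometry we had to draw anyway, the query adds no extra fill-rate cost; the saving comes entirely from the passes that are skipped.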