I've been benchmarking various implementations of glReadPixels, to get a feel for what kind of performance hit I'm going to take by using it to read back the depth buffer. It's not that bad at all. Here are some findings and observations:
- glReadPixels performance is comparable to using occlusion queries. The performance hit is almost identical. As I'm going to be coding this path anyway, I now have no reason to code a second path that uses occlusion queries. I'll be going glReadPixels all the way.
- There is no difference between placing the glReadPixels call before the main render or after it. I'm not calling glFinish anywhere, so I deduce that glReadPixels is forcing a full pipeline flush in both cases.
- Placing a few glFlush calls throughout the main render (e.g. after each texture chain and alias model that is rendered) can dramatically reduce the performance impact, as there is less of a pipeline stall when the time comes to do the glReadPixels. This is the single most effective thing that helps performance - without glFlush I lose 23% FPS, with it I only lose 5%. It's well worth investigating this more to find the optimal number of calls (and the best places to put them).
- I'm using Jay Dolan's recursion avoidance technique, so I'm only doing a glReadPixels on each frame that I do a full recursion on; otherwise I assume that the previous frame's depth buffer is good to work with. This doesn't help performance as much as the glFlush technique (about 1% to 2% gain).
- There is no difference between reading a 10 x 10 chunk and a 64 x 64 chunk; the performance impact comes more from the pipeline flush than the size of the buffer that is read back.
- I only do a glReadPixels every other frame, the rationale being that there's not going to be much difference between the depth buffers for two consecutive frames (at least for the purposes of this exercise). This virtually eliminates the performance hit.
- Benchmarks were all done using a timedemo, so the benefit measured here from recursion avoidance and glFlush will be somewhat less than in real gameplay.
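To make the scheduling side of the above concrete, here's a rough sketch in C of how the flush and readback decisions could be tracked. It isn't the engine code: readback_state_t, batch_done, should_read_depth, depth_read_done and FLUSH_INTERVAL are all made-up names, the interval of 8 is a placeholder rather than a measured optimum, and combining "every other frame" with "only on full-recursion frames" in one test is my assumption about how the two rules would fit together. It deliberately contains no GL calls, so it only decides when to call them.

```c
#include <stdbool.h>

/* Assumed value: flush after every 8 batches. The right number has to
   come from benchmarking. */
#define FLUSH_INTERVAL 8u

/* Hypothetical bookkeeping for the flush/readback schedule. */
typedef struct {
    bool     have_depth;          /* true once one readback has been stored */
    unsigned batches_since_flush; /* batches drawn since the last flush */
} readback_state_t;

/* Call after each texture chain or alias model is drawn.
   Returns true when the caller should issue glFlush() now. */
bool batch_done(readback_state_t *s)
{
    if (++s->batches_since_flush >= FLUSH_INTERVAL) {
        s->batches_since_flush = 0;
        return true;
    }
    return false;
}

/* Decide whether this frame should call glReadPixels. With nothing
   cached yet we must read; after that, read only on even frames that
   also did a full recursion, and reuse the previous depth data otherwise. */
bool should_read_depth(const readback_state_t *s, unsigned frame,
                       bool full_recursion)
{
    if (!s->have_depth)
        return true;
    return full_recursion && (frame % 2u == 0u);
}

/* Call once the readback has actually been stored. */
void depth_read_done(readback_state_t *s)
{
    s->have_depth = true;
}
```

In the renderer, the idea would be to call glFlush() whenever batch_done returns true, and to do something like glReadPixels(x, y, 64, 64, GL_DEPTH_COMPONENT, GL_FLOAT, buffer) followed by depth_read_done whenever should_read_depth returns true.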
The next part will involve actually putting something into the depth buffer that is captured!