I took some time today to dig through the documentation on IDirect3DQuery9, and now have the naive version of occlusion queries successfully implemented in DirectQ. As I suspected, I had been doing things the wrong way, and the information on the right way was buried a little (although fortunately not as deeply as I had originally feared).
With the naive version we're basically issuing the query and testing for its result in the same frame (immediately after issuing it, in fact, so there is no delay to allow the query to flush). This requires completely flushing the command buffer and stalling the GPU while fetching the result, which lops a clean 100 FPS off ID1 timedemos and, in complex scenes, effectively wipes out the speed increases I had gained since the alpha release. Useless, in other words.
The next step is to implement the non-naive version. IDirect3DQuery9 is essentially set up as a mini-state machine, but it does not provide a means of testing its state outside of directly querying the query (!), so I'm going to wrap it in either a struct or a class that will more or less manage that aspect of it for me.
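To make the wrapper idea concrete, here is a minimal sketch of what such a struct might look like. It is not DirectQ's code: the names (OcclusionQuery, QueryState, issue, poll) are made up for illustration, and the two callbacks stand in for the real IDirect3DQuery9 calls so the sketch compiles without d3d9.h. The assumption is that poll would wrap GetData called without D3DGETDATA_FLUSH, which reports "not ready yet" instead of stalling.

```cpp
#include <cassert>
#include <functional>

// Hypothetical wrapper that tracks the state IDirect3DQuery9 does not expose.
enum class QueryState { Idle, Pending };

struct OcclusionQuery {
    // Stand-ins for the real calls (assumptions, not DirectQ code):
    //   issue: Issue(D3DISSUE_BEGIN), draw the bounding box, Issue(D3DISSUE_END)
    //   poll:  GetData() without D3DGETDATA_FLUSH; returns true once the
    //          pixel count is available, false while it is still in flight
    std::function<void()> issue;
    std::function<bool(unsigned long &)> poll;

    QueryState state = QueryState::Idle;
    bool lastVisible = true; // treat as visible until a result says otherwise

    // Call once per frame; returns whether the entity should be drawn.
    bool update() {
        assert(issue && poll);
        if (state == QueryState::Pending) {
            unsigned long pixels = 0;
            if (poll(pixels)) { // never blocks waiting for the GPU
                lastVisible = pixels > 0;
                state = QueryState::Idle;
            }
        }
        if (state == QueryState::Idle) {
            issue();
            state = QueryState::Pending;
        }
        return lastVisible;
    }
};
```

The point of the design is that update() only ever reuses the last known answer while a query is in flight, so the render loop never has to wait on the GPU.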
If this doesn't bear fruit, I have found an article on software occlusion which treats each occluding object as a frustum and then uses frustum culling to manage the occlusion. Obviously a far more complex setup (not least because occluding objects can have more than 4 sides); at the very least I would need to spin it off in its own thread and run it concurrently with the main render.
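For a rough idea of what the frustum approach involves, the sketch below builds a set of planes from the eye position and a convex occluder polygon with any number of sides, then tests whether a bounding sphere lies entirely behind it. This is my own reading of the general technique, not the article's code; every name here (Vec3, Plane, buildOccluderFrustum, sphereOccluded) is hypothetical, and it assumes a convex, planar occluder with at least three vertices.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

struct Plane {
    Vec3 n;  // unit normal
    float d; // plane equation: dot(n, p) == d
    float dist(Vec3 p) const { return dot(n, p) - d; }
};

// Plane through a, b, c, flipped if needed so that 'inside' is on the positive side.
static Plane makePlane(Vec3 a, Vec3 b, Vec3 c, Vec3 inside) {
    Vec3 n = cross(sub(b, a), sub(c, a));
    float len = std::sqrt(dot(n, n));
    assert(len > 0.0f); // degenerate (collinear) points are not handled
    n = {n.x / len, n.y / len, n.z / len};
    Plane p{n, dot(n, a)};
    if (p.dist(inside) < 0.0f) {
        p.n = {-n.x, -n.y, -n.z};
        p.d = -p.d;
    }
    return p;
}

// One plane for the occluder itself plus one per edge, all facing into the
// region hidden behind the occluder as seen from 'eye'.
std::vector<Plane> buildOccluderFrustum(Vec3 eye, const std::vector<Vec3> &poly) {
    assert(poly.size() >= 3);
    Vec3 c{0.0f, 0.0f, 0.0f};
    for (Vec3 v : poly) { c.x += v.x; c.y += v.y; c.z += v.z; }
    float inv = 1.0f / static_cast<float>(poly.size());
    c = {c.x * inv, c.y * inv, c.z * inv};
    // A point on the eye-to-centroid ray, beyond the occluder: always hidden.
    Vec3 inside{2.0f * c.x - eye.x, 2.0f * c.y - eye.y, 2.0f * c.z - eye.z};

    std::vector<Plane> planes;
    planes.push_back(makePlane(poly[0], poly[1], poly[2], inside));
    for (std::size_t i = 0; i < poly.size(); ++i) {
        planes.push_back(makePlane(eye, poly[i], poly[(i + 1) % poly.size()], inside));
    }
    return planes;
}

// True only if the whole sphere is inside every plane, i.e. fully hidden.
bool sphereOccluded(const std::vector<Plane> &planes, Vec3 center, float radius) {
    for (const Plane &p : planes) {
        if (p.dist(center) < radius) return false;
    }
    return true;
}
```

Note that this is the reverse of ordinary view-frustum culling: an object inside the occluder's frustum is the one that gets skipped.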
If you've turned on r_speeds 1 with the alpha release you may have noticed that the counts are quite a lot lower than with GLQuake. What I did here was increment each count once per DrawPrimitive call rather than once per polygon rendered. With hindsight this was the wrong thing to do, and DirectQ r_speeds should be consistent with GLQuake (if for no other reason than to show the performance differences for any given set of values). I'm going to restore this to the way it should be for the beta.
I'll just make the excuse that I was interested in seeing how many Draw* API calls I had managed to save. ;)
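The difference between the two ways of counting is easy to show. The sketch below is purely illustrative (RSpeeds and onDraw are made-up names, and it treats the primitive count passed to a draw call as a stand-in for polygons); it just keeps both numbers side by side so neither has to be given up.

```cpp
#include <cassert>

// Hypothetical counters: one per draw call, one per polygon submitted.
struct RSpeeds {
    int drawCalls = 0; // what the alpha release reported
    int polys = 0;     // what GLQuake-style r_speeds reports

    // Call alongside each DrawPrimitive-style call.
    void onDraw(int primitiveCount) {
        assert(primitiveCount >= 0);
        ++drawCalls;
        polys += primitiveCount;
    }
};
```

With batching, a single call covering 200 triangles adds 1 to drawCalls and 200 to polys, which is why the alpha's numbers looked so low.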
Aaaaahhhhh, success. Sweet sweet success.
Here's the deal:
- I currently only have occlusion queries on static entities. It would be slightly less trivial to put them on permanent entities, but I'm going to go for it anyway - it's not a deal-breaker if I fail though (they don't go on the viewmodel at all - for obvious reasons).
- It's probably not worth the bother putting them on temp entities. Most of these will be knight, vore or wizard spikes, or nails or bolts, and will therefore pass the tests anyway.
- Trivial entities don't get occlusion queries run against them at all. Running queries requires some state-changes (which I can batch up, but they're still there) plus drawing a bounding box for each entity tested. The bounding box requires 12 triangles, so I've set the "trivial mark" at 24 triangles in the model just to be sure.
- By their nature the results will lag a few frames behind what you see on screen. In practice you don't notice at all; there are no models suddenly popping into view or anything like that. The alternative is worse (the naive implementation).
- I don't run occlusion queries at all during timedemos, so don't go looking for timedemo speedups here because you won't get them. There are two reasons: firstly, they actually slow down timedemos; secondly, a timedemo runs so fast that by the time the results come in the scene will have changed so much that they are well out of date.
- I currently don't have a cvar to disable them, but I'm inclining towards creating one.
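The rules in the list above boil down to a small filter that decides whether an entity gets a query at all. The following is a sketch of that filter only, under a few assumptions of mine: the names (EntityKind, QueryPolicy, wantsOcclusionQuery) are hypothetical, the enabled flag stands in for the cvar that doesn't exist yet, and I've read the "trivial mark" as meaning models with 24 triangles or fewer are skipped.

```cpp
#include <cassert>

enum class EntityKind { Static, Permanent, Temp, ViewModel };

// Hypothetical settings; 'enabled' plays the role of the not-yet-written cvar.
struct QueryPolicy {
    bool enabled = true;
    bool includePermanent = false; // planned in the post, not done yet
    int trivialTriangles = 24;     // bounding box costs 12, so 24 to be safe
};

bool wantsOcclusionQuery(EntityKind kind, int modelTriangles, bool timedemo,
                         const QueryPolicy &policy) {
    assert(modelTriangles >= 0);
    if (!policy.enabled || timedemo) return false; // queries slow timedemos down
    if (modelTriangles <= policy.trivialTriangles) return false; // assumed: <= is "trivial"
    switch (kind) {
    case EntityKind::Static:    return true;
    case EntityKind::Permanent: return policy.includePermanent;
    case EntityKind::Temp:      return false; // spikes, nails and bolts are mostly trivial anyway
    case EntityKind::ViewModel: return false; // always visible
    }
    return false;
}
```

Entities that fail the filter are simply drawn as normal; only those that pass pay for the state changes and the bounding box draw.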
I now lock at 72 in those ne_tower scenes.