Monday, July 2, 2012

If Plan B Fails...

So Plan B for CS-based lightmap updates failed. Plan A was to just update the lightmap texture directly in-place, which ultimately (once I had fixed up the skipped texels/etc) ran considerably slower than CPU-based lightmaps. I had reasoned that the in-place update was a contributing factor to this: having to unbind the live texture array from the PS pipeline, bind it to the CS pipeline, have two separate views on it (shader resource and unordered access), and then reverse the binding when done was not working well. So for Plan B I decided to update a staging texture (not D3D11_USAGE_STAGING, but just a second copy of the lightmap array) and then copy the changed portions across to the live texture.

So with Plan B I got as far as doing the copy from staging to live (which - because they're both D3D11_USAGE_DEFAULT - happens on the GPU) and it benchmarked as no faster than my old CPU-based method.  Obviously adding the actual update code would pull things down even more, so I decided not to proceed any further.

The final solution ended up being a modification of my old CPU-based method. With this I exploit the D3D11_MAP_FLAG_DO_NOT_WAIT option, which tells D3D not to bother giving me a mapping if the GPU is currently busy with the resource, but to just return immediately instead. If that happens I don't update the lightmap, but leave the surface properties dirty so that it will try again next frame.
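A rough sketch of that skip-and-retry logic, for illustration only. This is not DirectQ's actual code: `Surface`, `TryMapFn` and `UpdateLightmap` are hypothetical names, and the mapping call is replaced by a callback so the example stands alone. In a real D3D11 build the callback would wrap ID3D11DeviceContext::Map with D3D11_MAP_FLAG_DO_NOT_WAIT and treat a DXGI_ERROR_WAS_STILL_DRAWING result as "busy".

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a lightmapped surface (not DirectQ's real type).
struct Surface {
    bool dirty = false;                // lighting changed, texels need re-uploading
    std::vector<std::uint8_t> texels;  // CPU-side lightmap data for this surface
};

// Hypothetical non-blocking map: returns a writable pointer into the texture,
// or nullptr when the GPU is still using it (assumed to mirror a Map call
// made with D3D11_MAP_FLAG_DO_NOT_WAIT that could not complete).
using TryMapFn = std::function<std::uint8_t*()>;

// Attempts to upload one surface's lightmap. Returns true if it was uploaded.
// When the texture is busy the dirty flag is left set, so the same surface
// is simply tried again on the next frame.
bool UpdateLightmap(Surface& surf, const TryMapFn& tryMap) {
    if (!surf.dirty) return false;     // nothing to do

    std::uint8_t* dst = tryMap();
    if (dst == nullptr) return false;  // GPU busy: skip this frame, stay dirty

    std::copy(surf.texels.begin(), surf.texels.end(), dst);
    // A real implementation would unmap the resource here.
    surf.dirty = false;
    return true;
}
```

The point of the design is that a failed mapping costs almost nothing: the frame carries on with slightly stale lighting on that surface instead of stalling the pipeline.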

Aside from some minor glitches with BSP models (which there were always going to be, and which I'll probably resolve by forcing the mapping if it fails two consecutive times) it works, and quite well too. Resource contention and pipeline stalls are almost completely eliminated from the engine, and everything runs smooth and fast.
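The "force it after two failures" idea could look something like the following. Again this is only a sketch under assumptions, not something the engine is confirmed to do: `RetrySurface`, `MapFn`, `kMaxSkippedFrames` and `UpdateLightmapWithFallback` are made-up names, and the callback's `wait` argument stands in for mapping with or without the do-not-wait flag.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical surface record; failedMaps counts consecutive skipped updates.
struct RetrySurface {
    bool dirty = false;
    int failedMaps = 0;
    std::vector<std::uint8_t> texels;
};

// Hypothetical mapping callback. With wait == false it may return nullptr
// (GPU busy). With wait == true it is assumed to block until a valid
// pointer is available, like a mapping made without the do-not-wait flag.
using MapFn = std::function<std::uint8_t*(bool wait)>;

// Number of consecutive failures tolerated before forcing a blocking map.
constexpr int kMaxSkippedFrames = 2;

bool UpdateLightmapWithFallback(RetrySurface& surf, const MapFn& map) {
    if (!surf.dirty) return false;

    // After two skipped frames, accept the stall rather than stay stale.
    const bool force = surf.failedMaps >= kMaxSkippedFrames;

    std::uint8_t* dst = map(force);
    if (dst == nullptr) {
        ++surf.failedMaps;             // still busy: remember and retry next frame
        return false;
    }

    std::copy(surf.texels.begin(), surf.texels.end(), dst);
    surf.dirty = false;
    surf.failedMaps = 0;               // success resets the failure streak
    return true;
}
```

This caps how long a surface can show stale lighting, at the price of an occasional stall on the frame where the mapping is forced.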

With hindsight the problem with the CS-based updates was most likely a CPU/GPU balancing issue. A typical Quake engine only uses ~50% of your GPU's power and loads everything else on the CPU, meaning that a GPU upgrade tends not to give you as much of a performance increase as it should. DirectQ puts a lot more work on the GPU, which is part of the reason why it runs so fast, but that also makes the GPU somewhat more of a candidate for bottlenecks. In short - because the GPU is already running at full capacity, loading more work on it risks making things slower.

Definitely an interesting learning experience overall.

2 comments:

mhquake said...

Still some glitches with the BSP models in ne_tower - I'm thinking that if I fail to get a mapping on the original texture I might just grab another one from a pool - there are some fiddly complexities to work out with this but it seems like it should be good.

mhquake said...

Unfortunately BSPs break the entire setup...