Over the years I've evolved a number of different solutions to the problem of updating dynamic lightmaps. This is one of the major bottlenecks in Quake and - depending on your hardware - is capable of dropping framerates down to single digits in extreme circumstances.
To set the context, GLQuake (in its multitextured path) did some of the worst things possible with lightmap updates, and Quake II took it up another notch. The Quake II example is most relevant here as it applies to modded Quake engines that support coloured light. The format used didn't map directly to how the texture was stored in hardware, meaning that the driver had to swap the R and B channels CPU-side before updating. It also called glTexSubImage2D once per surface that needed to be updated, directly before the surface was drawn, so each such call incurred a pipeline stall (and there may have been hundreds or thousands of these per frame).
I have no doubt that drivers of the time did some awful things behind the scenes to allow this to run well - for run well it did. But run it on a lower quality modern driver and what's really happening tends to slap you in the face pretty hard. A few nips and tucks, some code reorganisation, and things do get much better, but there is still plenty of room for improvement remaining.
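One common reorganisation of this kind is to track a dirty rectangle per lightmap and upload it once, before any surface using that lightmap is drawn, instead of once per surface. The following is a minimal sketch of that idea only; `dirty_rect_t`, `dirty_rect_add` and `dirty_rect_flush` are hypothetical names, and the actual texture upload is left to the caller.

```c
#include <stdbool.h>

/* Hypothetical per-lightmap dirty region, measured in texels. */
typedef struct {
    bool dirty;
    int left, top, right, bottom;   /* right and bottom are exclusive */
} dirty_rect_t;

/* Grow the dirty region to cover a surface's block at (x, y), size w x h. */
void dirty_rect_add(dirty_rect_t *r, int x, int y, int w, int h)
{
    if (!r->dirty) {
        r->dirty = true;
        r->left = x;
        r->top = y;
        r->right = x + w;
        r->bottom = y + h;
        return;
    }
    if (x < r->left) r->left = x;
    if (y < r->top) r->top = y;
    if (x + w > r->right) r->right = x + w;
    if (y + h > r->bottom) r->bottom = y + h;
}

/* Call once per lightmap per frame, before drawing with it.
   Returns true if an upload is needed and writes the region to upload;
   the caller would then issue a single texture update for that region. */
bool dirty_rect_flush(dirty_rect_t *r, int *x, int *y, int *w, int *h)
{
    if (!r->dirty) return false;
    *x = r->left;
    *y = r->top;
    *w = r->right - r->left;
    *h = r->bottom - r->top;
    r->dirty = false;
    return true;
}
```

Many per-surface updates collapse into one upload per lightmap this way, at the cost of sometimes uploading texels that didn't change.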
The current DirectQ solution uses something evolved from GLQuake. Lightmaps are stored as 4-component textures, updated on the CPU as required, then bulk updated to the GPU before drawing. A scaling factor is stored in the alpha channel to compensate for clamping, and the pixel shader uses this to evaluate the final light level.
There are a number of problems with this approach. There is a lot of shuffling of data around different memory buffers CPU-side before the lightmap can go to the GPU. The scaling factor requires a division (which the GPU will translate into two operations - a reciprocal and a multiply). And there is a potential for pipeline stalls if the CPU needs to update a texture that is currently being used by the GPU for drawing, although at worst there will be only a fraction of the stalls that GLQuake/Quake II suffered from.
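To make the alpha-channel scaling factor concrete, here is one possible encoding, sketched in C. This is an assumption for illustration and not necessarily the scheme DirectQ uses: when any channel exceeds 255, all three are scaled down to fit and the scale goes in alpha, so that decoding needs the division mentioned above. `lightmap_encode` and `lightmap_decode_channel` are hypothetical names.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } rgba8_t;

/* Scale one channel by alpha/255, clamping to 8 bits as a safety net. */
static uint8_t scale_channel(unsigned c, unsigned alpha)
{
    unsigned v = c * alpha / 255u;
    return (uint8_t)(v > 255u ? 255u : v);
}

/* Encode light values that may exceed 255 into an 8-bit RGBA texel.
   Alpha holds the scale that was applied (255 means "no scaling"). */
rgba8_t lightmap_encode(unsigned r, unsigned g, unsigned b)
{
    unsigned peak = r > g ? r : g;
    unsigned alpha = 255u;
    rgba8_t out;

    if (b > peak) peak = b;
    if (peak > 255u) {
        alpha = 255u * 255u / peak;   /* brightest channel lands near 255 */
        if (alpha == 0u) alpha = 1u;  /* keep the decode division defined */
    }
    out.r = scale_channel(r, alpha);
    out.g = scale_channel(g, alpha);
    out.b = scale_channel(b, alpha);
    out.a = (uint8_t)alpha;
    return out;
}

/* What the pixel shader would do: divide by the stored scale. */
float lightmap_decode_channel(uint8_t stored, uint8_t alpha)
{
    return (float)stored * 255.0f / (float)alpha;
}
```

For example, encoding (510, 255, 0) stores (254, 127, 0) with alpha 127, and decoding the red channel gives back 510. Precision is lost in the darker channels whenever the scale kicks in, which is part of the trade-off.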
Different choices of texture format can help relieve some of this. I've tested 32-bit textures using 10 bits each for R, G and B, and 2 bits for alpha (or unused), which provide some extra headroom for dynamic range, but it still doesn't go as high as Quake needs (dynamic lights in particular can overflow it), meaning that you need to do clamping. The spare 2 bits are not enough for a scaling factor, and the choice remains to sacrifice bits of precision. Reading from this type of texture GPU-side is also slower than reading from a traditional 4-component 8-bits-per-component texture, and it does nothing about the problem of shuffling data through two different CPU memory buffers before you can transfer it.
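As a sketch of what packing into such a format looks like, the following assumes a layout with alpha in the top 2 bits followed by 10 bits each of R, G and B (the D3DFMT_A2R10G10B10 arrangement). The function names are hypothetical. Note that the clamp is still there; it has only moved from 255 to 1023.

```c
#include <stdint.h>

/* Clamp a light value to what 10 bits can hold. */
static unsigned clamp10(unsigned v)
{
    return v > 1023u ? 1023u : v;
}

/* Pack three light values into a 10:10:10:2 texel.
   Assumed layout: A in bits 30-31, R in 20-29, G in 10-19, B in 0-9.
   Alpha is set to its maximum since 2 bits are too few for a scale. */
uint32_t pack_a2r10g10b10(unsigned r, unsigned g, unsigned b)
{
    return ((uint32_t)3u << 30)
         | ((uint32_t)clamp10(r) << 20)
         | ((uint32_t)clamp10(g) << 10)
         |  (uint32_t)clamp10(b);
}

unsigned unpack_r10(uint32_t texel) { return (texel >> 20) & 1023u; }
unsigned unpack_g10(uint32_t texel) { return (texel >> 10) & 1023u; }
unsigned unpack_b10(uint32_t texel) { return texel & 1023u; }
```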
64-bit lightmaps are another solution. With 16 bits of precision per colour component you can Lock a texture rectangle and write directly into it. Quake lightmaps can still overflow even that, so you need to bring your d_lightstylevalue multiplier down to 1 (instead of 22), which means you also need a scaling factor. That gets rid of all of the CPU-side memory shuffling and clamping, but reading from this kind of texture on the GPU is slower than anything mentioned so far. On the other hand the scaling factor is only a multiply, it's constant, and you have 16 bits spare if you want to use them for anything (you could, e.g., pack 2 components of a surface normal in there and rebuild the full normal in shader code). On balance this is the fastest method of all, but - of course - 64-bit textures aren't available on all of the hardware that DirectQ currently targets.
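The arithmetic behind the multiplier change can be sketched as follows. In stock Quake each style value is a step (0 to 25, from the style string's 'a' to 'z') times 22, and GLQuake shifts the accumulated sum right by 7 before clamping to 255. The sketch below accumulates with the raw step instead and leaves a constant 22/128 multiply for the shader. The function names are hypothetical, and the choice of 22/128 as the constant is an assumption made to match GLQuake's range rather than something taken from DirectQ.

```c
#include <stdint.h>

/* Accumulate one colour component of one texel across a surface's styles,
   using a style multiplier of 1 rather than 22.
   samples[i]     : 8-bit lightmap sample for style i
   style_steps[i] : raw style step for style i, expected range 0..25
   Worst case is 4 * 255 * 25 = 25500, which fits in 16 bits. */
uint16_t accumulate_texel16(const uint8_t samples[], const int style_steps[],
                            int num_styles)
{
    uint32_t sum = 0;
    int i;

    for (i = 0; i < num_styles; i++)
        sum += (uint32_t)samples[i] * (uint32_t)style_steps[i];

    if (sum > 65535u) sum = 65535u;   /* safety net for out-of-range input */
    return (uint16_t)sum;
}

/* The constant multiply the shader would apply: restores the factor of 22
   and applies GLQuake's >> 7, giving a level where 255 is full brightness
   and anything above it is overbright. */
float texel16_to_light_level(uint16_t texel)
{
    return (float)texel * (22.0f / 128.0f);
}
```

With samples of 255 and 128 at style steps 12 and 25, the stored value is 6260 and the recovered level is about 1076, well past the 255 that an 8-bit path would clamp to.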
I have two different GPU lightmapping solutions in use, one in the RMQ engine (OpenGL, publicly released) and one in an experimental engine (D3D11, unreleased). Of these the D3D11 method is superior in terms of both performance and quality, but comes at the cost of significantly higher hardware requirements (the RMQ one would likely run on a GeForce 3).
The current RMQ method uses 3 textures per lightmap, with an optimal case that needs only one. Each texture represents one of the R, G and B channels of the final composited lightmap, and each colour component of a 4-channel texture represents one of the surface styles (whoever said that colour had to represent colour, eh?). The d_lightstylevalue values are loaded into the vertex shader's constant registers once per frame, and each surface vertex contains 4 lookups (one per style) into this. These are used to build a vertex shader output, and from there it's a simple matter of doing a dot product of this output with the texture lookup for each channel to get the final image. (The version of the engine released with the recent demo was slightly different, by the way - the code has evolved some since then.)
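The per-channel dot product is easier to see in code. This is a CPU-side C sketch of the shader maths as described above, not the RMQ shaders themselves; the function names are hypothetical, and it assumes Quake's usual conventions of 64 lightstyles and a style index of 255 marking an unused slot.

```c
#include <stdint.h>

#define MAX_LIGHTSTYLES 64
#define STYLE_UNUSED    255

/* Vertex shader stage: turn the vertex's 4 style indices into 4 weights
   by looking them up in the per-frame style values. */
void build_style_weights(const float lightstyle_values[MAX_LIGHTSTYLES],
                         const uint8_t styles[4], float weights[4])
{
    int i;
    for (i = 0; i < 4; i++) {
        if (styles[i] == STYLE_UNUSED || styles[i] >= MAX_LIGHTSTYLES)
            weights[i] = 0.0f;   /* unused slot contributes nothing */
        else
            weights[i] = lightstyle_values[styles[i]];
    }
}

static float dot4(const float a[4], const float b[4])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Pixel shader stage: each of the three textures holds one colour channel,
   with the 4 style layers stored in its 4 components. */
void composite_lightmap(const float weights[4],
                        const float red_texel[4],
                        const float green_texel[4],
                        const float blue_texel[4],
                        float out_rgb[3])
{
    out_rgb[0] = dot4(weights, red_texel);
    out_rgb[1] = dot4(weights, green_texel);
    out_rgb[2] = dot4(weights, blue_texel);
}
```

Animating a lightstyle then only changes the values in `lightstyle_values`; no texture data is touched.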
The D3D11 method uses 4 textures, one per style on the surface, with optimal cases of 3, 2 or 1. A broadly similar approach is then used, with appropriate differences where required - it's just a matter of how the data is used and which operations are performed. The case where a surface has 2 styles is faster in D3D11, the case where a surface has 4 styles is faster in RMQ. The D3D11 method obviously has a higher texture memory requirement too. (The RMQ optimal 1 case is essentially identical to the D3D11 optimal 1 case, and it makes this decision during load time.)
Where things really diverge is with dynamic lights. RMQ does dynamic lighting per-vertex, which is a compromise given its lower hardware requirements. In practice it looks fine - most of the time. There are cases where the limitations of per-vertex lighting come through - when a relatively small light is reasonably close to a relatively large triangle, for example. It also involves sending a lot of data to the GPU every frame - 3 dynamic light colours per vertex - but it remains faster than updating textures dynamically.
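The small-light, large-triangle case can be shown with a few lines of C. This sketch compares the light level at a triangle's centroid when it is interpolated from the three vertices against evaluating it directly at that point. The names are hypothetical, and the quadratic falloff is an assumption chosen to keep the example free of library calls (GLQuake's own falloff is linear in distance).

```c
typedef struct { float x, y, z; } vec3_t;

/* Assumed falloff: 1 at the light's origin, 0 at its radius and beyond. */
float dlight_attenuation(vec3_t p, vec3_t origin, float radius)
{
    float dx = p.x - origin.x;
    float dy = p.y - origin.y;
    float dz = p.z - origin.z;
    float d2 = dx * dx + dy * dy + dz * dz;
    float r2 = radius * radius;

    if (d2 >= r2) return 0.0f;
    return 1.0f - d2 / r2;
}

static vec3_t centroid(const vec3_t tri[3])
{
    vec3_t c;
    c.x = (tri[0].x + tri[1].x + tri[2].x) / 3.0f;
    c.y = (tri[0].y + tri[1].y + tri[2].y) / 3.0f;
    c.z = (tri[0].z + tri[1].z + tri[2].z) / 3.0f;
    return c;
}

/* Per-vertex: light each vertex, then interpolate. At the centroid the
   barycentric weights are all 1/3, so this is the plain average. */
float per_vertex_at_centroid(const vec3_t tri[3], vec3_t origin, float radius)
{
    return (dlight_attenuation(tri[0], origin, radius) +
            dlight_attenuation(tri[1], origin, radius) +
            dlight_attenuation(tri[2], origin, radius)) / 3.0f;
}

/* Per-pixel: evaluate the light at the point actually being shaded. */
float per_pixel_at_centroid(const vec3_t tri[3], vec3_t origin, float radius)
{
    return dlight_attenuation(centroid(tri), origin, radius);
}
```

With a triangle spanning 300 units and a radius-50 light hovering 10 units above its centroid, every vertex is outside the light's radius, so the per-vertex result is 0 while the per-pixel result is 0.96: the light simply vanishes.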
The D3D11 code essentially replicates GLQuake's handling of dynamic lighting but runs it on the GPU rather than on the CPU. This needs shader branching, bitwise operations in shaders, higher instruction counts and more powerful hardware, but gives much better quality. It also requires loading dynamic light positions, colours and radii into the shader constants each frame. The minimum of 256 constant registers, and the availability of constant buffers, makes this painless. It's not an option for RMQ, as the GL extensions I use only guarantee 96 registers and RMQ needs more than the 32 dynamic lights that standard GLQuake - and the D3D11 code - use.
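The branching and bitwise operations come from GLQuake's habit of tagging each surface with a 32-bit mask of the dynamic lights that touch it. Below is a CPU-side C sketch of what a pixel shader following that scheme might do per pixel. It is an illustration under assumptions, not the D3D11 engine's shader: the names are hypothetical, and the quadratic falloff stands in for whatever attenuation the real code uses.

```c
#include <stdint.h>

#define MAX_DLIGHTS 32

/* What would be loaded into shader constants each frame. */
typedef struct {
    float origin[3];
    float radius;
    float color[3];
} dlight_t;

/* Sum the contribution of every light whose bit is set in dlightbits. */
void shade_point(const dlight_t lights[MAX_DLIGHTS], uint32_t dlightbits,
                 const float point[3], float out_rgb[3])
{
    int i, k;

    out_rgb[0] = out_rgb[1] = out_rgb[2] = 0.0f;

    for (i = 0; i < MAX_DLIGHTS; i++) {
        float d2 = 0.0f, r2, atten;

        /* Bitwise test plus branch: skip lights not touching this surface. */
        if (!(dlightbits & ((uint32_t)1 << i))) continue;

        for (k = 0; k < 3; k++) {
            float d = point[k] - lights[i].origin[k];
            d2 += d * d;
        }
        r2 = lights[i].radius * lights[i].radius;
        if (d2 >= r2) continue;          /* out of range (also skips radius 0) */

        atten = 1.0f - d2 / r2;          /* assumed falloff */
        for (k = 0; k < 3; k++)
            out_rgb[k] += lights[i].color[k] * atten;
    }
}
```

The mask keeps the loop cheap for the common case where a surface is touched by no lights or only one.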
Both of these are actually marginally slower than updating textures where no dynamic lights or animating lightstyles are used, but they completely avoid the problem of hitching and uneven framerates caused by texture updates (frametimes can jump from 3ms per frame to 15ms per frame 10 times per second).
On paper the D3D11 code should be slower, but D3D11's closer mapping to how modern hardware actually works, together with improvements in shader model 4.0/4.1 (I haven't tried 5), brings the performance right up to and above the RMQ method. The D3D11 method, of course, also doesn't require an extra transfer of data per surface vertex from CPU memory each frame.
I've an itch to try a deferred renderer (built on my D3D11 code) but I'm holding back on that as I want to keep the current code reasonably intact for use as a reference when I'm porting DirectQ to D3D11 (I haven't yet decided if I'll use GPU lightmaps or 64-bit textures for that, by the way). It's also the case that I want to do some more research and probably write a simple experimental demo with one light before I jump into it for real. This will of course be the slowest method of all, but a fully real-time lighting and shadowing solution does have its appeal.
All in all there is no perfect solution for lightmaps (at least that I've found). Everything involves some degree of compromise and it's typically a case of: performance, quality, low hardware requirements - pick two.
Musing on Lightmap Solutions
Posted by mhquake on Thursday, January 19, 2012 at 10:22 PM
1 comment:
It seems that DirectQ doesn't work with the Reaper Bot well. It lags too much with some sound glitch.