It's cooled down a little for the night so I've done some more initial experimenting with my new lightmap representation idea.
There are a number of interesting problems this kicks up, and the main one is - what is the most efficient texture size for holding a grid of all lightmaps used?
We're definitely talking about multiples of 128 here, although it could become multiples of 64 or even 32 (I think the default max extents lets us go this low) - these open up the possibility of interesting tradeoffs between number of tiles, tile size for updates and number of updates needed (each tile is one update). It's also possible to read the max extents used by a map at load time and tune the tile size based on that.
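A minimal sketch of that load-time tuning, assuming the stock Quake convention of one lightmap sample per 16 world units (so a surface needs `(extent >> 4) + 1` texels along each axis) and assuming a tile only has to be big enough for the largest single surface. The function names and the candidate sizes are illustrative, not taken from any engine:

```python
def lightmap_texels(extent, sample_shift=4):
    """Texels needed along one axis of a surface.

    Assumes one lightmap sample every 16 world units (shift of 4), plus one.
    """
    return (extent >> sample_shift) + 1


def pick_tile_size(max_extent, candidates=(32, 64, 128)):
    """Smallest candidate tile size that still fits the largest surface."""
    needed = lightmap_texels(max_extent)
    for size in sorted(candidates):
        if size >= needed:
            return size
    raise ValueError("no candidate tile size fits extent %d" % max_extent)


# A map whose largest surface extent is 256 units needs 17 texels per axis,
# so a 32-texel tile would be enough under these assumptions.
print(pick_tile_size(256))
```

Smaller tiles mean cheaper individual updates but more of them, so whether the smallest fitting size is actually the fastest is exactly the kind of thing that needs measuring.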
Currently I have a rough calculation going on: take the square root of the number of lightmaps (each is a single tile) and round it down to get the number of tile rows; the number of tile columns is then the smallest number which, when multiplied by the row count, is equal to or higher than the number of tiles needed.
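The rough calculation can be sketched as follows. This is only an illustration of the rows/columns rule described above; the function names, the row-major tile ordering and the 128-texel default are assumptions made for the example:

```python
import math


def grid_dimensions(num_tiles):
    """Rows = floor(sqrt(n)); columns = fewest that give rows * cols >= n."""
    if num_tiles < 1:
        raise ValueError("need at least one tile")
    rows = math.isqrt(num_tiles)
    cols = -(-num_tiles // rows)  # integer ceiling of num_tiles / rows
    return rows, cols


def tile_origin(index, cols, tile_size=128):
    """Top-left texel of a tile, assuming tiles are laid out row by row."""
    row, col = divmod(index, cols)
    return col * tile_size, row * tile_size


rows, cols = grid_dimensions(10)
print(rows, cols)            # 3 rows of 4 columns: 12 slots for 10 tiles
print(tile_origin(5, cols))  # (128, 128)
```

With this rule the wasted space is always less than one full row of tiles, and the texture comes out square whenever the tile count is a perfect square.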
It works fairly well, is moderately faster than the texture array on my AMD, but could it be better? What about square textures? Powers of two? Performance on other hardware? Is there something useful to be gained by accepting the extra space overhead in exchange for performance? What about non-square tiles, even? Or are we entering the realms of micro-optimization here?
Whatever the answers are, it seems necessary to know, but for now the tiling setup I'm using has (a) proven itself by being no slower than the old array method, and (b) turned out to be easily interchangeable with other layouts, so fortunately I don't need to delay proceeding with the next steps pending discovery of what those answers are.
Wednesday, June 27, 2012
Representing Lightmaps
Posted by mhquake at 10:05 PM
6 comments:
my concerns would be hardware limits. you'd still need to retain multiple lightmaps.
a banshee for instance has a max texture size of 256*256. you can say 'no one will ever run this on a banshee', but you could also say 'what if I run this in 15 years and get 65k*65k max dimensions'. HUGE textures, but hey, only a single batch!
Realistically, you only need one lightmap per wall texture. There's not much point in exceeding your buffer sizes (lots of hardware has a 16bit index limit, so 32bit indices need special handling by drivers). Now, if you're only using 16bit indices anyway, there's not much point in shoving more than 65k worth of vertices into that lightmap image, so would huge mega textures actually be advantageous, except for a single batch that straddles your lightmaps because you ran out of space and had to create an entirely separate huge mega texture?
The whole thing makes me think of rage's megatexture stuff.
If you limit the texture width to 1024, that's 1/64th of the texture switches. Is switching lightmap textures really expensive enough to warrant larger textures?
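The 1/64th figure follows from how many 128-texel lightmaps fit in a square texture of that width. A quick sketch of the arithmetic, assuming square atlases and the traditional 128x128 lightmap size (the function name is made up for the example):

```python
def tiles_per_atlas(atlas_width, tile_size=128):
    """How many tile_size x tile_size lightmaps fit in a square atlas."""
    per_side = atlas_width // tile_size
    return per_side * per_side


# 1024 / 128 = 8 tiles per side, so 8 * 8 = 64 lightmaps share one texture.
print(tiles_per_atlas(1024))
```

So in the worst case one bind stands in for 64, and each doubling of the width only buys another factor of four.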
Note that certain DP maps allow quite large extents, so I'd not really recommend reducing that capability.
Increasing it? Perhaps, but what for? Your dlight rect calculations will suffer.
Making the size of your megalightmaps variable at least means you can support banshees. Also it means that if you come across dodgy drivers that hate glTexSubImage and read the entire thing back to ram each time, then at least you have the option to reduce it down to 128 again.
It's only the actual upload code that cares (besides map changes).
Which is the approach I'll probably be trying at some point. :)
If I was a mapper, I'd say something along the lines of 'can we have higher resolution lightmaps now?'. But I'm too lazy to think of that, so I won't mention it.
My lingering concern with lightmaps is how to render rbsp/fbsp maps properly, efficiently, and with glsl.
Much of this is groundwork for a switch over to compute shader updates so running on a banshee is not really a concern... (do people still even use those - eek!) - nor are weird drivers that misbehave with TexSubImage calls.
It's also being guided by target hardware. I've set a minimum of D3D10 class for this setup, so that means that I can afford to require a higher texture size - D3D10 specifies that if you can create a device at all then you're guaranteed at least 8192x8192 textures (11 guarantees 16k) - not that you'd use that much for an id1 map; 512x512 is enough.
There are a number of factors in lightmap textures. Having to break batches at arbitrary points in the render and texture switching do slow things down, but the bigger killer is dynamic updates scattered over multiple lightmap textures. There are some maps where a single dynamic light can hit almost every single lightmap texture used by the map during an update, and that hurts - a lot (especially if you're adding dynamic lights to .bsp brush models). No, one big texture won't make it hurt less, but CS updates will (free multithreading, no contention, no stalls, etc).
One per wall texture is sufficient (changing 2 textures in D3D10/11 is no slower than changing 1 - provided they change at the same time) but one for everything simplifies the code a hell of a lot. Most Quake maps will fit in the 16-bit index limit anyway, even with this (all id1 maps do), and again D3D10/11 guarantees 32-bit indexes to be available.
For rect calculations one can make use of the optional "offset" param to texture.Sample which is int2 - no FP precision worries (again, D3D10/11 guarantee that all internal FP ops happen at 32-bit precision so you don't need to worry about drivers pulling a sneaky one on you there).
This isn't really something that would be proposed as a general-purpose replacement for traditional lightmaps, but within the correct environment it seems quite viable - at least to the extent that it's worth trying.
yeah, old hardware isn't that common, however with FTE I'm trying to target gles1+gles2 as well as just gl, so I'm really not a fan of using texture arrays (other than for optional rtlight features).
I can see that combining multiple batches into a single texture to avoid splitting batches is a really good thing for large/complex maps, but I hope the dlights are small enough that they don't update too much at once.
threads are also an option for lightstyle (and dlight) animations. If it's only using a quarter of my cpu power, and wasting the gpu on things *other* than blitting when a spare cpu core can do it instead, then it's an overall loss.
not ideal for gles type devices though, but you don't even get compute shaders there. On the plus side, tegra is supposed to have multiple (usable) arm cores, so even on phones it might be handy.
if the worker thread updates a pbo, and the main thread does the actual merging, it should be okay. not that I've done much with pbos.
d3d11 and gl4 are probably quite similar to each other, but I cannot say I have much experience regarding either, beyond d3d9 and gl3.3. I would quite like to add a d3d11 renderer to fte, but for now I lack the motivation for the initial steps.
On the plus side, at least there's no specific code for map/model formats, though I'm not really sure how to generate hlsl from q3 shaders - this is already a somewhat major problem with the gles2 version of fte. :P
But yeah, I do want to try megalightmaps in FTE, just to see if marcher+rmq performs any better. I just think I'll find them easier to maintain considering the other requirements of fte's backends.
Maybe I'll manage to come up with a good way to merge batches properly in fte. The problem is all in the loader, right now at least.
Still, should be interesting.
One thing I've been itching to try for a long time is generating GLSL/whatever shaders from Q3A shaders. It just seems like a really interesting project and worth doing. Astonishing that nobody seems to have done it yet.
let me know where to get a copy from if you do. :P
the only real need for such that I've seen so far is with gles2 platforms. specifically nacl.
android supports gles1 well enough, generally, to be able to get by (I finally got android+gles2 working in the emulator! 30fps with both cpu and gpu acceleration... it's still slow, but 10 times faster than it was at least).
desktop gl will probably always provide a compatibility context anyway.
gles2 on the desktop is generally pointless (to use in fte: vid_gl_context_es 1; vid_gl_context_version 2 - requires the proper gl extension).
d3d9 has fixed function still
d3d10/d3d11: no idea.
I think that some multi-pass blending modes where the initial pass is not opaque may result in some interesting issues that can only be solved with multiple ?lsl passes, or a screengrab beforehand.
Also, avoiding compiling the exact same shader once for every wall texture is a good optimisation to make.
In theory it should be quite simple. You've got well-defined Q3A shader ops (TC_MOD, etc), a coupla waveforms and a list of shaders that can be walked through and parsed at load time. The different shader for each wall texture problem - yeah I can see how that could happen. Some form of hash of the GLSL code would be needed to avoid that.
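A rough sketch of the hashing idea, assuming the generated source is plain text and that the wall texture is bound as a sampler rather than baked into the code (otherwise every source string would differ anyway). `ShaderCache` and `compile_fn` are hypothetical names; `compile_fn` stands in for whatever actually compiles and links a program:

```python
import hashlib


class ShaderCache:
    """Compile each distinct piece of generated shader source only once."""

    def __init__(self, compile_fn):
        self._compile = compile_fn  # hypothetical: source text -> program handle
        self._programs = {}         # source hash -> program handle

    def get(self, source):
        key = hashlib.sha1(source.encode("utf-8")).hexdigest()
        if key not in self._programs:
            self._programs[key] = self._compile(source)
        return self._programs[key]


# Usage with a stand-in compiler that just hands out increasing ids.
compiled = []


def fake_compile(source):
    compiled.append(source)
    return len(compiled)


cache = ShaderCache(fake_compile)
first = cache.get("void main() { /* scroll + blend */ }")
second = cache.get("void main() { /* scroll + blend */ }")
print(first == second, len(compiled))  # same program reused, one compile
```

Hashing the final text is the bluntest option; hashing a normalized list of the parsed shader ops before generating any code would presumably catch the same duplicates earlier.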
The main interest I'd have would be quality - anything involving a waveform or sqrt/normalize isn't going to work well per-vertex so moving those ops to per-fragment would be a much better result. You can quite easily see artefacts from that all over Q3A.
Fixed pipeline is dead in D3D10/11 - even some states (like alpha test) were removed.