Wednesday, November 30, 2011
Here we go. Like I said, this isn't really noticeable in static screenshots, but anyway...
The top image is the original shader; the bottom is my replacement.
First of all, ignore the framerate differences. These just come from writing the image to file; sometimes that cost shows up in a screenshot, sometimes it doesn't.
Secondly, I did warn you that it wasn't very noticeable in screenshots.
The place to look is around the seat padding, which is actually a single flat surface in the map. Notice that in the original the bumpmapping is smudgy, blurred and quite indistinct, whereas in my replacement it's tight and well-defined. That's the key difference.
Posted by mhquake at 7:49 PM
Tuesday, November 29, 2011
I've been playing around a little with Doom 3's main interaction shader. This is something I likely could have done before the source release (although getting the env params from the vertex shader into the fragment shader may have been slightly tricksy), but it's fun to do it now.
So, pretty much everything that was in the vertex shader has now moved over to the fragment shader: normalization is done with math instead of the cubemap, some internal 0..1 clamping has been removed, the specular exponent is computed with math too, and the normal map is renormalized after the texture lookup.
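In rough outline the fragment-side changes look like this - a minimal sketch only, not the actual Doom 3 interaction shader; the texcoord, texture and env param assignments here are placeholders:
!!ARBfp1.0
# sketch only - bindings are placeholders, not the real interaction shader
TEMP lightVec, normal, len, spec;
# normalization via math instead of a cubemap lookup
DP3 len.x, fragment.texcoord[0], fragment.texcoord[0];
RSQ len.x, len.x;
MUL lightVec, fragment.texcoord[0], len.x;
# normal map lookup, expand 0..1 to -1..1, then renormalize
TEX normal, fragment.texcoord[1], texture[1], 2D;
MAD normal, normal, 2.0, -1.0;
DP3 len.x, normal, normal;
RSQ len.x, len.x;
MUL normal, normal, len.x;
# specular exponent as math (POW) instead of a lookup texture
DP3 spec.x, normal, lightVec; # stand-in for the specular dot product
MAX spec.x, spec.x, 0.0;
POW spec.x, spec.x, program.env[0].x; # exponent passed as an env param
MOV result.color, spec.x;
END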
The end result is quite surprising, and looks somewhat different to the way the original game looked.
Before discussing that, it's fair to say that this is now quite a heavy fragment shader. Obviously running on Doom 3's original target hardware is out of the question; we need to jump quite a few hardware generations to get performance up again (not to mention better support for the higher instruction count). Once you do get reasonable support, it runs quite well - minimal performance difference (confirming that the engine was CPU-bound on my machine) and locking at a fairly steady 60 fps (the cases where it drops below are ones where it would have done so previously anyway).
So the end result? The quality improvement is nothing short of astonishing. The original looked pretty grungy in places, but everything just smooths out now and looks really clean and solid. The bump mapping is particularly appealing, but specular highlights also tighten up and gain clarity and definition.
Did I mention bumpmaps? These turned out to be really, really good. There's an extra level of quality here, and I'm frequently walking up to objects, walking around them, looking at them from different angles, trying to figure out whether they're real geometry or just bumpmapped. The original didn't survive this kind of close inspection.
Specular highlights have changed. As well as being tighter, sharper, and in no way smudgy, they're also a good deal brighter. Part of that was the removal of the specular lookup texture and its replacement with a POW instruction; part was the removal of some other clamping.
The rest of the game looks slightly darker, but it's a rich, solid-feeling darker, rather than a grubby/murky darker. I may still hack the light scale up a little.
All in all this is positive stuff. It's a worthwhile improvement to the look of the game, and is getting me more enthused about doing a mini-project with it. I'm already thinking through the ramifications of a D3D port, which would be a definite feature if I do decide to proceed.
One complicating factor is that the interaction shaders are game content, and don't fall under the GPL release. I have rewritten them significantly; the vertex shader is just a simple pass-through now, and much of the fragment shader is different, but I still have doubts about whether even a port to HLSL would be legal (the vertex shader probably would be - I think it would be hard for anyone to claim copyright over a pass-through shader! - but the fragment shader is doubtful).
Interestingly, a solution to this question might lie in the NV20 and R200 paths. Neither of these use external shader scripts, so their setup and algorithms do fall under the GPL, and so a port of them, together with the modifications I've made, would be a legal option.
Posted by mhquake at 11:07 PM
Thursday, November 24, 2011
Just been looking at the stencil shadow code again. What's really annoying about this is that what Creative have patented essentially boils down to this:
-1 + 1 = 0
1 - 1 = 0
0 - 1 = -1
0 + 1 = 1
This really comes through when you see it expressed using glStencilOpSeparate: you just see the lines of code stacked up, and it's so obvious.
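For illustration, here's a minimal sketch of the two-sided depth-fail setup in question, assuming a GL 2.0 context with the entry points already loaded - my illustration, not code from any id or Creative source:
/* stencil writes only; no depth or color writes during the volume pass */
glEnable (GL_STENCIL_TEST);
glStencilFunc (GL_ALWAYS, 0, ~0u);
glDepthMask (GL_FALSE);
glColorMask (GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
/* back faces +1 on depth-fail, front faces -1 on depth-fail */
glStencilOpSeparate (GL_BACK, GL_KEEP, GL_INCR_WRAP, GL_KEEP);
glStencilOpSeparate (GL_FRONT, GL_KEEP, GL_DECR_WRAP, GL_KEEP);
/* draw the shadow volumes here: for every front/back pair the +1 and -1
   cancel to 0, and only pixels inside a volume are left non-zero -
   exactly the arithmetic above */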
Posted by mhquake at 10:02 PM
Still uncertain if I'm going to actually do anything serious with the Doom 3 source. I've been playing around with it a bit more, and I find myself making relatively minor changes here and there - there may sometimes be quite a bit of work involved, for sure, but nothing is leaping out as an "oh yeah, this would really improve the engine for me" type of thing.
In other words, I don't actually have that much that I'd like to change about it; I can play using the original engine just fine and nothing really annoys me or makes me think that it needs fixing.
So far with the source code I've ported it to MSVC 2010 Express; this just involved removing all of the tools code (except for the AAS compiler, which the game still needs) and getting rid of a few other MFC-dependent things. This hugely reduced both compile times and the executable size, to maybe one third of what they were before.
I've also played around a little with texture loading, which seems to be the main bottleneck in map load speeds. There are quite a few CPU-side passes over the texture data, and getting rid of some of them got me maybe 25% faster loads, but at the expense of some flexibility the original had. It's an acceptable tradeoff.
Other stuff I've done is go over one of the shaders (the main Interaction shader), converting a normalization cubemap lookup to shader math (the code was already there but commented out) and getting rid of the red/alpha switch for normal maps (since I also got rid of the CPU-side switch). Nothing dramatic.
Finally, I switched the stencil shadowing to use two-sided stencil for a good performance boost. The original used two-sided stencil as well (it was removed from the released source), and with Carmack's Reverse in place it was still faster than even my souped-up version. I did successfully recreate the Reverse (based on a GLIntercept log of the original) but obviously that can't be released (even if I do release anything).
All in all, it's looking as though there's not much in the way of what I consider fun in this code. With Quake 1 it was cool, as there was just so much to fix, to tidy up, to rearchitect, to make better. Quake II less so, though I do dabble from time to time alright. Subsequent code releases - it's just not there (that's why I've never done - for example - a Q3A project, although converting its fixed-func "shaders" to real shaders seems like a fun project I may try sometime).
Glacial compile and load times make the turnaround time quite frustrating too, and the ultra-sluggish performance of MSVC 2010 isn't helping matters.
I'll probably play with it a little more just to see if anything cool comes out, but right now it's looking doubtful. Doom 3 is coming across more as a source to mine for ideas that I can use elsewhere rather than something I'd consider doing a serious project with at the moment.
Posted by mhquake at 8:09 PM
Wednesday, November 23, 2011
Now that the Doom 3 source is out, I've been having a quick look over it and deciding what on earth I'm going to do with it.
My initial objective is to get a sane codebase to work from, so for now I'm just running through it and doing some reorganization here and there.
A few notes:
- The working codename for the Quake 3 engine was "Trinity"; Doom 3 is called "Neo". Cute.
- It's a Visual Studio 2010 project which is both good and bad. My own experience of 2010 is that C code compiles faster but C++ code compiles much slower. I'll likely port it to 2008 at some point in time.
- It won't compile with the Express Edition owing to use of MFC in the tools. This is a Bad Thing; compiling with Express is important for GPL source releases as it makes them more accessible to more people. I'll likely remove the tools from the code.
- The compiled Debug executable is 17 MB (!!!) compared to just under 6 MB for the released engine.
- I don't like the code structure they used; I prefer to have a single project and all source files in one subfolder, using the IDE to organize if necessary. Another possible change?
- Removal of the alternate renderer backends is an initial goal; in 2011 it should be ARB2 and ARB2 only.
- I see that they used the same "qgl" subsystem that began with Quake II. Grrrr.
- Yes, it runs clean.
Posted by mhquake at 7:15 PM
Monday, November 21, 2011
OpenGL is just a constant struggle against broken drivers. Right now I'm moving RMQ's code from old-style glEnableClientState/gl*Pointer calls to generic attribute arrays, basically because I have an extreme loathing for glClientActiveTexture (and because I need to be able to specify MDL positions as 4 bytes instead of 3 floats, for a video RAM saving to offset what I lost to lightmaps). The suffering is quite extreme.
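The change itself is simple enough; here's a hedged sketch (the attribute indices and the vertex struct are illustrative, not RMQ's actual layout, and the GL 2.0 entry points are assumed to be loaded already):
#include <stddef.h> /* offsetof, if you prefer it over member pointers */

/* illustrative vertex layout only */
typedef struct mdlvert_s
{
    GLubyte xyz[4];  /* MDL positions as 4 bytes instead of 3 floats */
    GLfloat st[2];
} mdlvert_t;

static void SetupMDLArrays (const mdlvert_t *verts)
{
    /* no glEnableClientState, no glClientActiveTexture anywhere;
       byte positions get scaled back up in the vertex program */
    glEnableVertexAttribArray (0);
    glVertexAttribPointer (0, 4, GL_UNSIGNED_BYTE, GL_FALSE,
        sizeof (mdlvert_t), verts->xyz);

    glEnableVertexAttribArray (1);
    glVertexAttribPointer (1, 2, GL_FLOAT, GL_FALSE,
        sizeof (mdlvert_t), verts->st);
}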
At the moment the main culprit is Intel. Yes, I want RMQ to run on Intel graphics, mainly because they support the extensions I use so it should be able to run on them. But Intel seem to have a really weird implementation of attribute arrays that constantly shuts down certain arrays, meaning that they're sensitive to the order you draw things in and you need some funky driver workarounds.
The code works fine on both AMD and NVIDIA so I'm reasonably confident that it's solid and that this is a driver bug. But I just wish that I could spend this time doing something, y'know, productive instead.
This area of the API is another case of +1 D3D and -1 OpenGL. D3D decouples the vertex layout from the data specification, which in practice gives really clean code. OpenGL doesn't, and it sucks.
Posted by mhquake at 10:28 PM
Thursday, November 17, 2011
The way things are currently going, it's looking like I'm going to be committing to GPU lightmaps in DirectQ as well. There are a couple of awesome things coming out of this code that make both the slight performance hit and the texture memory overhead a more than worthwhile tradeoff.
The smoother gameplay I mentioned in my previous post is the main one, and this on its own is enough to swing the balance. Other things include improved consistency with dynamic lights, as well as a great big huge dollop of code cleanliness.
The GPU lightmapping code is substantially cleaner than the old texture upload code; I'm making no bones whatsoever about that. It's almost scary-clean by comparison. As we all know, cleaner code means easier maintenance and fewer bugs, which in turn means that it's possible to add new features without breaking things (or making an even bigger mess).
This is Important - capital "I".
For example, one long-standing feature request for RMQ has been higher resolution lightmaps. I had resisted this because I was terrified of the code mess involved, and even more terrified of the performance implications - we could have had frames requiring 100ms or more just to upload modified lightmaps. With GPU lightmaps it suddenly becomes a non-problem; there are no runtime texture uploads any more, so the performance worries just don't exist.
So - most definitely a worthwhile thing to do, and a good illustration of what can be achieved if one is willing to bump the hardware requirements sufficiently.
Posted by mhquake at 8:50 PM
I'm porting my GPU lightmap setup to RMQ at the moment, but this is without commitment to any final decision. The objective is to try it out and see what happens when it's hit with some really heavy data sets. Some interesting things are falling out of this work.
For RMQ we're going to need 5 simultaneous textures to light the world: one for diffuse, 3 for lightmaps (each holds one of the RGB channels, with its 4 components covering 4 lightstyles) and optionally 1 for fullbright. My experimental setup got away with 4 because I was able to stash the fullbright info in the alpha channel of the diffuse texture, but that's not going to work with RMQ, which must support external textures (where the fullbright may be at a different resolution) and textures with real alpha channels in them.
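Sketched out, the 5-texture combine might look something like this - the bindings are placeholders, and the styles arrive via a texcoord as in the experimental shader further down the page:
!!ARBfp1.0
# hedged sketch of the 5-texture layout, not final RMQ code
TEMP diff, red, green, blue, fb, lm;
TEX diff, fragment.texcoord[0], texture[0], 2D;  # diffuse
TEX red, fragment.texcoord[1], texture[1], 2D;   # lightmap red, 4 styles
TEX green, fragment.texcoord[1], texture[2], 2D; # lightmap green, 4 styles
TEX blue, fragment.texcoord[1], texture[3], 2D;  # lightmap blue, 4 styles
TEX fb, fragment.texcoord[0], texture[4], 2D;    # fullbright, now a real texture
DP4 lm.r, red, fragment.texcoord[2];   # apply the 4 style scales per channel
DP4 lm.g, green, fragment.texcoord[2];
DP4 lm.b, blue, fragment.texcoord[2];
MUL diff, diff, lm;
MUL diff, diff, 2.0;          # overbright
MAX result.color, diff, fb;   # fullbrights punch through the lighting
END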
An interesting thing about this is that some older NVIDIA cards only support 4 simultaneous textures on the fixed pipeline, but more than that if the programmable pipeline is used. This will be an opportunity to put that to the test.
This would also put RMQ quite firmly into the "needs D3D9 class hardware" camp (even though it uses OpenGL; the point is that only D3D9 class hardware or above absolutely guarantees more than 4 simultaneous textures). Since we were already requiring ARB assembly shaders and making use of VBOs (but I wrote a software emulation layer for hardware that doesn't support VBOs) this isn't a particularly huge step up, but it is something that may influence a final decision.
Performance is interesting. I only have un-vised maps to test on at the moment, so a large part of it is not really representative of what a real player experience would be, but it is useful to see how things hold up under extreme strain (and even a vised version of one of these maps will stress your hardware like no other Quake map).
Even in scenes with heavy animating light it is slower than using texture uploads. I'm putting much of that down to a lot more fragment shader work going on per frame owing to the un-vised nature of the maps, and while I expect it to pick up after a vis, I'm currently not expecting it to close the gap completely.
The real reason is that Quake doesn't actually upload new lightmaps every frame (by default lightstyles only animate 10 times per second). This should have been obvious, but I missed it.
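For reference, here's where that 10 comes from, paraphrased from GLQuake's R_AnimateLight - the style values only change when (int) (cl.time * 10) ticks over, i.e. 10 times per second:
void R_AnimateLight (void)
{
    int i, j, k;

    /* 10 Hz style animation */
    i = (int) (cl.time * 10);

    for (j = 0; j < MAX_LIGHTSTYLES; j++)
    {
        if (!cl_lightstyle[j].length)
        {
            d_lightstylevalue[j] = 256;
            continue;
        }

        k = cl_lightstyle[j].map[i % cl_lightstyle[j].length];
        d_lightstylevalue[j] = (k - 'a') * 22;
    }
}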
However, that one is double-edged, and the other side is that while performance is slower it also feels a lot smoother. If we don't upload new lightmaps every frame, then it follows that the frames in which we do upload them are going to stand out.
So what happened with the old way was that every 5 frames or so there would be a (relatively) massive hitch during which lightmaps get uploaded. Most frames would take, say, 4ms to run, but every fifth frame would need 15 or more ms. Quite ragged.
With GPU lightmaps that just goes away. Instead of 4ms it's now maybe 5ms, but it's consistent; there are no hitches.
Now let's look at lightstyle interpolation. If we interpolate animating lightstyles we get 15ms frames all the time under the old way, but GPU lightmaps are completely immune to that; still 5ms frames.
So what we're talking about here is a tradeoff, and it remains to be seen which side of it we'll come out on.
Posted by mhquake at 2:09 AM
Wednesday, November 16, 2011
Now that I've got my face stuck in Rage a little bit less, it's time to revisit the topic of GPU lightmap updating. You'll remember that I had made some inroads into this with a fork of DirectQ a while back. Lately I've been playing around with it some more in an experimental GLQuake build, and have pretty much cracked the problems I had encountered before.
Here's the important part:
!!ARBfp1.0
TEMP tex0, tex1, tex2, tex3;
TEX tex0, fragment.texcoord[0], texture[0], 2D; # diffuse
TEX tex1, fragment.texcoord[1], texture[1], 2D; # red
TEX tex2, fragment.texcoord[1], texture[2], 2D; # green
TEX tex3, fragment.texcoord[1], texture[3], 2D; # blue
DP4 tex1.r, tex1, fragment.texcoord[2]; # apply styles to red
DP4 tex1.g, tex2, fragment.texcoord[2]; # apply styles to green
DP4 tex1.b, tex3, fragment.texcoord[2]; # apply styles to blue
ADD tex1, tex1, fragment.texcoord[3]; # add dynamic light
TEMP fb;
MUL fb, tex0.w, tex0;
MUL tex0, tex0, tex1;
MUL tex0, tex0, 2.0;
MAX result.color, tex0, fb;
END
Yes, I like using ARB ASM shaders in OpenGL, the main reasons being that they work on a wider range of hardware and require considerably less boilerplate code than GLSL.
So, the result is that performance is maybe 80% or so of what it was when there are no lightstyle animations (or dynamic lights), but the drop-off is zero when the animations or dynamic lights do come in. That doesn't mean much for ID1 maps, but for scenarios where mappers can go nuts with lighting it means a lot. So it's a tradeoff, but it seems a good one.
What it also means is that there's a system memory saving (lightmap data no longer needs to be kept around for glTexSubImage2D updates), at the expense of some extra video memory, and nice features like lightstyle interpolation come completely for free. One of my test scenes renders in 20ms with lightstyle interpolation on; I can get it down to 5ms or less with this code.
Dynamic lights (such as muzzleflashes and explosions) get evaluated per-vertex on the CPU, the main reason being that it keeps the shader from requiring too many branches or instructions. It doesn't look as good as per-pixel (owing to interpolation artefacts) but it's fast, and I reason that these lights are so short-lived that it doesn't really matter much anyway.
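Something like this, per vertex - a sketch with illustrative names (mvert_t, the dlight color field), not DirectQ's actual code:
/* accumulate one dynamic light into per-vertex colors; the result is
   interpolated across the triangle and added in the fragment shader
   (the ADD instruction in the shader above) */
void R_AddDLightToVerts (const dlight_t *dl, mvert_t *verts, int numverts)
{
    int i;

    for (i = 0; i < numverts; i++)
    {
        vec3_t delta;
        float add;

        VectorSubtract (dl->origin, verts[i].xyz, delta);

        /* simple linear falloff; short-lived lights hide the artefacts */
        add = dl->radius - VectorLength (delta);

        if (add > 0)
            VectorMA (verts[i].dlight, add, dl->color, verts[i].dlight);
    }
}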
Per-pixel could probably be done using attenuation maps or some other technique, but there are code complexity and performance implications. On balance, it's fine.
So will it make it into production code? I'm unsure at present; there are some advantages aside from just performance (the HDR stuff I had done becomes totally unnecessary, and a lot of code becomes much cleaner) but it does involve ripping apart an established codebase. I may yet fork DirectQ again and implement it, then make a call.
Posted by mhquake at 1:09 AM