Friday, July 27, 2012

Going Public

I've written about this before and - while I ended up not doing anything specific at the time - it's something I feel strongly enough about that I've recently revisited it, consolidated my thoughts, and made a decision.

Certain components of the next release of DirectQ will be fully, 100% public domain.

The license I've chosen for this is the unlicense - mostly because I think the way it states its intent in legalese is kinda cute, but also because it's patterned after previous public domain licenses that have been proven effective.

In terms of software freedom this is my own preferred approach - unless you are also free to do the wrong thing, how can you even begin to call yourself "free"?  Sorry, but you're not.  This is a problem and contradiction at the heart of the GPL and it's something that I have no small measure of personal doubt over.

Yes, I'm aware of the inherent risk in this.  Yes, I accept it.  No, I don't subscribe to the full-on GPL ideology on account of the question I posed above, among other reasons, and I find no satisfactory answer in the GPL nor in any associated material.

Yes folks, public domain is GPL-compatible.  Let's not have anyone crying about "violation" because they think that the GPL is about what they would like it to be - actually read the thing first, please.

So what components?

As a general rule, where I am satisfied that a module (.cpp file) is 100% authored by me and me alone and with no reasonable traceable ancestry to the original code, I will be making it public domain.  That includes my MDL renderer, my matrix library, my particle system renderer (not the setup), my state manager, my texture manager (but not loaders for certain individual file types), my HLSL manager, my video startup code, and a few other things.

In cases where a module has definite traceable ancestry to the original code it will remain GPL.  An example would be my sky code - the old R_InitSky function is still there and is a sufficiently important part of the code that I cannot in all good conscience do anything other than retain the original GPL license.  Likewise my surface renderer still contains enough that's descended from the original code to require retention of the GPL.

In one or two cases I've been able to shift some lines of code or a function or two to another module in order to satisfy this requirement.  Yayy for "extern" as well...!

Of course I can't revoke any prior GPL releases of the same or similar code, so previous versions will remain as-is, but the move to D3D11 is an appropriate cutoff point beyond which things can be shown to be sufficiently changed, so this now becomes possible.

Monday, July 23, 2012

Resource Management

One of the more annoying things about Direct3D (but at least it's better in 10 and 11 than it was before) is the whole world of resource management.  OpenGL makes this really easy - the driver effectively does much of the heavy lifting for you - but with Direct3D you really do need to be more careful and more specific about how you manage the lifetime of GPU resources.

My current scheme is quite simple (and simplistic) - each module registers itself on program startup and supplies 3 callback functions - OnCreate, OnDestroy and OnResize (the third is needed for resources such as render targets that need to be resized if the current video mode changes).  These then get called at various stages throughout the program as it runs, and everything else happens automatically.
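A minimal sketch of that kind of registration scheme could look like the following.  Every name in it (ModuleCallbacks, ModuleRegistry, RunRegistryDemo) is hypothetical and chosen for illustration; it is not DirectQ's actual code, and the callbacks only write to a log instead of touching any D3D objects.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical names throughout; not DirectQ's actual code.
struct ModuleCallbacks {
    std::string name;
    std::function<void()> onCreate;          // create device-dependent resources
    std::function<void()> onDestroy;         // release everything onCreate made
    std::function<void(int, int)> onResize;  // rebuild size-dependent resources
};

class ModuleRegistry {
public:
    void Register(ModuleCallbacks cb) {
        assert(!cb.name.empty());
        modules_.push_back(std::move(cb));
    }

    void CreateAll() {
        for (auto &m : modules_)
            if (m.onCreate) m.onCreate();
    }

    void ResizeAll(int width, int height) {
        for (auto &m : modules_)
            if (m.onResize) m.onResize(width, height);
    }

    // Destroy in reverse registration order, mirroring creation.
    void DestroyAll() {
        for (auto it = modules_.rbegin(); it != modules_.rend(); ++it)
            if (it->onDestroy) it->onDestroy();
    }

private:
    std::vector<ModuleCallbacks> modules_;
};

// Registers two pretend modules and returns the order the callbacks fired in.
std::vector<std::string> RunRegistryDemo() {
    std::vector<std::string> log;
    ModuleRegistry registry;

    registry.Register({"sky",
        [&log] { log.push_back("sky:create"); },
        [&log] { log.push_back("sky:destroy"); },
        nullptr});  // nothing size-dependent in this module

    registry.Register({"rtt",
        [&log] { log.push_back("rtt:create"); },
        [&log] { log.push_back("rtt:destroy"); },
        [&log](int w, int h) {
            log.push_back("rtt:resize " + std::to_string(w) + "x" + std::to_string(h));
        }});

    registry.CreateAll();
    registry.ResizeAll(800, 600);
    registry.DestroyAll();
    return log;
}
```

The weakness described next is visible here too: nothing ties what onCreate makes to what onDestroy releases, so keeping the two in step is entirely down to the programmer.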

Where it's weak is that I still need to be careful about creating and destroying new resources.  It is so easy to add a new (say) pixel shader to the OnCreate callback, forget to destroy it in OnDestroy, and end up with resource leaks.  What makes it worse is that a single resource leak can cause a whole pile of other stuff to also leak, including your Device, your Immediate Context and your Swap Chain.  Fortunately there are the D3D debug runtimes to help with that, and the ability to give each resource a unique name helps with identifying which resource leaked.  But things could and should be much better.

In an ideal world resources should manage their own lifetimes.  This would involve registering themselves with a global resource manager on creation and automatically destroying themselves when the Device is also destroyed.  None of that should interfere with the ability to freely create and destroy resources yourself as needs dictate, of course.
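One way this could be sketched, under the assumption that each resource is wrapped in a small RAII object, is shown below.  TrackedResource and ResourceManager are hypothetical names, the "resource" is just a flag rather than a real COM object, and the manager is assumed to outlive every resource registered with it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

class ResourceManager;

// Hypothetical wrapper: registers itself on creation, unregisters on release.
class TrackedResource {
public:
    TrackedResource(ResourceManager &mgr, std::string name);
    ~TrackedResource();
    TrackedResource(const TrackedResource &) = delete;
    TrackedResource &operator=(const TrackedResource &) = delete;

    void Release();  // safe to call more than once
    bool IsAlive() const { return alive_; }
    const std::string &Name() const { return name_; }

private:
    ResourceManager &mgr_;  // assumed to outlive this object
    std::string name_;
    bool alive_ = true;
};

class ResourceManager {
public:
    void Add(TrackedResource *r) { live_.insert(r); }
    void Remove(TrackedResource *r) { live_.erase(r); }
    std::size_t LiveCount() const { return live_.size(); }

    // Called when the device goes away: release whatever is still alive.
    void ReleaseAll() {
        // Copy first, because Release() removes entries from live_.
        std::vector<TrackedResource *> snapshot(live_.begin(), live_.end());
        for (TrackedResource *r : snapshot) r->Release();
        assert(live_.empty());
    }

private:
    std::unordered_set<TrackedResource *> live_;
};

inline TrackedResource::TrackedResource(ResourceManager &mgr, std::string name)
    : mgr_(mgr), name_(std::move(name)) {
    mgr_.Add(this);
}

inline TrackedResource::~TrackedResource() { Release(); }

inline void TrackedResource::Release() {
    if (!alive_) return;
    alive_ = false;  // a real version would release the underlying GPU object here
    mgr_.Remove(this);
}
```

Individual resources can still be released early by hand or by going out of scope; ReleaseAll only mops up whatever is left.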

In the absence of a sane mechanism for handling this in D3D itself it's necessary to write your own.  I guess that this is a bit of code that everybody using D3D ends up writing for themselves at one time or another, and most implementations would have a hell of a lot in common, so that's a fairly strong signal that it needs to move from per-application and into the core D3D runtime itself.  This represents one of the few remaining "designed by a troop of monkeys on LSD" items left in D3D, and hopefully - especially now that the dreaded D3DERR_DEVICE_LOST is a thing of the past - it's something that will be addressed in a future version.

Small consolation for those of us stuck with it now, of course, but we can live in hope of a future time when things will be better.

Saturday, July 21, 2012

Update - 21st July 2012

I'm overdue an update so here's where things currently stand.

The D3D11 port is functionally complete, and has been for some time.  At present the focus has shifted to ironing out bugs, tweaking some things here and there, and sneaking in the occasional new feature.

I'm unfortunately still not 100% satisfied with my current implementations of either particles or lightmaps, but barring any sudden revelations I'm going to leave them stand as they are for now.

A fairly late-breaking performance problem with using external textures for HUD elements came to light a while back; I haven't yet tackled that.  I also need to fix up window handling a little better than it currently is.

Regarding the viability of using D3D11 (and remember that with feature levels D3D10 class hardware is also targeted), the latest news from the Steam hardware survey continues to encourage - we're now at over 90% market penetration for D3D10/11 class hardware.  See http://store.steampowered.com/hwsurvey/videocard/ for full details.

If it becomes necessary I can still backport to D3D10, but the only real reason for that would be if a sudden influx of hidden Vista users who never bothered installing updates were to come out of wherever they've been hiding.

I've already done a test backport about a month ago, just to confirm that it works, and have full confidence that there would be no issues in doing so.

The overall code quality is quite possibly the highest I've ever written (so far as Quake is concerned).  It's really really good and clean, and has become quite resistant to hard knocks.  I can't personally take much in the way of credit for that, as I had great assistance from some really good tools.  I've also really started taking advantage of the added expressiveness of C++ in many cases, and that's something I expect to continue with.

Monday, July 9, 2012

Ogg Support? I Don't Think So!

Today I started bringing on .ogg support in DirectQ, and I guess that you've probably guessed the end result by now. It's not going to happen for the foreseeable future.

Let me explain a bit here.

I've already excluded the traditional route of adding the 4 (or whatever) DLLs to the distribution - things are already enough of a mess with other source ports doing the same, and using conflicting DLL versions; I'm not going to make things worse. The same applies to anything else distributed in this manner.

Next I looked at stb_vorbis. On paper it looks just the ticket. A single .h file that you drop into your project, make some function calls, and everything happens automatically. Or rather, it doesn't, because you still have to interface it with a sound API, write your own streaming code, etc, and even then it fails to load the sample files from the Vorbis website.

On to FFMPEG. This is a close second-best, being a bunch of static .libs (the Windows distribution even comes with .ogg pre-compiled in) but the API is horrible and besides - the header files are set up for UNIX and don't work with Visual Studio.

The option of pulling something together from the Doom 3 source was considered. Sure it would bloat the executable and source code distributions, but something that works should be possible. You'd also need to include all of the idLib stuff and write some public interfaces of your own.

At this point I stopped.

To quote from something someone else once said a long time ago in a galaxy far far away:

"I have better things to do with my time."

Is it that bad, really? As far as I'm concerned right now, yes. This is a strange Twilight Zone where priorities are upside-down and where flavour-of-the-month rules over practical considerations. Is it hosted using CVS, SVN, Git, Mercurial? Is it cross-compiled for 40 different platforms and oh-so-endian-aware? Does it have a command-line interface with pipes? Guess what?

I DON'T CARE.

All I want here is a nice single static library with a header file that I can drop into my project. No dependencies, no versioning conflicts, 6 functions needed: PlayFromFile, PlayFromMemory, Pause, Resume, Stop, SetVolume. It doesn't have to be technically perfect, it doesn't have to be any of the above things, it just has to not suck.

And it seems as though such a thing doesn't exist. Rant over.

Update

I've been doing some work on code quality recently.  Previously I'd been clean with warning level 3, but with 10 or so warnings disabled; I've now re-enabled 8 of those and cleaned up where required.  The remaining two warnings (conversion/possible loss of data and a signed/unsigned mismatch) are marked for cleaning up later on, but right now generate a very poor signal to noise ratio.
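For illustration, here is roughly what cleaning up those two remaining warnings tends to involve.  This assumes MSVC, where "conversion, possible loss of data" is C4244 and "signed/unsigned mismatch" is C4018, and where they would be silenced project-wide with something like #pragma warning(disable : 4244 4018); the post doesn't give the warning numbers, so treat them as a best guess.  The Average function is a made-up example, not DirectQ code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Before cleanup (kept as a comment so this file stays warning-free):
//
//   float sum = 0;
//   for (int i = 0; i < v.size(); i++)   // C4018: signed/unsigned mismatch
//       sum += v[i];                     // C4244: double to float
//   return sum / v.size();
//
// After cleanup: the index type matches size(), and the one narrowing
// conversion that is really wanted is spelled out with a cast.
// Precondition: v is not empty.
float Average(const std::vector<double> &v) {
    assert(!v.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i)
        sum += v[i];
    return static_cast<float>(sum / static_cast<double>(v.size()));
}
```

Multiply that by every loop and every float/double mix in a Quake codebase and the poor signal-to-noise ratio is easy to imagine.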

Before re-enabling those 8 warnings I got as clean as I could go with level 4, which amounted to compiling with only one warning (a nasty setjmp that there's no really good way of avoiding with the current architecture) but needing to disable a couple more in some specific cases.  Level 4 is really picky, with some of the warnings it throws being of dubious value.  Unused formal parameter is one that I had a lot of (and did clean out some of), but that's something you're going to be doing anyway while developing.  Another construct that level 4 throws on is "i = i", which I use frequently enough to enable me to set breakpoints at the end of functions quickly and easily and without side-effects.

I've been compiling with "warnings as errors" for at least the past 6 months, and highly recommend it to anyone, but I'm unsure of the merit of warning level 4.  It's more convenient to develop at level 3 and do a level 4 pass semi-regularly as a form of sanity check; with level 4 on all the time it's like being hauled in front of the class by the headmaster on account of getting your Latin past tense variants mixed up.

On to code analysis.  I'm aware that Carmack has been advocating it in recent times, and I gave it a try myself sometime last year, but the tool I used at the time (cppcheck) wasn't really giving me useful information and was slow as molasses.  Having access to premium editions of Visual Studio in work allowed me to try a few others, with one being a trial version of PVS Studio and the other being Microsoft's built-in Analyzer.

PVS Studio was nice enough to use, and (thankfully) the trial version isn't too hobbled - at full functionality but limited to 100 clicks it's good enough for a few passes over a project the size of DirectQ.  It did find some important stuff, and I got reasonably clean with it, but when I went to export the full log file my eyes opened.

There were still some actionable items in it for sure, but for the most part it was riddled with low-level warnings about how this 32-bit program, built for 32-bit platforms, using a 32-bit compiler, linking to 32-bit libs, had some constructs that weren't 64-bit safe.

No shit Sherlock.

Couple that with nitpicking over C vs C++ style casts and advice to use the -Ex version of some Windows API calls, and I decided to call it a day with that.  At least it didn't nag me about endianness issues on a purely x86/x64 platform (note to PVS Studio developers - don't start getting ideas...)

Microsoft's tool by comparison was an awesome experience.  It ran quite a deal slower than PVS Studio (but still at light-speed compared to cppcheck) and the stuff it was digging out was real, actionable, problematic code.  Stuff like locally defining the variable name "cl" for an auto-completion list while it's also globally defined as the client state, potential buffer overflows in cases where the original buffer definition may have been many levels up the call stack and/or even in a completely different module, potential NULL pointer dereferencing, etc.
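The "cl" case is worth a small illustration, because it's perfectly legal C++ and compiles without complaint at normal warning levels.  The sketch below is a made-up reconstruction of that kind of shadowing; ClientState, its field and both functions are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ClientState { int maxclients = 1; };  // hypothetical field
ClientState cl;                              // engine-wide global client state

// Shadowed version: inside this function "cl" means the completion list.
std::size_t CountMatchesShadowed(const std::vector<std::string> &names,
                                 const std::string &prefix) {
    std::vector<std::string> cl;  // hides the global for the rest of this scope
    for (const auto &n : names)
        if (n.compare(0, prefix.size(), prefix) == 0) cl.push_back(n);
    // From here on the global is only reachable as ::cl, which is easy to
    // forget when pasting in code that expects "cl" to be the client state.
    assert(::cl.maxclients >= 1);
    return cl.size();
}

// Fixed version: a distinct name, so "cl" always means the same thing.
std::size_t CountMatches(const std::vector<std::string> &names,
                         const std::string &prefix) {
    std::vector<std::string> completions;
    for (const auto &n : names)
        if (n.compare(0, prefix.size(), prefix) == 0) completions.push_back(n);
    assert(cl.maxclients >= 1);
    return completions.size();
}
```

Both versions behave identically today; the shadowed one is simply a trap waiting for the next edit.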

I didn't even try to get completely clean with it.  For one there are some false positives (but for the most part they do prompt you to go over the code and re-check your assumptions which is quite good) and for another thing some of the external library code I'm using turned out to be incredibly dirty indeed.

Two other observations spring out of this.  One is that compilers can and will swallow (and even make sense of) some of the most awful junk, and two is that my own code quality has visibly improved in recent times - some of the gnarlier older code I had written threw up a lot of problems, but some of my more recent code (file system, surface refresh, MDLs, particles) was squeaky clean.

I'm adding a pass through that to my "do semi-regularly" list.

None of this translates into "totally bug free", of course, and it would be foolish to expect that.  What it does translate into is an extra layer of protection against stupid, careless and embarrassing mistakes.  I've long since lost my programmer ego and I'm well aware that just because I've written a few lines of code that happened to work it doesn't suddenly make me all-capable and all-knowledgeable.  I make mistakes and it's great to have tools that help me at least stand a chance of preventing some of them from going public.

Thursday, July 5, 2012

A Scurrilous Bug Squashed

I've been setting up my new render-to-texture framework, and have it working with the underwater warp the same as the old code did, but now with the ability to chain multiple effects.  I haven't quite tested that last part yet, but will do so soon.
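A common way to chain effects like this is to ping-pong between two render targets, with each pass reading the previous pass's output.  The sketch below shows that idea on the CPU with plain float arrays standing in for textures; it's an assumption about the general shape of such a framework, not a description of DirectQ's, and Image, Effect, EffectChain, Tint and Brighten are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Image = std::vector<float>;  // stand-in for a render target
// An effect reads src and writes every texel of dst (same size).
using Effect = std::function<void(const Image &src, Image &dst)>;

class EffectChain {
public:
    void Add(Effect e) { effects_.push_back(std::move(e)); }

    // Runs all effects in order and returns the final image.
    Image Run(const Image &scene) const {
        Image a = scene;
        Image b(scene.size());
        Image *src = &a;
        Image *dst = &b;
        for (const Effect &e : effects_) {
            assert(src->size() == dst->size());
            e(*src, *dst);
            std::swap(src, dst);  // this pass's output feeds the next pass
        }
        return *src;  // after the last swap, src holds the newest output
    }

private:
    std::vector<Effect> effects_;
};

// Two toy "shaders" for demonstration.
inline void Tint(const Image &src, Image &dst) {
    for (std::size_t i = 0; i < src.size(); ++i) dst[i] = src[i] * 0.5f;
}

inline void Brighten(const Image &src, Image &dst) {
    for (std::size_t i = 0; i < src.size(); ++i) dst[i] = src[i] + 1.0f;
}
```

With no effects added the scene passes through untouched, which is the behaviour you'd want when everything is disabled by default.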

It needed some heavier refactoring throughout more of the codebase than I had initially planned, but I'm (mostly) happy with the result.  As a first cut it's pretty good and any further changes will be for code-tidiness rather than functionality.

This of course opens the possibility of adding more effects - which will usually be of the optional-and-disabled-by-default variety - SSAO was one that was requested (not certain how easy that is in Quake without heavy rewriting elsewhere, but at least this obstacle is gone), heat-haze is definitely down for a good attempt now, and others are possible.

Things can slow down considerably as more such effects pile up, of course, but that's a normal side-effect and overall I do think that this kind of subtle-but-interesting effect is of more value with Quake than trying to beat on the data and model formats in order to get them to do things they were never designed to do.  It's also a good use of the rendering muscle behind DirectQ.

One nice side effect of all this is that an old problem whereby the wrong render target would be cleared if gl_clear was set to 1 has been fixed up.

Another nasty bug was that when I first set it up it was running a lot slower than it should.  That was odd as the extra framework code should be quite lightweight - the heavy work happens in the shaders (which are the very same as before).  A quick bit of investigation soon revealed that the overlay screen tint was also being drawn - normally I add this to the waterwarp shader which lets me get it for free (almost) and relieves a LOT of the fillrate burden.

That was just a simple order-of-operation bug that didn't affect the old code but has become significant now.  Put the offending function call in a better place and it goes away.  A good example of the kind of traps that Quake's wackiness lays for you.

Wednesday, July 4, 2012

Heat Haze!

Heat haze is something I'd wanted to add for a long long time; proper heat haze, not some cheesy substitute effect.

Today I got something that looks good and runs fast.  Right now I've just plugged it into the underwater code so that I can test it/experiment with it, but it definitely is going to happen.

I don't propose overdoing it and putting it on every single little thing that could potentially have it - a reasonably subtle shimmer over lava pools is enough.

One thing that needs doing is a proper render-to-texture interface for managing multiple layered effects.  It will have to co-exist with the underwater warp for starters, and the interface I have around that is nasty and dates back to my first experiments with render-to-texture without much dramatic change since.

I'll probably write some about that later on.

What does it look like?  Well, you really need to see it moving to get the full effect, but zoom in on this shot for an idea of it (the final version will add some blurring):


Tuesday, July 3, 2012

A Dog's Eye View

Normally when something doesn't work, a good technique is to go off and do something completely different so that the subconscious has time to filter through details of the original thing that didn't work.

This is what Quake looks like if you're a dog:



In addition to "dog vision" I did some research on 8 different types of colour blindness (9 if you include normal vision) and simulated them by colour matrix transforms using info from here: http://kaioa.com/node/75.  Then I cvar-ized the lot to let me switch between them at run time.
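The core of a colour matrix transform is just a 3x3 multiply per pixel.  The sketch below shows that on the CPU; in the engine it would live in a pixel shader.  The greyscale matrix uses the standard Rec. 601 luma weights as a stand-in for total colour blindness and is not taken from the linked page; the other names (Rgb, ColourMatrix, MatrixForMode) and the mode numbering are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Rgb { float r, g, b; };
using ColourMatrix = std::array<float, 9>;  // row-major 3x3

// Each output channel is a weighted sum of the three input channels.
inline Rgb Transform(const ColourMatrix &m, const Rgb &c) {
    return Rgb{
        m[0] * c.r + m[1] * c.g + m[2] * c.b,
        m[3] * c.r + m[4] * c.g + m[5] * c.b,
        m[6] * c.r + m[7] * c.g + m[8] * c.b,
    };
}

// Normal vision: leaves colours unchanged.
constexpr ColourMatrix kIdentity = {
    1.0f, 0.0f, 0.0f,
    0.0f, 1.0f, 0.0f,
    0.0f, 0.0f, 1.0f,
};

// Illustrative only: Rec. 601 luma weights repeated in every row.
constexpr ColourMatrix kGreyscale = {
    0.299f, 0.587f, 0.114f,
    0.299f, 0.587f, 0.114f,
    0.299f, 0.587f, 0.114f,
};

// Hypothetical cvar-style selection: 0 = normal, 1 = greyscale.
inline const ColourMatrix &MatrixForMode(int mode) {
    assert(mode == 0 || mode == 1);
    return mode == 1 ? kGreyscale : kIdentity;
}
```

Adding another vision type is then just another nine numbers and another mode value.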

It's not exactly set up to be optimal in terms of performance - copying the back buffer to another texture then drawing it over the scene, through the filter, as a fullscreen quad is always going to be slow (true render to texture would be faster) - but then again it's not meant to be.  It is however a useful tool for accessibility testing in-game, and lets those of us with full colour vision see the impact that our work can sometimes have on those of us who don't.

Monday, July 2, 2012

More Lightmap Techniques That Didn't Work

I'm writing this up primarily for myself, so that I don't come back in a few days' time thinking "maybe I could..." - but it may be of some interest to others.

First up is compute shader lightmaps.  There are problems with the transition between regular shading and DirectCompute here, and it just loads too much onto the GPU, so on balance it loses more than it gains.

The "do not wait" map technique breaks with BSP models as these need to share the same set of lightmaps, and multiple such models could be on-screen in any one given frame - deferring the update to the next frame if we don't get a mapping in this frame is not an option for these as a shared map must always be updated.

Only updating lightmaps once every - say - 1.0 / 72.0 of a second doesn't work because it also breaks with BSP models.  Again, if there are multiple models on-screen which share the same lightmap, that lightmap must be updated for each model.

The old GPU lightmap technique (a 1-4 texture blend plus adding dynamics per pixel) is incredibly slow when dynamics come in and eats video RAM.  A variation with dynamics per-vertex breaks static vertex buffers and has unacceptable quality.

Attenuation maps need multiple blend layers (with consequent overdraw) and a 64-bit-minimum render target to avoid clamping.  Too slow, and render target switches will kill it.

A reasonably common thread here is BSP models.  Yes, I want to dynamically light these properly, and yes, I think it's sufficiently important that I'll throw out otherwise perfectly good options.  The motivation for this comes from the heavy use of this model type in some maps/mods - without dynamic lights they are a huge visual anomaly (I guess the mappers always ran with gl_flashblend 1).

Despite all this I'm feeling reasonably close to something that I think will work.  It will probably look like a pool of textures: when we want to do an update we pull from it and try to map with do-not-wait; if we don't get a map we pull another, and so on.  The complication here is how to handle data that's in the update rect but didn't get touched during the update.
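The pool idea can be sketched as a simple round-robin search for a texture the GPU isn't currently using.  This is a mock with no D3D in it: a "busy" flag stands in for a do-not-wait map failing, TexturePool is a hypothetical name, and the untouched-texels complication isn't addressed at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mock pool: busy_[i] == true models a do-not-wait map on texture i failing.
class TexturePool {
public:
    explicit TexturePool(std::size_t count) : busy_(count, false) {
        assert(count > 0);
    }

    void SetBusy(std::size_t i, bool busy) { busy_[i] = busy; }  // test hook

    // Tries each texture once, starting after the last one handed out.
    // Returns its index, or -1 if every texture is still in use by the GPU.
    int Acquire() {
        for (std::size_t n = 0; n < busy_.size(); ++n) {
            std::size_t i = (next_ + n) % busy_.size();
            if (!busy_[i]) {
                next_ = (i + 1) % busy_.size();
                return static_cast<int>(i);
            }
        }
        return -1;  // caller decides: skip this frame, or force a blocking map
    }

private:
    std::vector<bool> busy_;
    std::size_t next_ = 0;
};
```

What to do on -1 is the real design decision, and it is where the BSP model problem comes back in.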

If I don't get a solution to that I'll probably try something like using either the do-not-wait or the 1.0 / 72.0 seconds techniques on regular models but forcing a map on BSP models.  That feels slightly ugly, but it will work.

If Plan B Fails...

So plan B for CS-based lightmap updates failed.  Plan A was to just update the lightmap texture directly in-place, which ultimately (once I had fixed up the skipped texels/etc) ran considerably slower than CPU-based lightmaps.  I had reasoned that the in-place update was a contributing factor to this, so for Plan B I decided to update a staging texture (not D3D11_USAGE_STAGING, but just a second copy of the lightmap array) then copy the changed portions across to the live texture; the reasoning being that having to unbind the live texture array from the PS pipeline, bind it to the CS pipeline, have two separate views on it (shader resource and unordered access), and then reverse the binding when done was not working well.

So with Plan B I got as far as doing the copy from staging to live (which - because they're both D3D11_USAGE_DEFAULT - happens on the GPU) and it benchmarked as no faster than my old CPU-based method.  Obviously adding the actual update code would pull things down even more, so I decided not to proceed any further.

The final solution ended up being a modification of my old CPU-based method.  With this I exploit the D3D11_MAP_FLAG_DO_NOT_WAIT option, which tells D3D that if the GPU is currently busy with a resource to not bother giving me a mapping on it, but just return immediately.  If that happens I don't bother updating the lightmap, but leave the surface properties dirty so that it will try again next frame.

Aside from some minor glitches with BSP models (which there were always going to be, and which I'll probably resolve by forcing the mapping if it fails two consecutive times) it works, and quite well too.  Resource contention and pipeline stalls are almost completely eliminated from the engine, and everything runs smooth and fast.
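The control flow of that approach, including the "force it after two consecutive failures" fallback, can be sketched without any D3D at all.  In real D3D11 the non-blocking attempt is ID3D11DeviceContext::Map with D3D11_MAP_FLAG_DO_NOT_WAIT, which returns DXGI_ERROR_WAS_STILL_DRAWING when the GPU is busy with the resource; below, a mock texture with a "busy" flag plays that role.  MockTexture, Surface, UpdateLightmap and kMaxFailedMaps are hypothetical names, and the fallback threshold is taken from the "probably" above rather than from shipped code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mock GPU texture; "busy" models the GPU still reading from it.
struct MockTexture {
    bool busy = false;
    std::vector<std::uint8_t> texels = std::vector<std::uint8_t>(16, 0);

    // With doNotWait set, returns nullptr instead of blocking on a busy texture.
    std::uint8_t *Map(bool doNotWait) {
        if (busy && doNotWait) return nullptr;
        busy = false;  // a blocking map waits until the GPU has finished
        return texels.data();
    }
};

struct Surface {
    bool dirty = true;   // lightmap needs rebuilding
    int failedMaps = 0;  // consecutive do-not-wait failures
};

constexpr int kMaxFailedMaps = 2;

// Returns true if the lightmap was updated this frame.
bool UpdateLightmap(MockTexture &tex, Surface &surf, std::uint8_t value) {
    if (!surf.dirty) return false;

    // After two consecutive failures, give up on do-not-wait and block.
    const bool force = surf.failedMaps >= kMaxFailedMaps;
    std::uint8_t *data = tex.Map(!force);
    assert(data || !force);  // a forced map always succeeds

    if (!data) {
        ++surf.failedMaps;  // stay dirty so the update is retried next frame
        return false;
    }

    for (std::size_t i = 0; i < tex.texels.size(); ++i) data[i] = value;
    surf.dirty = false;
    surf.failedMaps = 0;
    return true;
}
```

The trade-off is one or two frames of stale lighting on a surface in exchange for never stalling the pipeline in the common case.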

With hindsight the CS-based updates were most likely a CPU/GPU balancing issue.  Unlike most other Quake engines DirectQ puts a lot more work on the GPU, which is part of the reason why it runs so fast (a typical Quake engine only uses ~50% of your GPU's power, loading everything else on the CPU and meaning that a GPU upgrade tends not to give you as much of a performance increase as it should) but also exposes the GPU as somewhat more of a candidate for bottlenecks.  In short - because the GPU is already running at full capabilities, loading more work on it has a risk of making things slower.

Definitely an interesting learning experience overall.