Monday, February 4, 2013

Update - 4th Feb 2013

No work done at all last week; it was semi-expected but I had hoped despite everything to sneak in a coupla hours here and there.

I've received a few emails and PMs from various sources, but am really not in a position to even look at them at the moment.  I'm not ignoring anything; I just can't do stuff right now.

Meanwhile - and in other news - the new My Bloody Valentine LP was released the other day.  If you're at all interested in good music you owe it to yourself to try to crash their server (again!) by downloading it - http://www.mybloodyvalentine.org/ - jump to it!

Tuesday, January 29, 2013

Update - 29th January 2013

It's looking as though tomorrow is going to be the earliest I can get back to stuff; it may not even happen then but it definitely won't happen before.

With hindsight, adding the bloom effect at the time I did was a mistake.  It took up far too much time that would have been better spent elsewhere, and a release would have happened by now if I had not done so.  I'm not going to remove it - it's there now and removing it won't give that time back - but I know that in its current state there is room for more work, and the shape it will be in on release is not going to please everyone.

This lesson is something that's going to need to apply to any major undertaking in the future.  I won't declare a moratorium on such, but I will need to be aware and bring forward more the fact that anything large will involve huge time problems and sudden, unexpected and protracted delays that just weren't there in the past.  In other words, it will need to be something really worth doing (or that I'm really interested in doing) to lift it onto the priority list.

Monday, January 28, 2013

Bug Alert!

I got a freeze the other day on AMD hardware.  This happened while firing the railgun so I suspect that it's something in my particle code, but I need to dive back in and make sure.  It may be a case of something I've observed before, where AMD will quite happily let you overflow the size of a Locked vertex buffer, but I've as yet no evidence either way.

Real Life getting busy again but I hope to steal some time and get this out soon.  It's so near ready.

Wednesday, January 23, 2013

Nothing getting done today

Just spent 8 hours programming, testing, deploying, debugging, etc in work.  No way am I even contemplating opening a code editor this evening.  I shall scream if I do.

Tuesday, January 22, 2013

Got the speed up

It's more like about 15% faster but it is still a good gain; it came from being able to combine a downsample pass with one of the blur passes, but unfortunately there is a small quality loss from it which may or may not be more noticeable depending on your gfx hardware.

What I'm probably going to do is offer it as an option, not sure if it will be the default path, but it seems a case where if the quality loss is acceptable on some hardware then it would be a shame to not have the extra performance that comes with it.

Got the quality up


In the end it turned out to not be some fancy algorithm or late-night ninja-coding trick, but just using a more suitable texture format.  Successive blur passes were causing a very noticeable loss in precision and everything was coming out looking like 16-bit colour.  The hack I had to work around that fixed the precision loss but caused unbloomed colours to bleed into bloomed colours.

Selecting a different texture format did it.  I experimented with a few and settled on a 10/10/10/2 format; this turned out to be a great choice as the extra 2 bits per colour channel were enough to wipe out all quality loss (I wasn't using the alpha channel this time so only having 2 bits there was no big deal - a perfect fit).  It will still fall back on 64-bit formats, then finally standard 8/8/8/8 32-bit if it can't create one of these, so at least something should work for everyone (even though the fallback of last resort will look bad).
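The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the engine's code: the `BloomFormat` tags, the choice of two 64-bit candidates, and the `canCreate` callback (standing in for whatever capability check or creation attempt the renderer really makes) are all assumptions.

```cpp
#include <cassert>
#include <functional>

// Hypothetical format tags for illustration; they are not real API enums.
enum class BloomFormat { RGB10A2, RGBA16F, RGBA16, RGBA8 };

// Walks the preference list and stores the first format the device accepts.
// Returns false if nothing at all could be created.
bool pickBloomFormat(const std::function<bool(BloomFormat)>& canCreate,
                     BloomFormat& chosen) {
    assert(static_cast<bool>(canCreate));  // a callback must be supplied

    static const BloomFormat preference[] = {
        BloomFormat::RGB10A2,  // 10/10/10/2: extra colour precision, still 32 bits
        BloomFormat::RGBA16F,  // assumed 64-bit fallback
        BloomFormat::RGBA16,   // assumed 64-bit fallback
        BloomFormat::RGBA8,    // last resort: standard 8/8/8/8
    };

    for (BloomFormat f : preference) {
        if (canCreate(f)) {
            chosen = f;
            return true;
        }
    }
    return false;
}
```

Keeping the order in one table means that changing the preference is a one-line edit.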

Sometimes in the desire to try a clever solution one can miss the most basic.  Sigh.

Next up is performance; I reckon that there's maybe another 25% to be had from it, so I'll plug at that and see how we go.

Monday, January 21, 2013

Update - 21st January 2013

The delay has been well worth it; in addition to some code clean-up elsewhere I took the opportunity to revisit the bloom code and rework some parts of it.

It now runs a good bit faster than it had before; still some room for improvement, but I hope to get that over the next few days.

One thing in it that had bugged me slightly was that the amount of bloom flickered and changed rapidly as the average scene luminance changed.  I fixed that by sampling the luminance at regular intervals - 0.1 seconds is good for a fast-moving FPS - and interpolating between the last two samples.

That had some other benefits.  Firstly, performance is up because the average luminance no longer needs to be sampled every frame.  Secondly, some transient effects (such as the railgun particle trail) will no longer diminish the overall bloom so much - that's a good thing: if a sudden change in brightness happens, your eyes don't have time to adapt, so everything is brighter (or darker) for a short while until they do.
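A minimal sketch of the sample-and-interpolate idea, assuming time is measured in seconds and the expensive luminance measurement happens elsewhere. The `LuminanceSmoother` class and its method names are hypothetical, not taken from the engine.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: remembers the last two luminance samples and blends
// from the older one to the newer one over a single sampling interval.
class LuminanceSmoother {
public:
    explicit LuminanceSmoother(double interval, float initial = 0.0f)
        : interval_(interval),
          lastSampleTime_(-interval),  // makes the first sample due at time 0
          prev_(initial),
          curr_(initial) {
        assert(interval > 0.0);
    }

    // True when it is time to take another (expensive) luminance sample.
    bool sampleDue(double now) const { return now - lastSampleTime_ >= interval_; }

    // Records a freshly measured average scene luminance.
    void pushSample(double now, float luminance) {
        prev_ = curr_;
        curr_ = luminance;
        lastSampleTime_ = now;
    }

    // Cheap per-frame value: linear blend between the last two samples.
    float value(double now) const {
        double t = (now - lastSampleTime_) / interval_;
        t = std::min(std::max(t, 0.0), 1.0);
        return prev_ + (curr_ - prev_) * static_cast<float>(t);
    }

private:
    double interval_;
    double lastSampleTime_;
    float prev_;
    float curr_;
};
```

Each frame would call `sampleDue`, take a new measurement only when it returns true, and feed `value` to the bloom pass. The trade-off in this sketch is that the value the bloom sees lags the true luminance by up to one interval.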

The sampling interval is, of course, cvar-ized.  Remember - all bloom cvars will begin with "r_bloom".

Currently I'm bashing at a balance between maintaining floating point precision in the bloom effect versus having un-bloomed colours bleed slightly into bloomed ones.  That's a tough one that I have a partial solution for, but I'd like to get it better.

Sunday, January 20, 2013

Release Delayed

There are some things left over I need to work through a bit more; I could release now but I'd prefer to just get these tightened up a little first.  A better release a little later seems preferable.

I've deleted the last post by the way; the point made in it still stands but there's no need to publicly drag it out.

Thursday, January 17, 2013

Update - 17th January 2013

I'll probably release over the coming weekend; just really at the stage where I'm playing around with parameters and defaults, and I think there's probably another short while of that to be done, plus I'm being social on Friday night.

I got a nice speedup by moving the average scene luminance from the pixel shader stage to the vertex shader stage and doing some vertex texture work; this was possible as it's a 1x1 texture anyway, and it means that it only needs to be fetched 4 times instead of tens of thousands.

That brings me onto what I'd threatened earlier - some D3D9 versus 11 differences.  In general both APIs are equally capable, but some features are not so nice to use in 9.

Vertex texture work was the obvious first candidate; in 9 (as with other SM3 features) it's quite obviously been crudely hacked onto the existing API; D3D11 (and, indeed, 10) generalizes this a whole lot better and is just nicer to use.

Buffer updates were nicer in 11; in 9 you need to know the size of the buffer to Lock, whereas in 11 you don't, meaning that for dynamic data (like 2D GUI stuff or particles) you can just write until you run out of space.  Code becomes a lot cleaner and clearer, with fewer intermediate steps.
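To illustrate the kind of bookkeeping that sized, range-based locking tends to involve, here is a sketch of a cursor over a dynamic buffer. It only computes offsets and picks a lock mode; it makes no API calls. The type names are hypothetical, and `LockMode` is merely a stand-in for the API's discard and no-overwrite lock flags.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-ins for the "append without overwriting" and
// "throw the old contents away" lock behaviours.
enum class LockMode { NoOverwrite, Discard };

struct LockRange {
    std::size_t offset;  // byte offset to lock at
    std::size_t size;    // number of bytes to lock
    LockMode mode;
};

// Hypothetical cursor that hands out ranges until the buffer is full,
// then wraps to the start and asks for a discard.
class DynamicBufferCursor {
public:
    explicit DynamicBufferCursor(std::size_t capacity)
        : capacity_(capacity), cursor_(0) {}

    LockRange reserve(std::size_t bytes) {
        // Refuse requests that could never fit, rather than overflowing.
        assert(bytes <= capacity_);

        LockMode mode = LockMode::NoOverwrite;
        if (cursor_ + bytes > capacity_) {
            cursor_ = 0;               // wrap to the start
            mode = LockMode::Discard;  // old contents may still be in use
        }
        LockRange range{cursor_, bytes, mode};
        cursor_ += bytes;
        return range;
    }

private:
    std::size_t capacity_;
    std::size_t cursor_;
};
```

The caller has to know `bytes` before locking, which is the extra intermediate step in question.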

And speaking of intermediate steps, lack of texture arrays in 9 means that lightmapped surfaces need a second sorting pass to get batching and state-change reduction working well.  I'd consider real-time lighting for this engine, but the requirement to also do radiosity complicates matters quite a bit (I'm aware of real-time radiosity solutions but they're currently quite high-end).  DrawIndexedPrimitive (in 9) is also nasty to use in this regard when compared to DrawIndexed (in 11); there are some additional params that you need to calculate in advance.
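As an illustration, and assuming the additional params meant here are the `MinVertexIndex` and `NumVertices` arguments that DrawIndexedPrimitive takes and DrawIndexed does not, they could be derived from a batch's indices like this. The struct and function names are hypothetical, and the sketch assumes 16-bit indices in a triangle list.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Values a D3D9-style indexed draw wants up front (hypothetical struct).
struct IndexedDrawRange {
    unsigned minVertexIndex;  // lowest vertex index the batch references
    unsigned numVertices;     // span of vertices covered: max - min + 1
    unsigned startIndex;      // first index of the batch in the index buffer
    unsigned primitiveCount;  // triangles, assuming a triangle list
};

// Scans `count` indices starting at `first` and derives the draw parameters.
IndexedDrawRange computeDrawRange(const std::vector<std::uint16_t>& indices,
                                  std::size_t first, std::size_t count) {
    assert(count > 0 && count % 3 == 0);      // whole triangles only
    assert(first + count <= indices.size());  // range must be in bounds

    auto begin = indices.begin() + static_cast<std::ptrdiff_t>(first);
    auto end = begin + static_cast<std::ptrdiff_t>(count);
    auto range = std::minmax_element(begin, end);

    IndexedDrawRange out;
    out.minVertexIndex = *range.first;
    out.numVertices = static_cast<unsigned>(*range.second - *range.first) + 1u;
    out.startIndex = static_cast<unsigned>(first);
    out.primitiveCount = static_cast<unsigned>(count / 3);
    return out;
}
```

In practice these values would more likely be tracked while the batch is being built than found by rescanning the indices afterwards.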

No constant buffers in 9 - I missed those.

One area where 9 does show up positively compared to 11 is texture updates; this is back to lightmaps again.  I've never found a satisfactory approach to these under 11 whereas the approach with 9 works just fine.

The instancing API in 9 is another example of an SM3 feature that looks like it was crudely hacked on.

Under 11 I would have probably done the bloom stuff using a compute shader; multiple pixel shader passes like what I currently have in the 9 code work fine but things really pile up.

Lack of D3DERR_DEVICELOST in 11 is a good thing, but I have a nice framework for handling that under 9 and it never caused me much in the way of problems.

I also missed being able to draw without any vertex data (using the SV_* builtins) - there are a few cases where that could have come in handy.

That's pretty much the full list of things that really struck me during the process of doing this update.  The next post should be the release.

More Bloom Samples

Think it's looking pretty good now. I did blueshift and tonemapping but left them disabled by default; they look really weird for Quake II. One final tweak on the brightpass was to add a lerp factor (cvar-ized, of course) between the average scene luminance and the current texel to get a more even-looking blend.
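One possible reading of that lerp tweak, sketched in C++ rather than shader code: the brightpass works from a luminance that sits somewhere between the scene average and the current texel, with a cvar-style factor choosing where. The function name and the clamping of the factor are assumptions, not the engine's actual shader.

```cpp
#include <algorithm>
#include <cassert>

// Blends between the average scene luminance and the current texel's
// luminance. blend = 0 uses only the scene average; blend = 1 uses only
// the texel. Values outside [0, 1] are clamped.
float blendedLuminance(float avgSceneLum, float texelLum, float blend) {
    assert(avgSceneLum >= 0.0f && texelLum >= 0.0f);
    float t = std::min(std::max(blend, 0.0f), 1.0f);
    return avgSceneLum + (texelLum - avgSceneLum) * t;
}
```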