Monday, May 19, 2008

More Progress

Just been spending some time working on the input code. It's more of a general tidy-up than anything dramatic, although I have made a few changes.

  • Removed joystick support. This might get some people's backs up, but it's a practical necessity: I just plain-old-fashioned do not have a joystick to test other changes on, so better to have no support than broken/unpredictable support.
  • Added cvar-controlled mouse looking (m_look, on by default). This is in addition to, rather than instead of, the old Quake 1 look controls, which still work exactly as before when mouse looking is switched off. You can even still have +mlook in your autoexec. While I was doing this, I removed the "slide the mouse forwards to move forwards a little" behaviour (thinks: I might restore this but make it controllable too).
  • Switched Direct Input and m_filter on by default (m_filter is now an archive variable). It's just smoother and more responsive on XP, which I guess is the version of Windows most people have these days. I may yet decide to re-initialize Direct Input on a map load/change, as it doesn't seem to properly flush its state between maps.
  • Added mouse wheel support to Direct Input. No thanks at all to the non-existent documentation, but a bit of digging around in header files and experimenting with stuff got me there.
If you're interested in implementing that last item, here's the code. It should be obvious where to put this: it's another case in the switch on od.dwOfs in IN_MouseMove, alongside DIMOFS_X and DIMOFS_Y.
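case DIMOFS_Z:
    // detect the mouse wheel movement
    // dwData holds a signed delta stored in a DWORD, so cast it before testing the sign
    if ((int) od.dwData < 0)
    {
        Key_Event (K_MWHEELDOWN, true);
        Key_Event (K_MWHEELDOWN, false);
    }
    else if ((int) od.dwData > 0)
    {
        Key_Event (K_MWHEELUP, true);
        Key_Event (K_MWHEELUP, false);
    }
    // a zero delta generates no key events

    break;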
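For reference, the m_filter that is now on by default works (in stock Quake, as far as I know) by averaging each frame's raw mouse delta with the previous frame's. That averaging is where the extra smoothness comes from, at the cost of a little lag. Here is a minimal standalone sketch of that idea; the mousefilter_t struct and Mouse_Filter function are hypothetical names, not engine code.

```c
#include <assert.h>

// hypothetical filter state; stock Quake keeps old_mouse_x / old_mouse_y as globals
typedef struct
{
    float old_x;
    float old_y;
} mousefilter_t;

// m_filter-style smoothing: with filtering on, output the mean of this frame's
// raw delta and last frame's raw delta; the raw delta is always remembered
void Mouse_Filter (mousefilter_t *f, int filter, float raw_x, float raw_y,
                   float *out_x, float *out_y)
{
    assert (f && out_x && out_y);

    if (filter)
    {
        *out_x = (raw_x + f->old_x) * 0.5f;
        *out_y = (raw_y + f->old_y) * 0.5f;
    }
    else
    {
        *out_x = raw_x;
        *out_y = raw_y;
    }

    f->old_x = raw_x;
    f->old_y = raw_y;
}
```

Feeding it deltas of 10 and then 20 on the x axis gives 5 and then 15, so a sudden jerk of the mouse is spread over two frames instead of one.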

In other news, I finally threw out my old wimpy 350 watt PSU and got a nice beefy 650 watt PSU in. That wasn't actually the source of my recent problems (I'm fairly convinced those were down to an NVIDIA driver bug), but it's good to have the extra juice on tap, and it opens up GPU upgrade possibilities. Plus it's so quiet by comparison. Happy to report that changing it was a breeze, and I didn't end up all blackened and charred with smoke rising from the remains of my hair.

Thursday, May 15, 2008

Fun with my PC

What an evening of joy that's been! It started when my beloved GeForce 6600 GT complained about using too much power (something it had never done before), but I fixed that by swapping things around a bit. Then I upgraded to AVG 8, which was a whole barrel of laughs. Hello ultra-slow start-up times, excessive memory use, and a fairly graphics-heavy user interface that takes positively ages to load. So after years of being loyal to AVG I'm ditching it; the new version is just not worth it. All I want is a lightweight AV program that's effective at stopping nasties, not some ghastly bloatware Norton 360 clone. To top it all, my idea for caustics in MHQuake didn't work out quite as well as I'd hoped (fortunately I have another).

On the plus side, Windows XP Service Pack 3 is so far rather nice. It actually gives a fairly hefty speed-up!

Wednesday, May 14, 2008

2 TMU Path

I've finally written most of the 2 TMU path for world rendering. Tackling this was fairly off-putting, but I'm there. The remaining item is handling of caustics, but that was always something that I had intended to move away from textures and on to vertex colours, so I'm going to switch them over to that.

I think I'm going to drop torch coronas too. I'm not a massive fan of the code I've used for them, and I came very close to dropping the ball on this engine after the effort it took to even get that far. Better off without them than with code in there that I don't like and that risks freaking me out all over again.

Wednesday, May 7, 2008

Slightly Happy Today...

I've finally figured out the whole mess of how Hipnotic draws the Grenade Launcher icon in the status bar/HUD. This was a big jump forward, as having the same mod support as regular GLQuake is important.

I've also decided to drop the 3D HUD. It looks pretty, but the game assets are just not there to support it properly, and it requires far more hard-coding than I'm happy with. I've gone back and reworked the original SBar code to support a 2D HUD, so this will give the best mixture of compatibility and moving forward.

Finally, I'm going to be reworking the video startup architecture a bit, as I want to support in-game resolution changing. Plus I think that the original is a bit of a nasty hack anyway.

Thursday, May 1, 2008

Not Gone Away

Just took an extended break. I'm in the cleaning up and filling in of loose ends part of the work right now, so don't worry - this will still be released!

Tuesday, March 18, 2008

The proof of the pudding...

Here's the entrance to the main central hall in E1M3 with some torch coronas shining. Notice how every visible torch has a corona (correctly drawn without depth testing), but the two torches off to the right, beside the gold key door, are hidden by the wall and don't get one.

If the obscuring wall were a brush model, they would still not show. A trace-based test would get that case wrong.

This gives the same end result as using hardware occlusion queries, but it's faster and will work on cards that don't support hardware occlusion queries.
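The post doesn't include the corona test itself. A minimal sketch of how a single point can be tested against the captured depth buffer, assuming the torch position has already been projected into buffer space the way R_ProjectPoint does in the Entity Occlusion post further down: the corona is shown only if the torch is no farther away than whatever was captured at that texel. Corona_PointVisible is a hypothetical name, not engine code.

```c
#include <assert.h>

// returns 1 if a point already projected into buffer space (x, y in texels,
// z as window depth) is not behind the captured depth at that texel, else 0
int Corona_PointVisible (const float *zbuf, int size, float x, float y, float z)
{
    int ix = (int) x;
    int iy = (int) y;

    assert (zbuf && size > 0);

    // clamp into the buffer, as R_ProjectPoint does
    if (ix < 0) ix = 0;
    if (ix >= size) ix = size - 1;
    if (iy < 0) iy = 0;
    if (iy >= size) iy = size - 1;

    // captured depth is the nearest world surface at this texel
    return z <= zbuf[iy * size + ix];
}
```

Because the decision comes from the captured world depth rather than from the corona's own depth test, the corona can be drawn with depth testing off without bleeding through walls.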

Monday, March 17, 2008

Entity Occlusion

It's there and it works. Pretty damn beautifully, even if I must say so myself.

I was able to get the Z buffer updates down to 10 times per second, which gave another handy speed boost. I could have got away with 5, but the extra speed from that wasn't much, so I decided to keep the extra accuracy.

Here's the code: a lot of it is engine-specific, but if you can pull anything useful from it, it's my pleasure.

float r_z_update_time = 0.0f;
#define Z_UPDATE_INTERVAL 0.1f
#define Z_UPDATE_SIZE 64

// software z buffer
float zBuf[Z_UPDATE_SIZE * Z_UPDATE_SIZE];


/*
==================
R_ProjectPoint

project a point from world co-ordinates to screen coordinates
==================
*/
void R_ProjectPoint (vec3_t vin, vec3_t vout)
{
    float fvin[4] = {vin[0], vin[1], vin[2], 1};
    float fvout[4];
    float *mm = r_world_matrix;
    float *mp = r_world_project;

    // transform to eye space with the (column-major) modelview matrix
    fvout[0] = mm[0x0] * fvin[0] + mm[0x4] * fvin[1] + mm[0x8] * fvin[2] + mm[0xc] * fvin[3];
    fvout[1] = mm[0x1] * fvin[0] + mm[0x5] * fvin[1] + mm[0x9] * fvin[2] + mm[0xd] * fvin[3];
    fvout[2] = mm[0x2] * fvin[0] + mm[0x6] * fvin[1] + mm[0xa] * fvin[2] + mm[0xe] * fvin[3];
    fvout[3] = mm[0x3] * fvin[0] + mm[0x7] * fvin[1] + mm[0xb] * fvin[2] + mm[0xf] * fvin[3];

    // then to clip space with the projection matrix - fvin holds the final transformation
    fvin[0] = mp[0x0] * fvout[0] + mp[0x4] * fvout[1] + mp[0x8] * fvout[2] + mp[0xc] * fvout[3];
    fvin[1] = mp[0x1] * fvout[0] + mp[0x5] * fvout[1] + mp[0x9] * fvout[2] + mp[0xd] * fvout[3];
    fvin[2] = mp[0x2] * fvout[0] + mp[0x6] * fvout[1] + mp[0xa] * fvout[2] + mp[0xe] * fvout[3];
    fvin[3] = mp[0x3] * fvout[0] + mp[0x7] * fvout[1] + mp[0xb] * fvout[2] + mp[0xf] * fvout[3];

    // prevent division by 0
    // (points behind the viewer, with w < 0, are not handled specially)
    if (fvin[3] == 0.0f) fvin[3] = 0.000001f;

    // perspective divide to normalized device coordinates (-1..1)
    fvin[0] /= fvin[3];
    fvin[1] /= fvin[3];
    fvin[2] /= fvin[3];

    // map x and y to range 0..1, then scale to buffer dimensions
    vout[0] = (fvin[0] * 0.5f + 0.5f) * Z_UPDATE_SIZE;
    vout[1] = (fvin[1] * 0.5f + 0.5f) * Z_UPDATE_SIZE;

    // map z to the window depth produced by glDepthRange (0.5, 1.0):
    // 0.5 + (1.0 - 0.5) * (z + 1) / 2 = 0.75 + 0.25 * z
    vout[2] = (fvin[2] * 0.25f + 0.75f);

    // move points outside the image into the image
    if (vout[0] < 0) vout[0] = 0;
    if (vout[0] >= Z_UPDATE_SIZE) vout[0] = Z_UPDATE_SIZE - 1;
    if (vout[1] < 0) vout[1] = 0;
    if (vout[1] >= Z_UPDATE_SIZE) vout[1] = Z_UPDATE_SIZE - 1;
}


/*
==================
R_ProjectBBox

project a bounding box from world coordinates to screen coordinates, then take a 2D
"bounding box of the bounding box" for use in the occlusion culling tests
==================
*/
void R_ProjectBBox (float *mins, float *maxs, float *minsout, float *maxsout)
{
    int i;

    // start with an inverted (empty) box so that any projected corner expands it
    minsout[0] = minsout[1] = minsout[2] = 999999999;
    maxsout[0] = maxsout[1] = maxsout[2] = -999999999;

    for (i = 0; i < 8; i++)
    {
        vec3_t bboxptin;
        vec3_t bboxptout;

        // get the correct corner to use
        bboxptin[0] = (i & 1) ? mins[0] : maxs[0];
        bboxptin[1] = (i & 2) ? mins[1] : maxs[1];
        bboxptin[2] = (i & 4) ? mins[2] : maxs[2];

        // project to screen
        R_ProjectPoint (bboxptin, bboxptout);

        // store min and max
        if (bboxptout[0] < minsout[0]) minsout[0] = bboxptout[0];
        if (bboxptout[1] < minsout[1]) minsout[1] = bboxptout[1];
        if (bboxptout[2] < minsout[2]) minsout[2] = bboxptout[2];
        if (bboxptout[0] > maxsout[0]) maxsout[0] = bboxptout[0];
        if (bboxptout[1] > maxsout[1]) maxsout[1] = bboxptout[1];
        if (bboxptout[2] > maxsout[2]) maxsout[2] = bboxptout[2];
    }
}


int R_BoxInFrustum (vec3_t mins, vec3_t maxs);


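void R_RunOccludeEntityTest (entity_t *ent, vec3_t mins, vec3_t maxs)
{
    vec3_t screen_mins, screen_maxs;
    int x;
    int y;

    R_ProjectBBox (mins, maxs, screen_mins, screen_maxs);

    // walk every texel under the box's 2D screen footprint
    for (y = (int) screen_mins[1]; y <= (int) screen_maxs[1]; y++)
    {
        int p = y * Z_UPDATE_SIZE;

        for (x = (int) screen_mins[0]; x <= (int) screen_maxs[0]; x++)
        {
            // the captured world is farther away than the nearest point of the box here,
            // so at least part of the entity may be visible
            if (zBuf[p + x] > screen_mins[2])
            {
                // not occluded
                ent->occluded = false;
                return;
            }
        }
    }

    // occluded: the world is in front of the box everywhere in its footprint
    ent->occluded = true;
}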


void R_RunOcclusionTest (void)
{
    int i;
    entity_t *ent;
    vec3_t mins, maxs;

    if (!r_worldentity.model || !cl.worldmodel) return;

    for (i = 0; i < cl_numvisedicts; i++)
    {
        ent = cl_visedicts[i];

        // not occluded
        ent->occluded = false;

        switch (ent->model->type)
        {
        case mod_brush:
        case mod_alias:
        case mod_sprite:
            // get the entity's world-space bounding box
            // (model bounds offset by the origin; rotation is not taken into account)
            VectorAdd (ent->origin, ent->model->mins, mins);
            VectorAdd (ent->origin, ent->model->maxs, maxs);

            // do the bbox cull here
            if (R_BoxInFrustum (mins, maxs) == FRUSTUM_OUTSIDE)
            {
                // outside the view frustum, so treat as occluded
                ent->occluded = true;
            }
            else
            {
                // test for regular occlusion
                R_RunOccludeEntityTest (ent, mins, maxs);
            }

            break;

        default:
            break;
        }
    }
}


void R_CaptureDepth (void)
{
    texture_t *t;
    extern texture_t *texturelist;
    extern float r_farclip;

    // accumulate update time always
    r_z_update_time += r_frametime;

    // don't update if it's not time to do so yet (always update in the first few frames)
    if (r_z_update_time < Z_UPDATE_INTERVAL && r_framecount > 5) return;

    // begin the timer again
    r_z_update_time = 0;

    // render a Z_UPDATE_SIZE x Z_UPDATE_SIZE capture near the bottom-right corner
    // create the viewport for the capture
    R_SetupGLViewport (vid.glwidth - (Z_UPDATE_SIZE * 2), Z_UPDATE_SIZE, Z_UPDATE_SIZE, Z_UPDATE_SIZE, r_refdef.fov_y, 4, r_farclip);

    // store modelview and projection matrices for reuse
    // fixme - do this in software to prevent a sync-wait
    glGetFloatv (GL_MODELVIEW_MATRIX, r_world_matrix);
    glGetFloatv (GL_PROJECTION_MATRIX, r_world_project);

    // set up the depth range for the capture
    // we can use a good chunk of the depth buffer here
    glDepthFunc (GL_LEQUAL);
    glDepthRange (0.5f, 1.0f);
    glDepthMask (GL_TRUE);

    // shut down everything we don't need for this
    glDisable (GL_TEXTURE_2D);
    glColorMask (GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);

    // base vertex arrays
    vaEnableVertexArray (3);

    for (t = texturelist; t; t = t->texturelist)
    {
        // get the texture chain
        msurface_t *surf = t->texturechain;

        // no surfs in use
        if (!surf) continue;

        // skip turbulent (liquid) surfaces that aren't flagged opaque, and sky
        if ((surf->flags & SURF_DRAWTURB) && !(surf->flags & SURF_DRAWOPAQUE)) continue;
        if (surf->flags & SURF_DRAWSKY) continue;

        // walk the chain
        for (; surf; surf = surf->texturechain)
        {
            glpoly_t *p;

            // draw polys here as we're sending some liquids through it too
            for (p = surf->polys; p; p = p->next)
            {
                int i;
                glvertex_t *v;

                vaBegin (GL_TRIANGLE_FAN);

                for (i = 0, v = p->verts; i < p->numverts; i++, v++)
                    vaVertex3fv (v->tv);

                vaEnd ();
            }
        }
    }

    // done with the render
    vaDisableArrays ();

    // capture the depth buffer
    // per the spec, depth values come back scaled so that the depth buffer's minimum maps
    // to 0 and its maximum to 1; these are window-space depths that have already been
    // through glDepthRange, so with the 0.5..1.0 range above they come back in 0.5..1.0
    // (http://www.opengl.org/documentation/specs/man_pages/hardcopy/GL/html/gl/readpixels.html)
    // R_ProjectPoint maps its depths into the same range so the two can be compared
    glReadPixels (vid.glwidth - (Z_UPDATE_SIZE * 2), Z_UPDATE_SIZE, Z_UPDATE_SIZE, Z_UPDATE_SIZE, GL_DEPTH_COMPONENT, GL_FLOAT, zBuf);

    // bring stuff back up
    glEnable (GL_TEXTURE_2D);
    glColorMask (GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);

    // reset the current colour so later drawing starts from a known state
    glColor3f (1, 1, 1);
}
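Stripped of the engine specifics, the two bits of arithmetic the code above relies on can be checked in isolation: the mapping from normalized device depth to window depth for a given glDepthRange, and the conservative "is any texel under the box farther away than the box's nearest point?" test from R_RunOccludeEntityTest. This is a standalone sketch; Depth_NDCToWindow and Box_Occluded are hypothetical names, and the box is assumed to be already projected and clamped into the buffer.

```c
#include <assert.h>

// window depth for glDepthRange (n, f): n + (f - n) * (z + 1) / 2
// with n = 0.5 and f = 1.0 this reduces to 0.75 + 0.25 * z, as in R_ProjectPoint
float Depth_NDCToWindow (float ndc_z, float n, float f)
{
    return n + (f - n) * (ndc_z + 1.0f) * 0.5f;
}

// returns 1 if every texel in [x0..x1] x [y0..y1] of a size x size depth buffer
// holds a depth no farther than nearest_z (the world hides the whole box), else 0
int Box_Occluded (const float *zbuf, int size,
                  int x0, int y0, int x1, int y1, float nearest_z)
{
    int x, y;

    assert (zbuf && size > 0);
    assert (x0 >= 0 && y0 >= 0 && x1 < size && y1 < size);

    for (y = y0; y <= y1; y++)
        for (x = x0; x <= x1; x++)
            if (zbuf[y * size + x] > nearest_z)
                return 0;   // some texel is farther away, so the box may show there

    return 1;
}
```

Testing against the box's nearest depth rather than per-pixel depths keeps the test conservative: it can let a hidden entity through, but it should never cull one that is visible (the camera-inside-the-box case aside, which the projection does not handle).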

Sunday, March 16, 2008

Premature Optimization

One of the golden rules of programming is to avoid premature optimization, but yet with my Z Buffer capture I have not only optimized prematurely, but have deliberately set out to do so. This was for a number of reasons.

Occlusion queries are a fairly tried and trusted technique, but the current implementation requires a pipeline commit before you can read back the data. For multiple entities, where you only want entities to be tested against world geometry (i.e. not against other entities), that translates into multiple pipeline commits. On one of my test machines, the net result is that implementing hardware occlusion queries causes a drop from 230 FPS to 170 FPS. In many cases this is acceptable, as you can win the time back by skipping subsequent rendering. In Quake, what you win back is not enough to justify the loss.
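Put in frame-time terms, which is the fairer way to compare costs, the drop from 230 to 170 FPS works out as:

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{\text{FPS}}: \qquad
\frac{1000}{230} \approx 4.35\ \text{ms}, \qquad
\frac{1000}{170} \approx 5.88\ \text{ms}, \qquad
\Delta t \approx 1.53\ \text{ms per frame}
```

So, on that machine, the queries would have to save roughly 1.5 ms of rendering every frame just to break even.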

The primary goal of this technique was to implement occlusion. The secondary goal was to do so without any appreciable performance loss. Achieving the primary goal is relatively easy; any first-year student could write the code, and even a non-programmer could sketch out the basics. Pythagoras knew how to do it. However, doing it acceptably in a real-time system is not easy.

As soon as I knew that I wasn't going to be able to get a fully-in-software implementation working, and as soon as I made the decision to walk away from even attempting it, I knew that any solution would have to be optimized like crazy. At that point I had fully intended to abandon the idea, but something about glReadPixels and GL_DEPTH_COMPONENT kept nagging at the back of my brain. The performance loss from glReadPixels comes from two areas: the pipeline commit and the actual read back. If the impact of these could be minimized, then I would have a viable solution.
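To put rough numbers on the read-back side, using the 64 x 64 GL_FLOAT capture and 10 updates per second from the Entity Occlusion code, and assuming a 1024 x 768 screen purely for comparison:

```latex
\begin{aligned}
64 \times 64 \times 4\ \text{bytes} &= 16{,}384\ \text{bytes} \approx 16\ \text{KB per capture} \\
16{,}384 \times 10\ \text{captures/s} &= 163{,}840\ \text{bytes/s} \approx 160\ \text{KB/s} \\
1024 \times 768 \times 4\ \text{bytes} &= 3{,}145{,}728\ \text{bytes} = 192 \times 16{,}384
\end{aligned}
```

Shrinking the capture cuts the read back, and capturing at a fixed 10 Hz rather than every frame cuts the number of pipeline commits; at 230 FPS that is one capture every 23 or so frames.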

From then on it was a case of 2 + 2 = 4. Since a highly optimized solution was part of the basic requirement, it followed that any initial prototype would have to be optimized from the outset. Otherwise there was no point in even continuing beyond the depth buffer capture stage. So in order to meet the basic requirement, I broke the rules.

I suppose that the moral of this story is that the old rule of "premature optimization == bad" still stands as a good general rule, but it's important to realize that it's not universally applicable. When a case arises where optimization is required even at the proof-of-concept stage, you need to sit down and consider whether or not it actually is premature.

Saturday, March 15, 2008

More on Linux

I've been following some discussions over on The Daily WTF with interest. It started out as a simple question on why Windows didn't include a command-line "sleep" tool in a default installation (which was quickly answered), but it soon degenerated into a mud-slinging contest. It's only a matter of time before Godwin's Law is invoked.

One interesting thing about these discussions is that it's normally the Linux devotees who do most of the mud-slinging (although in this particular case, there are some reasonable people who seem to have thought things through on that side of the fence). I couldn't even begin to list the number of anti-Windows arguments I've seen in the past that have been based on things that may have been true in 1992 (but are no longer), or that are outright falsehoods.

This is sad because Linux has lots of strengths in lots of important areas. Yet its adherents seem either unable or unwilling to sell it on those strengths, and instead resort to highlighting weaknesses (or perceived weaknesses) of the competition. It gives the impression to an outsider looking in that they really don't have confidence in their favoured platform; that they view it as "the best of a bad lot" rather than "the best, period".

Why is this, I wonder?

There has been a colossal push to get Linux established as a viable alternative desktop platform, but even its most loyal devotee (I'm excluding the rabid/fanatical types here) would admit that it's still not ready. Ubuntu is probably the nearest, but that is still riddled with quirks and difficulties that would be deal-breakers to the typical desktop user. This is all deeply rooted in Unix culture, where there is an implicit assumption that the person using an OS would be intimately familiar with the inner workings of that OS. This is no longer the case, and hasn't been so for well over a decade.

For both platforms there is a Wall that the Hypothetical Typical User will eventually hit, beyond which they cannot progress without making an effort to improve their skills (this may come as a surprise to some, but most HTUs have no inclination whatsoever to improve their skills). Linux places that Wall far, far nearer to the user than Windows does. A lot of energy has been wasted in the Linux camp on hot air, hyperbole, FUD and scare tactics. Maybe it's about time that this energy was redirected into something positive and productive, of benefit to everyone, and directed at achieving the goal of putting Linux on the desktop. Like pushing that Wall back.

Or maybe, underneath it all, the Linux camp (who seem to be more motivated by ideology than by practicalities) are really just plain old fashioned not interested in getting there?

Thursday, March 13, 2008

Z Buffer Capture

Got it working :)

The screenshot on the right shows the capture for the start hall. I've mangled the intensities (and inverted the range) so that you can see things a bit clearer. The image is also somewhat larger than I will use for production; again, this is just for demonstration purposes.
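The post doesn't say exactly how the intensities were mangled. A minimal sketch of one way to do it, assuming the float buffer that glReadPixels returns: stretch the depth range actually present in the buffer to 0..255 and invert it, so near surfaces come out bright and far ones dark. Depth_ToGrey is a hypothetical helper, not the code used for the screenshot.

```c
#include <assert.h>

// convert a captured float depth buffer into an 8-bit greyscale image for viewing:
// the occupied depth range is stretched to 0..255 and inverted (near = bright)
void Depth_ToGrey (const float *depth, unsigned char *grey, int count)
{
    float lo, hi, range;
    int i;

    assert (depth && grey && count > 0);

    // find the depth range actually present in the buffer
    lo = hi = depth[0];
    for (i = 1; i < count; i++)
    {
        if (depth[i] < lo) lo = depth[i];
        if (depth[i] > hi) hi = depth[i];
    }

    range = hi - lo;

    for (i = 0; i < count; i++)
    {
        // 0 at the nearest depth, 1 at the farthest (a flat buffer maps to white)
        float t = (range > 0.0f) ? (depth[i] - lo) / range : 0.0f;
        grey[i] = (unsigned char) ((1.0f - t) * 255.0f + 0.5f);
    }
}
```

Stretching matters because the captured values all sit in the 0.5..1.0 depth range, and most of the scene ends up bunched near the far end of it, which would otherwise look almost uniformly grey.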

I've also made some further performance optimizations. I only capture the Z buffer under the following conditions:

  • The viewleaf has changed (always capture, no matter what).
  • We're in the first few frames of the map.
  • The view origin or angles have changed significantly.
  • 0.1 seconds have passed (i.e. capture at 10 FPS).
The last one might seem a bit controversial, but it does make perfect sense if you think about it. If the view origin and angles haven't changed significantly, then the resulting scene will be relatively static, so there's not much need to update the captured buffer. I might be able to get away with capturing less frequently (or even not at all!) under this condition, but a final decision on that will have to wait until I start bringing entities into play.
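The four conditions above can be folded into a single per-frame decision. This is a minimal sketch, not the engine's code: the captureview_t struct, ShouldCaptureDepth and the threshold values are all hypothetical, since the post says "significantly" without giving numbers. (The later R_CaptureDepth code only shows the timer and first-frames checks.)

```c
#include <assert.h>

// hypothetical thresholds; the post gives no numbers for "significantly"
#define CAPTURE_INTERVAL 0.1f   // seconds, i.e. capture at 10 FPS
#define CAPTURE_WARMUP   5      // frames at the start of a map
#define ORIGIN_EPSILON   16.0f  // world units
#define ANGLE_EPSILON    5.0f   // degrees

typedef struct
{
    int   viewleaf;   // hypothetical leaf index
    float origin[3];
    float angles[3];
} captureview_t;

static float AbsDiff (float a, float b)
{
    float d = a - b;
    return (d < 0.0f) ? -d : d;
}

// returns 1 if the depth buffer should be recaptured this frame
int ShouldCaptureDepth (const captureview_t *last, const captureview_t *now,
                        int framecount, float time_since_capture)
{
    int i;

    assert (last && now);

    // the viewleaf has changed: always capture
    if (now->viewleaf != last->viewleaf) return 1;

    // first few frames of the map
    if (framecount <= CAPTURE_WARMUP) return 1;

    // view origin or angles changed significantly
    // (360-degree wraparound is ignored; at worst that costs one extra capture)
    for (i = 0; i < 3; i++)
    {
        if (AbsDiff (now->origin[i], last->origin[i]) > ORIGIN_EPSILON) return 1;
        if (AbsDiff (now->angles[i], last->angles[i]) > ANGLE_EPSILON) return 1;
    }

    // otherwise only capture once the interval has elapsed
    return time_since_capture >= CAPTURE_INTERVAL;
}
```

The caller would keep the view state from the last capture in "last" and reset its timer whenever this returns 1.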

It's good to be back on track with this.