I finally made the big breakthrough and now have a solution to both DirectQ's and RMQ's integer millisecond timers not giving you the framerate you ask for via host_maxfps.
At the most basic level this is because we're counting time in integer milliseconds. For Quake's standard 72 FPS this translates to 13.88888888... milliseconds per frame, and the time passed since the last frame must equal or exceed this in order for the next frame to run. The only integer equal to or higher than this is 14, so we were getting 14 millisecond frames, or about 71.42857 FPS.
Things get more interesting at higher framerates. With host_maxfps set to 300 we get 3.3333 milliseconds per frame, which rounds up to 4, which translates back down to 250 FPS.
The solution was simple and elegant. Instead of measuring time passed we now use something similar to QC's "nextthink" - so we don't have the amount of time that must pass before the next frame runs, but instead we have the value of realtime that must be equalled or exceeded before the next frame runs. When that occurs, this value is incremented by a fixed amount which is by default 1 / host_maxfps.value (for timedemos it is set to the value of realtime, which causes the next frame to run immediately; when not connected it drops back to 1/72).
The end result is that framerates tend to coalesce around the correct value, so if you select host_maxfps 300 then you'll see a fairly solid 300 FPS with variations of about 0.2 FPS every now and then. If any frame runs a little slower the next one will run a little faster, and vice-versa. The player experience is incredibly smooth and even feeling, and it works seamlessly with things like vsync, decoupled timers and host_framerate/host_timescale.
One interesting thing happens if you set host_maxfps to in or around the maximum average framerate your hardware can handle. During a typical scene you get slow frames and fast frames, and if a lot of slow frames occur at the same time then you'll get faster frames later on to balance out the rate. On the Intel 945 I occasionally run on, and which can manage maybe 280 FPS in timedemos with DirectQ, I set host_maxfps 300. Entering the start map gives me maybe 220 FPS in the initial scene with the skill halls. Spin around and face the wall, then FPS starts going up to maybe 360 before settling back down over a second or so to 300. It's kinda weird but I like the effect.
Like I said, this is not the normal case. It's only if your host_maxfps is in or around the normal average max FPS you can do, and performance ever drops below that. If you set it to 72 you'll get a solid 72, because performance never drops below that.
Tuesday, August 30, 2011
Posted by mhquake at 7:46 PM
Monday, August 29, 2011
Been doing more work on RMQ's IQM support. I've managed to move a good chunk of the animation - the final per-vertex position and normal transform - over to the GPU, with consequent performance gains. The joint setup remains on the CPU, as does setup of bone matrices for each vertex.
This has meant that IQMs can now go into static VBOs, which helps a little more too.
A problem was hit when I went to write r_shadows 1 mode for them. My initial implementation drew all models in a single batch, then switched state and drew all shadows in a single batch. This mirrors what I had done with MDLs and kept things nice and clean so far as state changes were concerned, but it does require recreating the animation.
Now, with MDLs all I need is two ints and a float to recreate the animation (I animate and interpolate entirely on the GPU these days); with IQMs I need a 3x4 matrix per-vertex. OK, that's big and I can't cache it anywhere because multiple entities can share the same model, so memory requirements for caching can fast become prohibitive.
Switching over to drawing in model/shadow, model/shadow, model/shadow order removed the need to recreate the animation, but did mean that I was back in state change ugliness land. On balance however it ended up running almost twice as fast, indicating that the GPU was barely even noticing the extra load, but the CPU was being put under some strain. Ow!
Other solutions presented themselves, such as loading the joints into the GPU's constants registers and running the full animation GPU-side. That was discounted because constants register space is limited (I can't go over 96 with the hardware I'm targeting, and each 3x4 matrix requires 3 slots) and it would have become necessary to break the model into a multitude of separate draw calls. That can get real messy real fast too, as I'd probably also have to break apart all the model vertexes and indexes and reorder them based on joint usage in order to make everything fit in the limited space.
So all in all it remains a slightly yucky experience, but less so than my original encounter. There's still more of it on the CPU than I'm happy with, but at least some use is now being made of the GPU, which is much better at this kind of general calculation, and which was otherwise lying idle.
Posted by mhquake at 12:21 AM
Sunday, August 28, 2011
Been steadily transitioning the RMQ engine over to shaders for the past few days, and things are now nearing completion. It's been a huge cleanup across the entire codebase, with a lot of messy fixed pipeline crud just yanked out. Take sky drawing for example; for the scrolling sky layers there were previously a total of 3 passes required: one to lay down a depth buffer based on the original polygons, one to draw the layers, and one to blend a sky fog layer over it. These all had separate states which needed to be toggled on or off as appropriate, and the end result was quite complex and sluggish. These have now all collapsed to a single pass, with the scrolling layers getting per-pixel accuracy, no requirement for an initial depth buffer, and fog blended in using the exact equation I choose, rather than trying to shoe-horn something that might look like a reasonable approximation into a limited set of predefined blends.
One pleasant side-effect of using shaders is that they make static vertex buffers possible. It's no longer necessary to compute turbulent factors, blended poses and so on per-vertex on the CPU. You just pass parameters to your shaders and let them do the work. As a result the engine has picked up static VBOs for much of the on-scene geometry - all MDLs, brush models, IQMs now live in big static VBOs. I still have fallback vertex array paths for cases where VBOs aren't available, but I may yet rip them out - if you have shaders you're going to have VBOs too.
OpenGL VBOs compare poorly to those in D3D. Aside from the API differences there seem to be some crucial differences lurking in the driver, and the performance gain is not what it should be. There are scenes that DirectQ runs two to three times as fast, and toggling VBO support off in my development build shows that the gain from using them is only on the order of 20% to 50%.
The choice of assembly language shaders has proven to be both good and bad. Good because the OpenGL API interface to them is utterly clear and unambiguous, there is only one version so version conflicts are impossible, and they run correctly without any vendor specific hacks on NVIDIA, ATI and Intel. That last one is important - I'm not able to test on ATI hardware at present, so I had to back out GLSL support on ATI in the past. Now I can just write a shader and be confident that it will work anywhere despite that. That's a huge productivity gain - less time spent debugging and troubleshooting is more time to do the good stuff.
Bad because a high level language is something that you kinda take for granted. You never really do appreciate it until it's taken away from you. Some things - like a simple array index into the constants registers - I haven't yet figured out.
On the other hand I am enjoying the feeling of more precise control over what exactly gets done. I don't have to worry about things like "will the compiler translate this to a MAD instruction" - I just write a MAD instruction. Job done. And getting away from the shoddy quality of GLSL compilers is always a good thing.
Posted by mhquake at 5:04 PM
Tuesday, August 23, 2011
It's looking as though the RMQ engine is going to be a shaders-only engine too, although in this case it won't be GLSL but will be using the older ARB_vertex_program and ARB_fragment_program extensions. These are what Doom 3 used (although RMQ will remain more lightweight) so support for them is widespread and quite rock-solid.
The extension specs themselves date back to 2002, so I guess saying that "you must have hardware that is not more than 9 years old" is reasonable enough.
This isn't just for gratuitous eyecandy; there are a number of gameplay related effects on the want list that can really only be reasonably done with shaders. They run faster on most hardware than using the fixed pipeline. It's a lot easier and clearer to express a complex multitextured blend using a few lines of shader code than it is to do the same using pages of TexGen/TexEnv/texture matrix calls. A lot of messy fiddly buggy crap just goes away.
Hardware support is quite similar to DirectQ - some kind of D3D9 class hardware (even though it's OpenGL, but it's in the same class) is needed. This, interestingly enough, also includes the lower end Intel 9xx chips which actually do support these extensions. They're also preferable to GLSL because the GLSL specs sometimes seem to change more often than most people change their socks, there are wide incompatibilities between versions, and driver vendors never miss an opportunity to screw up wherever possible.
I've been road-testing them in a test build of GLQuake over the past few days, running on various hardware and OS combinations. The subject of "Intel" and "driver vendors screwing up" naturally comes to mind, and there is one definite bug I've encountered so far. It turns out that you need to do a glEnable (GL_TEXTURE_2D) on the appropriate TMU if you're going to use any texcoord set other than vertex.texcoord (or vertex.texcoord[0], or vertex.attrib[8], which are the same thing anyway). There are a few cases where I pass multiple sets of texcoords into a vertex shader, and a few where I generate extra sets of texcoords on the fly in a vertex shader, and these worked fine on everything (including other Intels) aside from this one specific driver. Even more interestingly, they also work fine if you use regular vertex arrays, but don't work if you use glEnableVertexAttribArrayARB/glVertexAttribPointerARB (I haven't tested with immediate mode) - only the first set of texcoords gets passed on, and the others are read by the fragment shader as (0,0).
I've reported driver bugs to Intel before (and even gotten responses from them!) but this time out, well it's an older chip (a 945) using older extensions so I don't think it's going to be too high on their priority list. Plus Doom 3 also uses glEnable (GL_TEXTURE_2D) here too (although it uses regular vertex arrays and not glEnableVertexAttribArrayARB/glVertexAttribPointerARB)...
I'm not fully decided which route I'll take with RMQ. Vertex attrib arrays make a LOT of tedious crap with vertex arrays Just Go Away (including glClientActiveTexture - yayyy!) but this Intel bug gives me a slight bad vibe. The workaround works though, and with or without it everything works on everything else, so I'm leaning towards attrib arrays.
(An interesting alternative is to enable glShadeModel (GL_SMOOTH) and pass the texcoords out in the color attrib slots, but color interpolation seems to happen at a lower precision than texcoords, so it's not really viable. Plus it's evil and ugly. Another option was to pack all 4 texcoords for 2 stages into GL_TEXTURE0 and split them at the fragment shader level, but there is one case where I need 3 sets of texcoords so that won't work.)
And back to Doom 3. I've had a look at its light shaders via GL Intercept; combined with a GL Intercept log of its OpenGL calls and some texture images, it should be possible to do a test program that does Doom 3 lighting. I might even modify GL Intercept sometime to capture calls to glBuffer(Sub)Data and write out the buffer data to file, or I might just dump some static data from Quake. Realtime Doom 3 lighting has now become something that's an intriguing possibility, although the legalities of reverse-engineering the renderer in this manner (in advance of the GPL source release later on this year) are somewhat dubious, and until I gather more data on the whole process it's all purely theoretical anyway. Interesting enough though.
Curious Doom 3 fact: the renderer only uses one light per surface.
Useful skill learned: being able to convert from ARB assembly to GLSL/HLSL and back on the fly.
Posted by mhquake at 7:17 PM
Thursday, August 18, 2011
DirectQ is likely going to get a patch release maybe sometime next week; there are a bunch of small updates which individually are nothing major, but taken together warrant it.
My D3D 11 experiments continued a while back, but have now stalled again. I got texture loading done - it's not a nice API for loading textures (to say the least) but I managed to get something that looks reasonably sane wrapped around the raw API calls.
Been playing with GL_ARB_vertex_program and GL_ARB_fragment_program. These are the old assembly language shaders that predated GLSL, and I have to say that my impression of them is fairly - well - impressive. The OpenGL interface for them is clean and minimal, and the shading language is simple and unambiguous - both of which compare favourably to GLSL which seems like a madhouse sometimes. There's also less for driver vendors to mess about with (and - inevitably - mess up). As an experiment I ported a stock GLQuake to using them throughout the renderer in maybe half a day (already having the required algorithms for water and sky warps in my GLSL/HLSL code helped a lot here). I've since replaced the water and sky GLSL in RMQ with these, and am considering doing a full shader version of the engine - if only I could convince the team to agree to going shaders-only. :(
There's more but I'll save that for another day.
Posted by mhquake at 8:32 PM
Sunday, August 14, 2011
Squished another Nehahra bug. It seems as though, if you have a model called "player.mdl", the most logical thing in the world is to use it for non-players (and, of course, use something completely different for the player).
OK, I'm going to need to get my thoughts together and write up something outlining my official position on support for Nehahra "features".
Posted by mhquake at 3:12 PM
Friday, August 12, 2011
Now that DirectQ 1.8.8 is out, it's getting time to start hitting at the RMQ big map problem. Bottom line is that we need to support over 64k clipnodes, and probably vertexes (this one was hit twice but the mappers hacked around it) and a few other things too. This is pretty much a non-negotiable requirement, and in the absence of any indication from the community as to what an acceptable solution might look like, I'm just going to do what seems right to me.
Right now what I'm thinking of doing is extending the BSP 29 format to support these higher limits. The extended format will consist of taking places in the BSP file format which currently use short (or unsigned short) for a data type (thus enforcing the 64k limit) and switching it to int.
The advantages I see of this approach are:
- The map file format stays the same. You can still use your favourite map editor without changes so this part of the content creation pipeline stays as it always was.
- Tools modifications are minimal. It's just a few data type changes. Any tool that uses big static arrays will need more in-depth changes to its memory management, of course.
- Engine modifications are minimal. The in-memory formats can be changed to support both original BSP 29 and the extended version. Only the loaders need more work, and that's quite lightweight.
It's worth noting that other options were explored, including using other BSP formats such as Half-Life, Quake II or Q3A. In all cases they involved some or all of: heavy engine work, switching tools (especially map editors), map format conversions, not actually solving the problem at all (HL and Q2 retain the 64k limits), or missing essential features (no animated lightstyles in Q3A).
So that's how that one currently looks. The only thing left to mention is that a different BSP version number will be needed, of course, but that's something that will be decided later on. It might be a good idea at this stage to put a proper binary ID in it (Q1 BSP never had one) and also a set of flags for optional content (such as RGB lightmaps directly in the BSP), but that latter is more of a "nice to have" feature than anything else. Priority is fixing the limits.
Before I launch into that I took some time out to play some more with my experimental software Quake work. This is just something I like to do on and off, not a serious project that will have a releasable end-result. I think everyone should play with software Quake every now and then; it's good to reset some of your assumptions, and quite refreshing to work in an environment where you've got a completely different set of capabilities (some better, some worse).
Today's thing was a 32-bit renderer for it. It's actually shockingly easy to put a 32-bit renderer on software Quake, just a matter of walking through a few code files and making some small changes. The downside is that you lose a lot of the asm, and half of your framerate as a result, but otherwise it's cool.
It's not really 32-bit, of course. All it does at this stage is a d_8to24table lookup, so everything is still sourced from 8-bit data. But it's a nice first step towards getting proper lightmap blending in, for example.
Posted by mhquake at 1:14 AM
Thursday, August 11, 2011
Wednesday, August 10, 2011
Hoping to get the DirectQ release out tomorrow; it would have been today but I needed to look up some stuff in other source ports for cross-checking compatibility of some cvar names. While doing that I managed to sneak in a few extra last-minute bonus features.
Worked over screenshots a little more. PCX screenshots are now fully run-length encoded so size is down a little. I also added gamma adjustment to all screenshots (via scr_screenshot_gammaboost - named this for compatibility with Darkplaces, although it's really an adjustment rather than a boost) so you can calibrate your screenshot brightness if they're coming out too dark.
Speaking of gamma, here's a nice unplanned surprise. I managed to get my hands on a copy of the source code to IDGamma (here if you're interested) so I now have IDGamma gamma adjustment built in to the video options menu. This is 100% compatible with the original utility, so you can now tweak your gamma inna IDGamma stylee in-engine. Of course, because this modifies the palette file you'll need to restart the engine for the change to take effect, but on the plus side it leaves the original palette in your PAK file alone (and doesn't need to generate any new PAK files).
IDGamma ONLY works with native Quake textures!!! If you're using external textures it won't work on them.
So, release time coming soon. This will be the final version of 1.8.8; I'll be moving on to 1.8.9 after that, which will include the video mode stuff I talked about (what seems like years ago) and possibly some other stuff.
For various reasons this will be going up on the CodePlex site a few hours before I announce it on here, so that's the place to watch. Till then.
Posted by mhquake at 10:17 PM
Tuesday, August 9, 2011
Getting back into the swing of things a little today; it's been over a month since I really sat down and did Quake code to any degree of seriousness, but fortunately I'm not too rusty.
To ease myself back I added PCX screenshots to DirectQ. This is a total non-feature as it's lower quality than TGA, BMP or PNG, has a bigger filesize than PNG, DDS and JPG, and takes longer to save than anything else (downsampling to 8bpp is slooooooooowwwwwwww), but it's still nice to have it as it was something that software Quake did.
It occurs to me that many folks may not be aware that DirectQ supports TGA, BMP, PNG, DDS, JPG and now PCX screenshot formats (via the - you guessed it - scr_screenshotformat cvar). I'm going to add that to the video options menu.
RMQ-wise I'm welcoming feedback from the community on the most appropriate and acceptable approach to resolving the BSP clipnodes limit overflow. I know the approach that I'd prefer to take, but opinions on the whole matter are always good to hear.
Posted by mhquake at 7:02 PM
Monday, August 8, 2011
Real-Life crap is still ongoing, but I'm going to be easing myself back in shortly so expect updates and releases to start coming more frequently soon.
First of these will be the full 1.8.8 release of DirectQ. This isn't really too much evolved from where I left it off, but does contain a handful of extra (important) fixes and features. That should happen any day now.
Next block of work will be some RMQ stuff. The cat is now out of the bag regarding what needs to be done here - yes, we have a map which completely blows apart the standard 64k clipnodes limit. And yes, this is totally legit and not on account of the usual suspects.
Obviously RMQ has now gone beyond the capabilities of the Quake BSP format (in reality it had done so a good while back, but some heroic efforts reined it in), so a solution of some description has to be found. It's worth noting here, by the way, that both HL BSP and Q2 BSP also have this same limit and so are not viable options. So what is to be done? Answers on a postcard, please.
Following that, or perhaps at the same time, I'll likely be doing some work on MHColour again. It's about a year or so since the last release of it, and I've had a few requests for some features that seem reasonable, so it's probably timely. This isn't something I'll make a definite guarantee on though; look on it as "I'll fit it in if I get the chance".
Doom 3 Source Release
With the recent announcement that the Doom 3 source code will be out later on this year, I guess one thing that people might like to know is whether or not I have any plans to do anything with it. Right now I can't say, it's far too early and I would definitely need to review the code post-release to determine if anything in it interests me.
One thing I did have an idea to do was to make a dumbed-down renderer for it; one that could run reasonably well on older, slower, not fully-featured hardware. OK, Doom 3 already has such a renderer, but my idea was to add some features such as baking the bumpmaps and specular maps into the diffuse textures, which would give quite reasonable visual quality at higher performance on older hardware. Possibly using D3D, possibly using OpenGL.
All of this is pure speculation of course. Like I said, I'll need to wait until I get my hands on the code and have a read over it before I can make any decision. The only thing that is certain is that if I did undertake a major Doom 3 project, something else would have to give way.
What this does highlight is an issue with more modern source releases. As rendering functionality moves from engine code to game content - in the form of shaders - it becomes more and more difficult to do meaningful source ports (at least so far as the renderer is concerned). Oh sure, I could do a D3D9 port with HLSL shaders, but it would be only useful for running the original game. If a mod (and I think this is even the case with RoE) used different shaders then any port I might make would be useless for that.
So the real value of the Doom 3 code for me would be a source to mine for ideas, and I'll most likely leave the more exciting work with it for other people. Things like soft shadows, better multiplayer, bug fixes and performance enhancements will most likely be early features, and I look forward to seeing what people come up with.
On the other hand older game source presents more opportunities, and updating old classics to run on modern operating systems is both worthwhile and fun. In other words I'd probably chew off my right arm for a copy of the original Tomb Raider code.
Posted by mhquake at 6:25 PM
Monday, August 1, 2011
Poking my head above the parapet to follow up on my previous post. As usual, Raymond Chen said it first and said it better, so here's a pretty essential piece of additional reading: http://blogs.msdn.com/b/oldnewthing/archive/2010/05/31/10017567.aspx.
The comments are also good for seeing how different perspectives work out (or don't work out, as the case may be). You'll need to fight through some blatant trolling though.
Posted by mhquake at 11:43 PM