Been working over the interpolation code some, and a few important insights are coming out. This one could get bumpy. :)
The first one is an old one, and it's that movement interpolation should only be applied to MOVETYPE_STEP entities. This is correctly reflected in the FitzQuake and DarkPlaces source code (and possibly a few other engines), and on its own it automatically resolves a lot of the weirdness and special cases that otherwise needed workarounds in the original interpolation tutorial code. Some other odd behaviours still exist, which is where the rest of this stuff comes in.
So from here we need to start looking at the nature of movement interpolation itself. This next one is actually hinted at in the Quake source code but made explicit by the Quake II code, and it's that movement interpolation should be synchronized with frame interpolation.
The reasoning behind this is that MOVETYPE_STEP entities move by - you guessed it - stepping. So the movement needs to begin at the same time as the step animation begins, and likewise end at the same time as the step animation ends. Quake II movement interpolation works that way, and in Quake the "don't mess up the step animation" comment on setting the U_NOLERP flag suggests that the same also applies here.
A third one is that - as should be obvious - step animations only occur in the horizontal plane, and so movement interpolation should also be done only in that plane. This is more of a deduction on my part than being based on any solid evidence from either code-base, but the end result bears it out.
That end result is that absolutely everything to do with movement interpolation now appears to be cleanly resolved. Monsters step smoothly, non-monsters move smoothly, Scrags no longer fall jerkily from the sky, and Hell Knights in e4m6 now gracefully ascend on platforms (without even the little sink-in that standard movement interpolation gives).
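Taken together, the three rules could be sketched roughly like this (the struct layout, constants and function names are illustrative assumptions of mine, not taken from any of the engines mentioned):

```c
#include <string.h>

#define MOVETYPE_STEP 4   /* illustrative value */

typedef struct {
    int   movetype;
    float oldorigin[3];   /* position when the current step began */
    float neworigin[3];   /* position when the current step ends  */
} entity_t;

/* blend is the same 0..1 fraction used to interpolate the step
 * animation, so movement begins and ends exactly with the animation */
void LerpEntityOrigin (const entity_t *e, float blend, float out[3])
{
    /* rule 1: only MOVETYPE_STEP entities get movement interpolation */
    if (e->movetype != MOVETYPE_STEP)
    {
        memcpy (out, e->neworigin, sizeof (float) * 3);
        return;
    }

    /* rule 2 is carried by blend coming from the animation timing;
     * rule 3: interpolate x and y only, letting z snap so that
     * vertical motion is not smeared */
    out[0] = e->oldorigin[0] + blend * (e->neworigin[0] - e->oldorigin[0]);
    out[1] = e->oldorigin[1] + blend * (e->neworigin[1] - e->oldorigin[1]);
    out[2] = e->neworigin[2];
}
```

The key point is that blend comes from the animation timing rather than from a separate movement timer.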
With hindsight the whole interpolation drama was a victim of initial over-enthusiasm in the early days of Quake engine coding - a feeling that "interpolation is so awesome that it should be applied to everything!" - even things where it's either not appropriate or causes problems.
There's still a bit more to be worked out, but it's all just icing on the cake from here. One in particular is to switch rotation interpolation from Euler angles to Quaternions (the rotation transform itself is already done using Quaternions so this is just a matter of moving the conversion to earlier in the process) and slerp them rather than using linear interpolation so as to get smoother rotation movement.
A nice result.
Monday, May 21, 2012
Posted by mhquake at 9:11 PM
Friday, May 11, 2012
As well as the awesome stuff that does work, I thought it might be interesting to talk a little about areas where I've screwed up. Besides giving the "tried that, it didn't work" perspective on some things that might become feature requests, and offering an explanation for why some things are the way they are, it's also fun to look at some of the things I've tried that were - in retrospect - plain stupid, trying too hard to be clever or cute, or quite obviously heading for a trainwreck.
It also illustrates some cases where the current behaviour might seem bad, but the alternative is worse. You'll see....
One past example of this was my early attempts at trying to detect if a player could see through water and allowing r_wateralpha to be set depending on the result. Edge cases and special cases just kept on mounting up, the whole thing was becoming far too complex, and despite everything it was still fragile and throwing up unwanted failure cases. There was some wisdom in knowing when to just walk away from something that wasn't working, and it took me a long time to realise it.
Here's another one (the one I promised you'd see) that just came up.
Interpolating Particle Ramp Colours
These are used for some particle effects - rocket trails and explosions - and cause the particle colour to shift over time. In vanilla Quake it's an abrupt shift from one colour to another after a certain amount of time (half a second or so) has passed.
So in a fit of "wouldn't it be cool if..." I decided to try interpolating between the colours. On paper this looks awesome - it's easy to set up, can be quickly done on the GPU, has no measurable performance overhead, and seems like a subtle but worthwhile visual quality improvement.
In reality however the common case is rocket trails, and rocket trails contain a sudden shift from orange to grey. When you interpolate those the end result is that these particles spend a small but significant portion of their lifetimes as a rather nasty shade of green.
There's probably a workable solution in there - something like only interpolating between two colours if they lie on the same row of the Quake palette, otherwise do a sudden shift - but with discretion being the better part of valour, this feature was grabbed by the ears and unceremoniously ripped out.
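For what it's worth, that same-row idea could be sketched like this (never implemented; the 16-colours-per-row layout matches the vanilla Quake palette, but all names and the switch-over point are illustrative):

```c
typedef struct { unsigned char r, g, b; } rgb_t;

/* Quake's 256-colour palette is arranged in rows of 16 related shades,
 * so two indices in the same row blend without unrelated hues */
static int SamePaletteRow (int c1, int c2)
{
    return (c1 >> 4) == (c2 >> 4);
}

/* palette: 256-entry table; c1/c2: ramp colour indices; t: 0..1 */
rgb_t RampColour (const rgb_t *palette, int c1, int c2, float t)
{
    if (!SamePaletteRow (c1, c2))
        return palette[t < 0.5f ? c1 : c2];   /* abrupt shift, as vanilla */

    /* same row: safe to interpolate */
    rgb_t a = palette[c1], b = palette[c2], out;
    out.r = (unsigned char) (a.r + t * (b.r - a.r));
    out.g = (unsigned char) (a.g + t * (b.g - a.g));
    out.b = (unsigned char) (a.b + t * (b.b - a.b));
    return out;
}
```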
For now, a clear case of accepting second best because a technically better and quite elegant alternative turned out to have completely unwanted consequences. It's not the first time and I'm sure it won't be the last.
Posted by mhquake at 12:59 AM
Wednesday, May 9, 2012
MDL instancing was done and removed. It gave a decent enough balance of being faster when there were lots of MDLs on screen but slower when there were fewer, but one of the major problems was that the only case where it benchmarked appreciably faster was a stupidly excessive one - my 400 Knights Death Map. Draw calls in D3D11 have low enough overhead that you need to get to around this kind of level before instancing becomes worthwhile.
If that was all I would probably have kept it, but the one genuine gameplay case I was hoping to improve - the overuse of the Quoth longtorch model in ne_tower - actually ran a good deal slower.
So when it came down to the final decision, a real gameplay case outweighs a synthetic benchmark any day of the week.
I'm retaining a copy of the code anyway as I may work over it again sometime with the intention of switching between instanced and non-instanced cases as required.
I've also heavily reworked the filesystem code, and cleaned up a lot of evil mess. The full memory-mapping for PAK files is there, as are some improvements in PK3 handling. Load speeds are up a little; not too much but enough. Memory-mapping also makes file reading become a LOT simpler and cleaner - it's now just a straight memcpy to the destination rather than having to allocate a buffer, read it in, copy over, clear the buffer.
I can work over that a bit more to improve it further, but the current cut of it is quite nice.
Speaking of PK3 files, I'm now quite convinced that the crashes I had reported when loading QRP textures from a PK3 file are nothing to do with my texture loader and everything to do with my old PK3 loader. When working over this part of the code, I discovered a few places that were - shall we say - somewhat less than robust.
The new loader is much better. Promise.
Posted by mhquake at 1:45 AM
Monday, May 7, 2012
Keep the comments coming in; they're providing a great source of information on what does or doesn't work. I don't always get the chance to reply directly to each, but all feedback and bug reports are definitely welcome and everything will be considered.
I've been playing a little with D3D11 instancing lately, and thinking over how it compares with both D3D9 and OpenGL. My general opinion is that D3D's separation of vertex layout from buffer and offset specification (which is present in both 9 and 11) is superior to OpenGL, but instancing is one case where the OpenGL method is cleaner and clearer.
My current thinking is that I'm going to use an instanced path for all MDLs and Sprites. I already have instancing written and working for sprites, but just drawing one sprite per-instance, so the batching element remains to be done.
Of course instancing has its own overhead, but this is a balancing act. You can switch between instanced and non-instanced drawing, but then you get the overhead of that switching to handle as well. You also need to consider where to set the cutoff point beyond which you switch to instancing. Keeping everything on the instanced path and hoping that the runtime and drivers handle it sensibly seems a better approach overall.
This may change of course depending on how benchmarks work out. :)
Particles are something else that may benefit from instancing, but I'm leaning more towards keeping my current geometry shader approach. There is a certain amount of extra calculation that needs to be done per-particle, and not using a GS means that it would need to be done 4 times (once per-vertex) rather than just once. It's worth noting that I use instancing for particles in my GL3.3 Quake II engine, but that still performs all of the gravity/velocity calculations on the CPU (DirectQ 2.0.0 does them on the GPU, which is where most of the extra calculation I mentioned comes from). DirectQ 1.9.x and earlier also used a form of instancing for particles, but it was a bit hacky (mostly to still support SM2 hardware).
I'm removing geometry shaders from the 2D GUI stuff and from Sprites (the latter already done). It's unfortunate that I'd let myself get so far down this route, particularly with the 2D stuff, as there is quite a job of work now involved in their removal, but it's the right decision. The initial use of geometry shaders for these was one of those "hey, I wonder if you can do this..." moments rather than being based on sound technical reasoning.
The filesystem code is likely to make a transition to using memory-mapped files, which should result in much improved loading times. I currently have a naive implementation of this written and working, which just creates a memory mapping, maps the file, allocates a buffer, copies the file into the buffer, then takes everything down. It's neither faster nor slower than classic file I/O, so that's a good indication that improvements will come when I do it right. In particular I intend to leave PAK file handles open and memory mappings active throughout the lifetime of the engine (no, this doesn't use any additional memory; that only happens when you map a view), so a lot of the extra setup and tear-down work involved in loading a file will go away.
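Once the mappings stay open, loading a lump really is just a memcpy out of the mapping. A rough sketch of the idea, using POSIX mmap for brevity (a Windows build would use the CreateFileMapping/MapViewOfFile equivalents; all names here are illustrative):

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct {
    int         fd;
    size_t      size;
    const void *base;   /* kept mapped for the lifetime of the engine */
} pakfile_t;

/* open the PAK once and keep the mapping active; returns 1 on success */
int Pak_Open (pakfile_t *pak, const char *path)
{
    struct stat st;

    pak->fd = open (path, O_RDONLY);
    if (pak->fd < 0 || fstat (pak->fd, &st) < 0)
        return 0;

    pak->size = (size_t) st.st_size;
    pak->base = mmap (NULL, pak->size, PROT_READ, MAP_PRIVATE, pak->fd, 0);
    return pak->base != MAP_FAILED;
}

/* reading a file from the PAK is now just a straight memcpy */
void Pak_Read (const pakfile_t *pak, size_t offset, size_t len, void *dest)
{
    memcpy (dest, (const char *) pak->base + offset, len);
}

/* tear-down happens once, at engine shutdown */
void Pak_Close (pakfile_t *pak)
{
    munmap ((void *) pak->base, pak->size);
    close (pak->fd);
}
```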
These advantages will only apply to PAK files - PK3s are a little more complex owing to the need to go through unzipping code, and it's unrealistic to leave file handles open for raw filesystem files.
Some framerate dependencies in a few areas have been identified and removed. Generally these occurred in cases where a delta frametime was accumulated rather than using the absolute difference between the current time and a start time. One slightly embarrassing consequence was that bonus and damage flashes lasted a LOT less time than they should have when you were running fast (and a lot longer if running slow). All fixed now.
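The pattern behind the fix, as a minimal sketch (the names and the half-second duration are illustrative):

```c
#define FLASH_TIME 0.5f   /* flash duration in seconds */

/* framerate-independent: everything derives from an absolute start
 * time, so the result is identical at 10fps or 1000fps */
float FlashAlpha (float currentTime, float startTime)
{
    float remaining = FLASH_TIME - (currentTime - startTime);
    return remaining > 0.0f ? remaining / FLASH_TIME : 0.0f;
}
```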
I'm considering removing the host_maxfps cvar. This is just a thought that occurred to me rather than something I'm set on doing, but a lot of things would get simpler if it went away. What would happen instead is that the engine would by default run flat-out, and various other methods of throttling framerate (if you don't want to run your GPU too fast, for example, or if you were running on battery) would be available.
These would include enabling vsync or using the new sys_sleep cvar to specify an amount of time to sleep for each frame. I've tested both of these on a netbook and they work great, with the engine still feeling incredibly smooth and responsive (and generating next to no extra heat!)
(Aside: one of the really nice things about D3D11 is that vsync is just a parameter to the Present call, so it doesn't require a mode change to enable/disable, and can be easily and automatically switched off when you're running a timedemo.)
By contrast, host_maxfps 72 will just spin in a tight loop, so it does little to save on either CPU usage or battery, and has nowhere near the same smoothness. Even vsync enabled at a 60Hz refresh rate feels just as smooth as running at 1000fps.
Like I said, I'm still not totally set on this approach, so I'm open to comments and suggestions here. Any arguments in favour of host_maxfps versus vsync and/or sys_sleep?
Posted by mhquake at 1:47 PM
Thursday, May 3, 2012
The GL3.3 Quake II engine is now running faster than DirectQ. It timedemos a few percent slower for sure, but when you factor in Quake II's higher polycount and overall greater scene detail it effectively comes out a bit faster. What's really odd about this is that my previous experience has shown D3D code to be consistently 10% to 25% faster than the equivalent GL code, and what's even more odd is that this was benchmarked on AMD graphics.
It's obvious that something unusual is happening here, and theories abound. Either the AMD driver really loves this style of GL code, or something funny is happening in the DirectQ code. What might support the latter is that the current DirectQ code uses a few rendering techniques - such as geometry shaders for particles and the HUD/menu/console stuff - that may be suboptimal for the use cases I have.
Another possible explanation is that other parts of the Quake II engine are just more efficient than Quake (not having to interpret QC bytecode is one immediately obvious example), and this efficiency is starting to come through now that the old GL 1.1/1.2 bottlenecks have been removed.
Either way it's an interesting result.
Ambient occlusion - I've had a request to implement it in DirectQ. It's a nice idea but I don't think it's going to happen, for various reasons. Ambient occlusion is not just something that you switch on or off with a few lines of code - it needs heavy work on supporting infrastructure and framework, and requires a fairly generalized render-to-texture/multiple-rendertarget setup to exist in the engine first.
This ties in a little with ongoing discussion in the comments about DirectQ's performance versus DarkPlaces, and a wish for something with DirectQ performance but DarkPlaces looks.
What's important to realize here is that one of the main reasons why DarkPlaces can run slow is precisely because it supports all this extra eyecandy. Implementing it in DirectQ would very likely just result in a DirectQ that runs slow too. You can't have extra effects without performance impact as a tradeoff.
XP support - my ability to consistently support Windows XP is now limited to testing on a VM. That kinda sucks but it's also useful as performance testing on a VM can bring out factors that don't show up otherwise - in the past I found that it identified a bottleneck in drawing console characters that I was completely unaware of on a more mainstream platform.
Of course, with the move to D3D11 XP support will be gone, but I still want to retain something in the 1.9.x series - I won't be able to consistently support it, but at least it will be there.
A comment was made about "a version of DirectQ that doesn't crash on XP", and I need to address this. No released version of DirectQ crashes on XP. The cause of any crashes or other issues people may experience has nothing to do with XP and everything to do with other factors: misbehaving external content, bad config settings, weird hardware, conflicts with other software on the system, or whatever. It's important to state that every released version has been tested on XP and would not have been released if it didn't work.
I'm a little bit pissed about this kind of comment so I'll say nothing more on that.
Releases - I've confirmed one bug fix for 1.9.1 but another late-breaking bug has cropped up. I need to see if I can reproduce it before I can proceed with that. I was going to do a source-code-only public pre-beta release for DirectQ 2.0.0 and the GL3.3 Quake II engine recently, but I'm now holding off on that for another while as work continues. I'm still keen on the idea of source-code-only releases though, so it's something I'd hope to do sooner rather than later.
Posted by mhquake at 7:26 PM
Tuesday, May 1, 2012
I've resolved many of the problems that were reported with 1.9.0, and - despite being unable to reproduce the problems myself - have identified probable causes for some others, so it's looking as though the patch release will be able to come soon.
Posted by mhquake at 12:45 AM