Wednesday, June 6, 2012

Memory Allocation

Worked over some of the memory allocation relating to short-lived objects at runtime.  Previously these mostly lived in their own pre-allocated memory buffers; now I just have a single big buffer that these objects pull memory from as required, and that resets back to the 0 mark at the end of each frame.  It starts at 1 MB but can grow to 512 MB if needed, and it also doubles up as a temp staging buffer for load-time stuff.
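For anyone who wants to picture the general idea, here's a minimal sketch of a per-frame scratch buffer in C.  It's not the engine's actual code: the names (frame_arena and friends) are made up for illustration, the buffer is a fixed size, and the grow-on-demand part is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-frame scratch allocator; all names are illustrative only. */
typedef struct {
    unsigned char *base;  /* start of the backing buffer */
    size_t capacity;      /* total bytes in the buffer */
    size_t mark;          /* offset of the next free byte */
} frame_arena;

/* Allocates the backing buffer.  Returns 1 on success, 0 on failure. */
static int frame_arena_init(frame_arena *a, size_t capacity)
{
    a->base = malloc(capacity);
    a->capacity = a->base ? capacity : 0;
    a->mark = 0;
    return a->base != NULL;
}

/* Hands out `size` bytes at an offset aligned to `align` (a power of two),
   or NULL if the buffer is full.  Alignment is relative to the buffer start,
   which malloc already aligns for any fundamental type. */
static void *frame_arena_alloc(frame_arena *a, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->mark + (align - 1)) & ~(align - 1);
    if (start > a->capacity || size > a->capacity - start)
        return NULL;  /* a growable version would extend the buffer here */
    a->mark = start + size;
    return a->base + start;
}

/* End of frame: everything handed out this frame is released in one step. */
static void frame_arena_reset(frame_arena *a)
{
    a->mark = 0;
}

/* Releases the backing buffer itself (shutdown, not per-frame). */
static void frame_arena_free(frame_arena *a)
{
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->mark = 0;
}
```

The point of the design is that nothing is ever freed individually; an allocation is just a pointer bump, and the end-of-frame reset is a single assignment.  Anything that needs to survive past the frame has to live somewhere else.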

This gives a much cleaner, more robust and slightly faster end result.  Framerates are up a percent or so on account of this, and objects just come in and out of the scene perfectly naturally and normally.  A lot of slightly hairy cases where I knew it should have been OK but still had a bad feeling about it have also gone away.

That's a fairly cool achievement that has ramifications beyond its immediate effect.  A key thing here I want to talk about is that percent-or-so performance improvement.

If I can get a 10% performance improvement with nasty, messy and unmaintainable code - I probably won't bother.

If I can get a 1% performance improvement with something clean and neat - I very probably will.

There's a school of thought that says that you shouldn't sweat over the 1% improvements; focus on the big stuff, and so on.  That's very valid in many ways - we are firmly in sub-millisecond territory here.

On the other hand, sub-millisecond can be incredibly important.  Depending on your hardware and the load that the map you're running puts on it, that sub-millisecond improvement can be critical.

Let's look at a hypothetical but not too far-fetched scenario.  You're running at a 60 Hz refresh rate with vsync enabled.  For argument's sake, let's say that each frame needs 16.6 milliseconds to draw; all is good and you'll get your 60fps.

Now let's say that something happens and as a result each frame is now taking 16.8 milliseconds.  That's a 0.2 millisecond difference, but it pushes you past the roughly 16.67 milliseconds available per refresh at 60 Hz, so you miss a vsync interval and have to wait for the next one.  Suddenly you're no longer running at 60fps, you're running at 30.
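To put numbers on that, here's a small sketch of the arithmetic in C.  It assumes a simple model where every frame takes the same time and each finished frame waits for the next vsync before it can be shown; vsync_fps is a made-up helper name, not anything from the engine.

```c
#include <assert.h>

/* Effective frames per second under a simple vsync model (an assumption,
   not a measurement): a frame that takes frame_ms to draw occupies a whole
   number of refresh intervals, rounded up. */
static double vsync_fps(double frame_ms, double refresh_hz)
{
    assert(refresh_hz > 0.0);
    double interval_ms = 1000.0 / refresh_hz;         /* ~16.67 ms at 60 Hz */
    long intervals = (long)(frame_ms / interval_ms);  /* whole intervals used */
    if ((double)intervals * interval_ms < frame_ms)
        intervals++;                                  /* round up to next vsync */
    if (intervals < 1)
        intervals = 1;                                /* can't beat the refresh rate */
    return refresh_hz / (double)intervals;
}
```

With a 60 Hz refresh, a 16.6 millisecond frame fits inside one interval and gives 60, while a 16.8 millisecond frame spills into a second interval and gives 30.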

That's a sub-millisecond difference that's caused a loss of half of your framerate, and when viewed in these terms sub-millisecond starts looking very important.  Obviously the rules have to be interpreted in context, and where context says that what otherwise seems like a no-brainer is in fact arrant nonsense, then you need to ditch your preconceptions and view things differently.

All the same, that would in no way justify the scenario of 10% from nasty code I spun earlier on.  That's all context and tradeoffs again, and in that case it's more likely that, if creating a mess gets you a performance improvement, you have some fundamental design flaw which needs to be corrected first.

All interesting stuff.

2 comments:

David A. said...

Are you essentially using a garbage collector now? Does Quake use one already? Not too familiar with the guts of this engine.

mhquake said...

It is effectively a (simplified) garbage collector, yes. Quake already uses something slightly similar, but it's per-map whereas I'm now doing it per-frame (with the appropriate modifications to enable it to work fast, of course).