I didn't want to knock the bugs clearing house off the top of the page for a while, hence the lack of recent updates.

The D3D11 port is now fully up to the old speeds and considerably exceeding them in heavier scenes. Functionally it's nearing completion; fog, coronas and player model colourmapping are left and I think that's more or less it.
I've already taken multiple passes over various parts of the code and cleaned things up a lot, tuned for better performance, quality and stability, and generally improved the whole thing. There are still some outstanding bugs and glitches with various parts of the core code which will need further work before I consider it "done", and then on to future work. I'll talk about that later on; today is for the renderer.
Brush surfaces now share a 100% common code path for absolutely everything (aside from some initial setup). I had made some strides towards that in previous versions and achieved it quite recently. This has resulted in an absolutely huge code reduction - my "d3d_brush.cpp" is down to 535 lines (from 956) with the first 368 of those being loading and object creation. This is a theme I've noticed throughout my use of D3D11 - loading and creating stuff takes a larger amount of code, but actually drawing is a substantial reduction.
Being able to drop the software T&L fallbacks helped a lot here, but the big win has been from treating everything as a texture chain. That doesn't actually need D3D11 but having it certainly helps as it's a much cleaner and tighter API for the actual drawing stuff.
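As a rough illustration of the texture chain idea (a minimal sketch, not DirectQ's actual code), visible surfaces can be linked onto a per-texture list and then drawn one chain at a time, so each texture is bound once per frame. The `Surface` and `Texture` types and the `ChainSurface` and `DrawTextureChains` functions here are hypothetical names made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified types; not DirectQ's real structures.
struct Surface {
    int textureId;      // index into the texture table
    int firstIndex;     // where this surface's indices start
    int numIndexes;     // how many indices it contributes
    Surface *chainNext; // next surface that uses the same texture
};

struct Texture {
    Surface *chain = nullptr; // head of this frame's chain
};

// Link a visible surface onto the chain for its texture.
inline void ChainSurface(std::vector<Texture> &textures, Surface &surf) {
    assert(surf.textureId >= 0 &&
           static_cast<std::size_t>(surf.textureId) < textures.size());
    Texture &tex = textures[surf.textureId];
    surf.chainNext = tex.chain;
    tex.chain = &surf;
}

// Walk every chain once: one texture bind per non-empty chain, then its surfaces.
// Returns the number of binds needed. Chains are cleared for the next frame.
inline int DrawTextureChains(std::vector<Texture> &textures,
                             std::vector<int> &drawnIndexCounts) {
    int binds = 0;
    for (Texture &tex : textures) {
        if (!tex.chain) continue;
        ++binds; // a real renderer would bind the texture here
        for (Surface *s = tex.chain; s; s = s->chainNext)
            drawnIndexCounts.push_back(s->numIndexes); // stand-in for a draw call
        tex.chain = nullptr;
    }
    return binds;
}
```

The point of the pattern is that world surfaces, brush models and anything else that ends up as "a surface with a texture" can all be fed through the same two functions.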
Lightmaps have gone through a few evolutions, with a very early stage (in the experimental engine) being entirely on the GPU, before I decided to push creation and modification back to the CPU for the real port. There are about 3 or 4 different ways to update a lightmap texture, and in the end I went for the one that all the hardware vendors advise is the slowest (UpdateSubresource) - simply because for the Quake lightmap use case it benchmarked as the fastest on all hardware.
I believe that this is because the UpdateSubresource call will manage resource contention automatically for you, attempting to do an asynchronous transfer wherever possible (one of the reasons why D3D doesn't have any equivalent of OpenGL's PBOs is because it doesn't need one), which is where the real bottleneck in Quake lightmaps is. This renderer may not even need GPU lightmaps.
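For readers who haven't used it, here is a minimal sketch of the CPU-side bookkeeping that would sit in front of an UpdateSubresource call. It assumes dirty-rectangle tracking on an RGBA8 lightmap page, which the post doesn't actually confirm; `LightmapPage`, `Box`, `MarkDirty`, `RowPitch` and `DirtySource` are hypothetical names. `Box` copies the field order of `D3D11_BOX` so the example builds without the Windows SDK.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same field order as D3D11_BOX (left, top, front, right, bottom, back).
struct Box { unsigned left, top, front, right, bottom, back; };

// Hypothetical CPU-side lightmap page: RGBA8 texels plus a dirty rectangle.
struct LightmapPage {
    unsigned width, height;
    std::vector<std::uint32_t> texels;
    Box dirty;
    bool isDirty = false;

    LightmapPage(unsigned w, unsigned h)
        : width(w), height(h), texels(w * h, 0), dirty{0, 0, 0, 0, 0, 1} {}

    // Grow the dirty rectangle to cover a surface's block of modified texels.
    void MarkDirty(unsigned x, unsigned y, unsigned w, unsigned h) {
        assert(x + w <= width && y + h <= height);
        if (!isDirty) {
            dirty = Box{x, y, 0, x + w, y + h, 1};
            isDirty = true;
            return;
        }
        if (x < dirty.left) dirty.left = x;
        if (y < dirty.top) dirty.top = y;
        if (x + w > dirty.right) dirty.right = x + w;
        if (y + h > dirty.bottom) dirty.bottom = y + h;
    }

    // Bytes between the starts of two consecutive rows in the CPU copy.
    unsigned RowPitch() const {
        return static_cast<unsigned>(width * sizeof(std::uint32_t));
    }

    // First texel of the dirty rectangle within the CPU copy.
    const std::uint32_t *DirtySource() const {
        return texels.data() + dirty.top * width + dirty.left;
    }
};

// With a real device context, and the Box copied into a D3D11_BOX, the upload
// would then look something like:
//   context->UpdateSubresource(texture, 0, &box,
//                              page.DirtySource(), page.RowPitch(), 0);
```

Because the source pointer already points at the top-left texel of the box and the row pitch is that of the full page, only the modified region is transferred.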
Earlier on I successfully multithreaded the lightmap update, but I've since removed it as it wasn't a performance win and had started piling up complexity. I'd still like to exploit multithreading some in the renderer - especially because D3D11 makes it so easy - but I haven't yet found any case where the tradeoffs are worth it.
Particles are hybrid CPU/GPU. All the info needed to draw and position a particle (set at its spawn time and never changed) is written into a dynamic vertex buffer at draw time, then velocity and gravity are applied on the GPU and the point is expanded to a quad in a geometry shader. This is something I've longed to do ever since the first implementation of DirectQ in D3D9, and you can see evolutions of the code working around it and trying to emulate it in older sources.
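The GPU side of that can be sketched on the CPU in plain C++. This is only an illustration of the maths, not the real shader code: it assumes a closed-form motion equation (spawn position plus velocity and gravity over elapsed time) and a camera-facing quad built from the view's right and up vectors. `ParticleVertex`, `ParticlePosition` and `ExpandToQuad` are hypothetical names.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };
inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline Vec3 operator*(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }

// Hypothetical per-particle vertex: everything is fixed at spawn time.
struct ParticleVertex {
    Vec3 origin;     // spawn position
    Vec3 velocity;   // initial velocity
    float gravity;   // downward acceleration (assumed constant)
    float spawnTime; // time the particle was spawned
};

// What a vertex shader could do: closed-form motion from spawn-time data.
inline Vec3 ParticlePosition(const ParticleVertex &p, float now) {
    float t = now - p.spawnTime;
    assert(t >= 0.0f);
    Vec3 pos = p.origin + p.velocity * t;
    pos.z -= 0.5f * p.gravity * t * t; // Quake's z axis is up, so gravity pulls -z
    return pos;
}

struct Quad { Vec3 corner[4]; };

// What a geometry shader could do: expand one point into a camera-facing quad,
// emitted in triangle-strip order.
inline Quad ExpandToQuad(Vec3 center, Vec3 right, Vec3 up, float halfSize) {
    Quad q;
    q.corner[0] = center + right * -halfSize + up * -halfSize;
    q.corner[1] = center + right * -halfSize + up * halfSize;
    q.corner[2] = center + right * halfSize + up * -halfSize;
    q.corner[3] = center + right * halfSize + up * halfSize;
    return q;
}
```

Since nothing in `ParticleVertex` changes after spawn, the CPU only has to copy it into the vertex buffer each frame and pass the current time as a shader constant.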
Sprites and the 2D GUI stuff also use geometry shaders, as will coronas.
The underwater warp was most recently done and is quite cool. D3D11 seems to be more fillrate-efficient for render to texture than 9 was, and having none of the BeginScene/EndScene crap (as well as completely generalizing the texture/render target interfaces) makes things a lot cleaner. Otherwise it should look and work much the same way as the old one.
The next steps involve continuing to tune, tweak, and bring on missing functionality.
Aside from that, and following my recent hardware failure, I'm now on AMD graphics for a short while. That's a barrel of laughs - I know that they have a reputation as the "enthusiast's choice" but if being an "enthusiast" means having to constantly struggle with drivers, settings, reinstallations, random weirdnesses, downloading dodgy overclocked drivers from Joe Random on the internet, etc, then I'd rather opt out, ta very much. I think I've spent more time at this stuff than at actually being productive in recent days, and I'm looking forward to getting back to NVIDIA as soon as possible.
Tuesday, February 28, 2012
IN YER FACE!
Posted by mhquake at 8:51 PM
5 comments:
"There are about 3 or 4 different ways to update a lightmap texture, and in the end I went for the one that all the hardware vendors advise is the slowest (UpdateSubresource) - simply because for the Quake lightmap use case it benchmarked as the fastest on all hardware."
Late last year you mentioned that you felt GPU lightmap updates offered a good trade-off by decreasing best-case performance and increasing worst-case performance. From my perspective, that seems like a Very Good Thing, as it should reduce the occurrence of lightmap updates taking an inordinately long time in dynamics-heavy scenes — from what I gathered pretty significantly in some cases. That should improve on the 'smoothness' of the rendering by limiting the number of frames which take much longer to complete than previous and following frames, correct?
My question, then, is: Is the best-case performance not high enough to allow for the use of GPU lightmap updates?
There are advantages and disadvantages to all approaches. GPU lightmaps are great for animating styles, but crap for true dynamic lights (rocket explosions/etc). RMQ fakes it by setting up some vertex lighting in software, but that suffers from the traditional flaws of vertex lighting. My previous D3D11 experiments did them for real but limited the number of dynamics to 32 (because that's all that will fit in an int that needs to be passed from vertex shader to pixel shader). The shader overhead really became expensive as the number of dynamics started piling up too - it gets much worse than the CPU case. A lot of people use DirectQ for multiplayer and that kind of pile-up is unacceptable in a busy firefight.
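To make the 32-light limit concrete, here is a minimal sketch of the kind of bitmask involved, assuming one bit per dynamic light in a 32-bit integer. It isn't taken from the experimental engine; `kMaxDynamicLights`, `AddLight`, `IsLit` and `ForEachLight` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// One bit per dynamic light, so a 32-bit int can name at most 32 of them.
constexpr int kMaxDynamicLights = 32;

// Mark light number lightIndex as affecting a surface.
inline std::uint32_t AddLight(std::uint32_t mask, int lightIndex) {
    assert(lightIndex >= 0 && lightIndex < kMaxDynamicLights);
    return mask | (std::uint32_t{1} << lightIndex);
}

// Does light number lightIndex affect this surface?
inline bool IsLit(std::uint32_t mask, int lightIndex) {
    assert(lightIndex >= 0 && lightIndex < kMaxDynamicLights);
    return (mask & (std::uint32_t{1} << lightIndex)) != 0;
}

// Visit every light whose bit is set, much as a pixel shader loop would.
// Returns how many lights were visited.
template <typename Fn>
inline int ForEachLight(std::uint32_t mask, Fn &&fn) {
    int count = 0;
    for (int i = 0; i < kMaxDynamicLights; ++i) {
        if (mask & (std::uint32_t{1} << i)) {
            fn(i);
            ++count;
        }
    }
    return count;
}
```

The loop also hints at the cost problem: the more bits that are set, the more per-pixel work there is to do.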
It's also possible to do something with attenuation maps but then you need to switch to 64-bit render targets or you lose dynamic range. A 64-bit render target will be slower than 32-bit, and the fillrate overhead of using render-to-texture for dynamic lighting would be too much to justify it (if you were already doing this for e.g. HDR stuff you could get away with it).
The way I see it now is that RMQ is a special-case content set that justifies the tradeoff, but for general-purpose use it's not justified. If it wasn't for dynamics I'd do it tomorrow (hell, I'd do it right now!); with dynamics you need to pick which side of all the tradeoffs involved you want to be on.
It's worth noting that the UpdateSubresource method I currently use resolves one of the major bottlenecks in traditional lightmap updating - resource contention and CPU/GPU syncing - so on balance it comes out roughly even - it can go up to twice as fast as the comparable D3D9/OpenGL code in many cases, removes the need to go one frame out of sync with lightmap updates, and is generally a lot cleaner.
There's probably a happy middle-ground where the GPU is used for animating styles but the CPU + texture upload is used for dynamics. That may be worth exploring at a later date. It's not an option for RMQ (I'm already pushing the hardware requirements as is) but would be for this engine.
"I'm now on AMD graphics for a short while. That's a barrel of laughs - I know that they have a reputation as the "enthusiast's choice" but if being an "enthusiast" means having to constantly struggle with drivers, settings, reinstallations, random weirdnesses, downloading dodgy overclocked drivers from Joe Random on the internet, etc, then I'd rather opt out, ta very much."
Surely it can't be that bad, can it? You just uninstall old nVidia Drivers, clean up any remnants with a good driver cleaner, then install AMD Card and drivers. Then you just do yaw thang; no need to overclock this and overclock that or futz around with dodgy drivers.
If you're having trouble then the likely story is you've got a silly factory overclocked card from somewhere. I know how it feels. I once had a factory overclocked Palit nVidia card in my comp. and it gave me all kinds of grief. The thing constantly made my machine freeze and crash. The card ended up cooking itself so I tossed it and replaced it with a bog-standard reference design NON OVERCLOCKED Galaxy nVidia Card. My comp's been fine ever since.
I've used Nvidia exclusively up until recently when I got a 6870. This card has its pros; I think it still has the best performance to power consumption. But Nvidia is just so, so much more solid on the driver side. AMD is trying, though, I know they are. My next card will likely be Nvidia. Mmmmm, like a GTX 790 *drools*.
AMD are nice enough, and I've had some good experiences with one in a PC in work, but there are enough rough edges in the AMD ownership experience to make it grate.
I'm coming from a background of having been an NVIDIA user since the TNT2 days so part of it is a change to something totally different and unfamiliar, I'll admit that.
The comment about using a driver cleaner is significant here I think. That's something that you never have to do with NVIDIA. The process was seamless and totally pain-free. AMD ownership right now just seems to involve a bit too much digging around in the murkier recesses of one's PC for comfort, especially in 2012 when this kind of thing should have been put to death a long time ago.
They are definitely improving though - I'm actually getting good OpenGL from it, which is something I would have despaired of not too long ago - and I wish them every success going forward. It just remains to be seen whether or not I'll be onboard in the immediate future.