The roadmap said one should do groundwork, glow/bloom, and a smoothed minimap. All this was done, and in addition, the weather system insulted my pet octopus, so we now have updated rain, snow, and grass (and a full wind simulation).
Let's start with the rain. Just as for any lvl 30 mage worth their bath salt, my mere presence triggered interesting gravitational effects:
The previous rain was done with five cylinders, encircling our player like a Russian matryoshka doll. Each cylinder was set to display both sides, and to use a scrolling rain texture.
While it worked, this method came with two downsides: it was repetitive (the streaks always fell in the same spots), and it was a fillrate killer (ticket 892, among others).
Drawing every pixel on the screen five times, with alpha blending on (= 2x the cost, since each pixel needs a read + write instead of a mere write), really hurt the weaker, bandwidth-starved cards. Integrated ones fared especially badly, as system memory tends to be slower than dedicated VRAM.
So, for a 1024x768 window, 10 fullscreen passes equal 30 MiB of traffic per frame (1024 x 768 pixels x 4 bytes x 10). Taking an integrated system with single-channel DDR2-400 RAM as an example, the system RAM can move 3.2 GB/s (2.98 GiB/s). If everything were optimally placed (we wish), it would take 9.8 ms just to draw the rain for one frame: 59% of the 16.6 ms budget of a 60 fps frame, and that is with optimal memory placement (add 10-20% of overhead for a more realistic number). This also does not count the rain texture reads.
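Those numbers can be double-checked with a few lines of arithmetic (a back-of-the-envelope sketch; the 4-byte RGBA framebuffer and single-channel memory are assumptions):

```python
# Back-of-the-envelope check of the numbers above. Assumes a 4-byte
# RGBA colour buffer and single-channel DDR2-400 memory.
width, height = 1024, 768
bytes_per_pixel = 4
passes = 5                 # five double-sided cylinders
ops_per_pass = 2           # alpha blending = read + write

traffic = width * height * bytes_per_pixel * passes * ops_per_pass
print(traffic / 2**20)     # 30.0 MiB per frame

bandwidth = 3.2e9          # DDR2-400, bytes per second
frame_time_ms = traffic / bandwidth * 1000
print(round(frame_time_ms, 1))             # 9.8 ms
print(round(frame_time_ms / 16.6 * 100))   # 59 (% of a 60 fps frame)
```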
Ah, we have a question from the audience: "But it's mostly see-through transparent pixels. Surely those are lighter than the colorful pixels in the rain streaks?" - no, they aren't. Even if a pixel is caught by the alpha test, that only saves the final write; the texture read still happens, and the fragment pipeline still runs. So lighter, but not by that much.
Before - after:
So, what was done? Instead of five scrolling screens, we now model every raindrop, which raised the average fps from 30 to 45 on a test setup. Yes, you read that right: every drop. There are about 2500 of them. The simulation runs fully on the GPU, and where each drop falls varies as time passes.
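To illustrate the idea, here is a hypothetical sketch in Python (the real thing runs in a vertex shader, and every name and constant here is made up): each drop's position is a pure function of its index and the current time, so no per-drop state needs to be stored or updated anywhere.

```python
# Hypothetical sketch: position as a pure function of drop index and
# time, the way a vertex shader would evaluate it each frame.
import math

VOLUME = 20.0        # side of the rain box around the camera (assumed)
FALL_SPEED = 8.0     # units per second (assumed)

def hash01(n):
    """Cheap deterministic hash to [0, 1), a common shader idiom."""
    return math.modf(math.sin(n * 12.9898) * 43758.5453)[0] % 1.0

def drop_position(index, time):
    x = hash01(index) * VOLUME
    z = hash01(index + 0.5) * VOLUME
    # Fall continuously, wrapping back to the top of the volume.
    y = (hash01(index + 0.25) * VOLUME - FALL_SPEED * time) % VOLUME
    return (x, y, z)
```

Since the hash is deterministic, every frame agrees on where drop N is, without any CPU-to-GPU updates.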
By drawing each drop as a point sprite, we send only one vertex per drop, and overdraw is minimal compared to the previous solution. Shrinking the rain texture from 512x512 to 32x32 (from covering a wall of rain to a single drop) also did its part; the texture now fits entirely into the cache.
The average overdraw should now be close to one. Instead of the previous 10 fullscreen memory moves, we now move about 2.
The observant reader will note that point sprites are squares, not rectangles, so half of the space is still wasted on fully transparent area. This is true: a 16x32 texture on a rectangular billboard would be more efficient fragment-wise. However, it would carry four times the vertex load, since each corner would have to be sent instead of just the center point.
Moving to the next topic: snow. Snow was already handled as CPU-side particles, but the flakes fell straight down and always faced the player, giving them an artificial look.
So what was done with these? Nothing more than fixing those two shortcomings. Each flake is now affected by the wind (more on that in the grass section), and spins around randomly.
Making the flakes spin was a little more involved. Irrlicht's particle system only does billboards, that is, quads that always face the player. They don't rotate, period.
This left yours truly stumped for a few hours, pondering how to work around the limitation. The solution came in the form of texture magic, brought to you by the symmetric nature of the snowflake: flipping the texture in place by varying amounts gives the appearance of actually spinning snowflakes.
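One way to express the trick (a Python sketch of the texture-coordinate math; the function name is hypothetical): rotate the flake's texture coordinates around the texture centre. Thanks to the flake's symmetry, stepping the angle over time reads as a spinning flake even though the billboard itself never rotates.

```python
# Rotate a texture coordinate around the texture centre (0.5, 0.5).
# Stepping `angle` per frame makes the (symmetric) flake appear to spin.
import math

def rotate_uv(u, v, angle):
    cu, cv = u - 0.5, v - 0.5
    c, s = math.cos(angle), math.sin(angle)
    return (0.5 + c * cu - s * cv, 0.5 + s * cu + c * cv)
```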
The improvement is profound. You'll need to see it in action; still screenshots don't fully do it justice.
The grass, which was made to wave in the as-of-yet-unreleased level Lighthouse, had some limitations. First, there was no wind simulation; it was a sine wave, very repetitive to the eye. Second, the wind was applied on one axis only, adding to the repetitiveness.
As the first step towards summoning Jupiter, a proper wind simulation was implemented, via the scroll of 2D Simplex noise. It blows in all directions, at all strengths, bringing a.. uhm.. wind-like quality to the uhm.. wind.
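For illustration, here is a hedged sketch of the idea in Python, using simple value noise as a stand-in for Simplex noise (the hash and every constant are made up). The point survives the substitution: the wind drifts smoothly and never repeats like a sine wave does.

```python
# Value noise as a stand-in for Simplex noise; the wind samples two
# decorrelated noise channels to get a smoothly drifting 2D vector.
import math

def _hash01(ix, iy):
    return math.modf(math.sin(ix * 127.1 + iy * 311.7) * 43758.5453)[0] % 1.0

def _fade(t):
    return t * t * (3.0 - 2.0 * t)       # smoothstep

def value_noise(x, y):
    ix, fx = divmod(x, 1.0)
    iy, fy = divmod(y, 1.0)
    a, b = _hash01(ix, iy), _hash01(ix + 1, iy)
    c, d = _hash01(ix, iy + 1), _hash01(ix + 1, iy + 1)
    u, v = _fade(fx), _fade(fy)
    top = a + (b - a) * u
    bottom = c + (d - c) * u
    return top + (bottom - top) * v       # in [0, 1)

def wind(time):
    # Two channels give direction and strength in [-1, 1] on each axis.
    return (value_noise(time * 0.3, 0.0) * 2.0 - 1.0,
            value_noise(time * 0.3, 57.0) * 2.0 - 1.0)
```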
Then, the new wind was applied to the grass (and trees, and vines, and...) with a nicer shader. Nothing more to report on this topic, for pictures of Lighthouse see the previous post by Samuncle.
Turning our attention to the minimap in the lower-left corner, we notice a certain jagginess. With MSAA enabled it would look nicer, but our target group is a more casual userbase, and not many of them have the hardware power for that. Of course, smooth maps are as close to a constitutional right as any, so something had to be done.
After a few rounds of prototyping, the final formula ended up as follows: render the minimap at double the original resolution, blur it with a Gaussian blur of a 3-pixel radius, and make sure trilinear filtering is enabled.
These steps eliminate most of the jaggies while keeping the edges sharp enough, giving a nicer look overall.
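The radius-3 Gaussian boils down to seven normalised weights per axis, applied once horizontally and once vertically since the blur is separable. A small sketch (the sigma = radius / 2 choice is an assumption, not necessarily the exact value used):

```python
# Build a 1D Gaussian kernel; applied along each axis in turn,
# so 2 x 7 taps instead of 7 x 7 for the full 2D blur.
import math

def gaussian_weights(radius, sigma=None):
    sigma = sigma if sigma is not None else radius / 2.0
    w = [math.exp(-(i * i) / (2.0 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    total = sum(w)
    return [x / total for x in w]   # normalised: no brightness change

weights = gaussian_weights(3)       # 7 symmetric taps summing to 1
```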
At this point, the remaining items on the checklist were glow and bloom. However, our resident master artist had a request: mipmap level visualization, so that artists can determine which textures are of too low a resolution, and which of too high.
The formula and colors are the ones from Unity, published by Aras P. A big set of thanks to him!
Blue means the resolution is too low, red that it is too high, and the original color means just perfect. Of course the measurement varies with the user's resolution and with how far one is looking at the object from. Therefore any decisions based on it should be made after looking at multiple resolutions, and after getting as close to and as far from the item in question as possible.
The analysis on Hacienda showed that the rope had a far too high resolution - memory could be saved there - while the ground and the wheels of most karts were of too low resolution.
The implementation differs a bit from Aras's; it is not done via a texture, but entirely in the shader, giving more freedom and making it easier to apply. If we were happy with a DX10-class requirement, even the texture size uniform could be skipped (GL 3.0 introduced the textureSize function to query a texture's size inside the shader), but as we have to override the shader anyway to show this, sending the uniform is little additional effort, and it lets the visualization run on GL 2 hardware.
Obviously there is a speed hit from calculating the mipmap level in the shader instead of letting the texture hardware handle it, but seeing as this is a debug option for artists, extreme speed is not needed. Ease of implementation, by not requiring the extra texture, was worth more in this case.
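For the curious, the calculation is essentially the same one the sampler hardware performs: scale the screen-space UV derivatives into texel space and take log2 of the longest footprint. A Python sketch (the function name and clamping detail are illustrative, not the exact shader code):

```python
# Mip level from UV screen-space derivatives: log2 of the longest
# texel-space footprint, clamped so magnification stays at level 0.
import math

def mip_level(du_dx, dv_dx, du_dy, dv_dy, tex_w, tex_h):
    dx2 = (du_dx * tex_w) ** 2 + (dv_dx * tex_h) ** 2
    dy2 = (du_dy * tex_w) ** 2 + (dv_dy * tex_h) ** 2
    rho2 = max(dx2, dy2, 1.0)
    return 0.5 * math.log2(rho2)
```

Sampling a 256x256 texture at a 1:1 pixel-to-texel ratio gives level 0; minifying it 4x gives level 2, and so on.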
Next up, bloom. No, not her. The technique.
This one is fairly standard fare: a post-processing light-bleed effect. We start by capturing the bright areas of the rendered scene. This bloom image is then progressively minified to 1/8th the size (1/64th the area), halving the resolution at each step so that no pixels get skipped.
Indeed, such a progressively minified image is of much better quality than if we had downscaled the captured bloom to that size directly. The cost of the minification is very small in the total effect.
At the small resolution, we then blur it with a Gaussian blur of radius 6. Blurring at a lower resolution comes with the usual benefit: it looks like a much larger blur applied to the full image, at several times the speed.
This blurred image is then additively blended on top of the scene. There are some artifacts from bilinear sampling, but in practice they are invisible to the human eye, thanks to how brightness affects our perception.
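The downscale chain above can be sketched in a few lines (Python, with illustrative sizes). Each step averages the previous one at half resolution, so every source pixel contributes, whereas a direct 1/8 bilinear downscale would skip most of them; and a radius-6 blur at 1/8 resolution covers as much of the screen as a radius-48 blur at full resolution.

```python
# Progressive minification: halve the resolution `steps` times.
def bloom_chain(width, height, steps=3):
    sizes = [(width, height)]
    for _ in range(steps):
        w, h = sizes[-1]
        sizes.append((w // 2, h // 2))
    return sizes

print(bloom_chain(1024, 768))
# [(1024, 768), (512, 384), (256, 192), (128, 96)]

# Blur radius as seen at full resolution: 6 texels * 8x upscale.
effective_radius = 6 * 8
```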
For the final item on today's list, we have glow. There's been some confusion around these terms, so let it be known how I define them: bloom is a post-processing effect, applied to the whole image, that bleeds bright light over, improving the look of sunsets, lava, explosions, and many other things.
Glow is the object emitting light that is visible without a medium - so more of an aura, or a light outline. Glow does not appear on top of the object, but only outside it.
I'm especially happy about the dynamic pink in that last picture. Certainly gives the final finish to any barrel of wheat! (don't be alarmed, the colors are intentionally bad as mentioned in the forum, gently encouraging the more artistic among us to change them ;)
I'm sure everyone is itching to hear how this works under the hood. No? I'll tell you anyway.
At each frame, we build a list of glowing objects and their glow colors. Each object is then drawn to a temporary texture, fully clothed in its given color, updating the stencil along the way for free.
This colorful pastel buffer is then progressively minified, this time to 1/4th resolution, and a radius-6 Gaussian blur is applied to spread the glow around the object itself. This glow texture is then given to the list of glowing nodes built earlier.
In order for these glow nodes to behave properly in the scene's depth order, they are drawn as container geometry in the transparent pass. At first I tried to use the objects themselves, scaled to a larger size, but that failed soon enough, as no object was modeled around its origin (!). So the next best thing was used: a small sphere (~20 polys) placed at the object's real origin, not the placed origin.
Why doesn't the glow overlay the whole object then, nobody asked. I'm glad you didn't ask, as this is where the stencil magic comes in. Having marked the stencil earlier with "these spots do not glow", we now ask the stencil test to block our drawing to those parts, just for the container nodes.
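The stencil dance can be mimicked on a toy 2D grid (pure Python; the grid, masks, and function name are all hypothetical): drawing the object marks the stencil, and the blurred glow is then composited only where the stencil is clear, so it never covers the object itself, only the area around it.

```python
# Toy stencil test on an 8x8 grid: object pixels set the stencil to 1,
# and the glow halo is written only where the stencil stayed 0.
SIZE = 8

def render_glow(object_mask, glow_halo):
    stencil = [[1 if object_mask[y][x] else 0 for x in range(SIZE)]
               for y in range(SIZE)]
    out = [[0.0] * SIZE for _ in range(SIZE)]
    for y in range(SIZE):
        for x in range(SIZE):
            if stencil[y][x] == 0:     # stencil test: object pixels fail
                out[y][x] = glow_halo[y][x]
    return out
```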
Altogether, it provides a very nice effect, answering the complaint that nitros and other objects were not easily visible in darker levels, such as the sand/egypt track.
Phew, if you made it this far, you're probably too tired to leave a comment. That's ok, I'm too tired to read them until Monday. Have a nice weekend!