July 26, 2013

Graphics 5-6: Anti-this, pro-that

Good evening our loyal supporters. Welcome back to the same spiderchannel, at the same spidertime.

This biweek's plan was MLAA and SSAO. In addition, some remaining issues with lighting were squashed, and some unplanned effects were added; business as usual.

Selective bloom, requested by artists, was added. Now you can mark objects as causing bloom, even if their native brightness is not enough to catch them in the normal siphoning.

The test object in this picture is not the best one; you will likely see this effect used in the green crystals of the mine, and the windows of the mansion. The aforementioned windows may get something else too, but that's a secret, so don't tell anyone.

This is done by keeping a list of all such marked objects, and rendering that list with color and depth writes off, only writing the stencil. This results in 4x-8x drawing speed, as most graphics cards have that path optimized for uses such as Z-prepass.

This stencil is then used to copy the affected pixels from the picture to the bloom texture.
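
For the curious, here is a rough CPU-side illustration of that stencil-gated copy. The real thing happens on the GPU; the function name, the list-of-lists image layout and the black background are all made up for this sketch.

```python
def copy_marked_pixels(picture, stencil, background=(0.0, 0.0, 0.0)):
    """Copy only the stencil-marked pixels of `picture` into a new buffer.

    picture: rows of (r, g, b) tuples.
    stencil: rows of 0/1 flags, same size as the picture.
    Both layouts are assumptions made for this example.
    """
    bloom = []
    for picture_row, stencil_row in zip(picture, stencil):
        bloom.append([
            pixel if marked else background
            for pixel, marked in zip(picture_row, stencil_row)
        ])
    return bloom


# A 2x2 picture where two pixels belong to objects marked for bloom.
picture = [[(0.2, 0.9, 0.3), (0.1, 0.1, 0.1)],
           [(0.5, 0.5, 0.5), (0.0, 0.8, 0.2)]]
stencil = [[1, 0],
           [0, 1]]
bloom_input = copy_marked_pixels(picture, stencil)
```

Only the marked pixels end up in `bloom_input`, no matter how dim they are; everything else stays at the background value.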

The bloom threshold was made configurable per-track. As you can see in the picture, the left version is clearly not bloom over-use.

The right is the default threshold; the left was altered a bit to show how glorious a bit of bloom looks.

Lightmaps were enabled in the new system. Before, after, the map.

This is a splash of older tech, and quite laborious for the artists too. However, four tracks currently have it, so it should be used there.

MLAA, or Morphological Anti-Aliasing, is a shader-based AA technique. This means it can run in situations where traditional MSAA cannot, or catch edges such as those inside transparent textures better. It can usually reach better quality at the same speed; the quoted numbers for this particular technique are MSAA 8x quality at MSAA 2x speed, but the exact specifics obviously depend on the scene.

While no longer the state of the art, Jimenez's MLAA was used due to the author's familiarity with the technique, and the very decent results it still achieves. The most common competitor, FXAA, is a bit faster, but results in inferior quality - it blurs textures noticeably, which should not happen.

A short overview of the technique follows. First, any jagged edges are detected and marked for processing using the stencil. This greatly improves the performance of the technique, since the heavy shader for the following pass is only run on the needed pixels.
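
To make the detection step a bit more concrete, here is a small CPU-side sketch. It assumes a luma-difference test against the left and top neighbours with a threshold of 0.1; the threshold, the function names and the image layout are illustrative guesses, not the actual shader.

```python
def luma(rgb):
    """Perceived brightness of an (r, g, b) colour, Rec. 709 weights."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def detect_edges(image, threshold=0.1):
    """Return per-pixel (left_edge, top_edge) flags.

    A pixel has an edge when its luma differs from the neighbour on that
    side by more than `threshold` (an assumed value for this sketch).
    """
    height, width = len(image), len(image[0])
    edges = []
    for y in range(height):
        row = []
        for x in range(width):
            here = luma(image[y][x])
            left = x > 0 and abs(here - luma(image[y][x - 1])) > threshold
            top = y > 0 and abs(here - luma(image[y - 1][x])) > threshold
            row.append((left, top))
        edges.append(row)
    return edges


def stencil_mask(edges):
    """Mark every pixel that has at least one edge, like the stencil would."""
    return [[left or top for left, top in row] for row in edges]


WHITE, BLACK = (1.0, 1.0, 1.0), (0.0, 0.0, 0.0)
image = [[WHITE, BLACK],
         [WHITE, BLACK]]
edges = detect_edges(image)
mask = stencil_mask(edges)
```

In this tiny image only the right column gets marked, so a heavier follow-up pass would only need to touch half of the pixels.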

In the second pass, each edge is classified according to a pre-generated map of edges. This map tells the following pass where to fetch information from - essentially, which pixels to combine and the weights for them.

The final pass simply overlays the combined pixels on the picture. Before - after:

Dear reader, meet JJ.

Have no fear though, the glistering brightness that fills you with awe is a bit toned-up in this picture; sadly, the version to make it in the game will be more subtle. This comes as a disappointment to fans everywhere, and there have already been mass protests of over a thousand people outside my residence.

Levels can now choose to have cloud shadows. This completely fake system only fits some levels with a bright, daytime sky with clouds, but there are a few such levels, and it improves them quite a bit.

It's a fairly subtle effect, and so hard to catch from a still picture, but when moving in the wind it's quite nice.

Itching to hear more I bet? This very fake effect is done inside the sun light, as that's the only thing that should be affected by it. A straight planar top-down mapping is created based on the pixel's world-space position; it's offset by the wind, and then used for a look-up in a specially made texture.
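
A minimal sketch of that mapping, assuming Y is the up axis, a repeating texture, and a made-up scale factor. All names here are hypothetical, and the nearest-texel lookup stands in for whatever filtering the GPU would do.

```python
def cloud_shadow_uv(world_pos, wind, time, scale=0.01):
    """Planar top-down mapping: drop the height, scroll with the wind.

    world_pos: (x, y, z) with y assumed to be up.
    wind: (u, v) scroll speed in texture units per second (assumed unit).
    """
    x, _height, z = world_pos
    u = (x * scale + wind[0] * time) % 1.0  # wrap, as a repeating texture would
    v = (z * scale + wind[1] * time) % 1.0
    return u, v


def sample_nearest(texture, uv):
    """Nearest-texel lookup in a small 2D list of shadow factors."""
    height, width = len(texture), len(texture[0])
    u, v = uv
    column = min(int(u * width), width - 1)
    row = min(int(v * height), height - 1)
    return texture[row][column]


# 1.0 means full sun, lower values mean a cloud is in the way.
cloud_texture = [[1.0, 0.4],
                 [0.7, 1.0]]
uv = cloud_shadow_uv((60.0, 3.0, 20.0), wind=(0.0, 0.0), time=0.0)
sun_factor = sample_nearest(cloud_texture, uv)
```

The resulting factor would then simply scale the sun light for that pixel. Since the height is ignored, everything directly above or below a point gets the same shadow, which is exactly why it shows up inside buildings too.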

The effect is quite cheap. It does come with the downside that as currently there is no shadow mapping, the cloud shadows (just like sunlight) are visible inside buildings, caves and such. This may limit the levels where this can be used a bit for now, but with the overworld and hacienda, those areas are few and having it there is not distracting.

Finally, SSAO, or Screen-Space Ambient Occlusion.

"Mama, I want contact shadows and dark creases" "Sure thing, hon"

Another fake effect with no real-world equivalent, SSAO usually enhances scenes quite a bit. It approximates global light bouncing by making approximate ray traces around a pixel, checking if those points can possibly occlude some light from reaching this pixel.

As you can probably imagine, it is expensive. Several tradeoffs were made in order to get it to run on the majority of the user base; it should still look acceptable, while the FPS hit is much smaller than with the usual implementation.

Low - high:
The high variant merely runs at full resolution, and no tuning has been done for that one, as the majority will run on low or off.

The low option runs at a quarter of the resolution along each axis, taking only a single sample per step, without randomness or an edge-aware blur.

Each of the listed parts lessens its quality somewhat, while increasing speed. First of all, the lower resolution prevents small creases from being detected, limiting the effect only to contact shadows. The upside is that it covers 16 times less area, directly resulting in a 16x speedup over the high version.

Second, being able to only take a single sample per step, the information we have to work with is the normal plus a linearized, 8-bit depth. This results in some false occlusions in case the occluder is outside the hemisphere; the alternative however, making positions available, would double the cost of this effect.

The randomness in SSAO is usually implemented via a small normal texture. Its cost (beyond flicker that always results from randomness) is that it changes the step's texture read to a dependent one - named using great wit since the coordinates depend on another texture read.

A dependent texture read is a costly operation, since it prevents caches from pre-fetching the actual read. The exact cost cannot be calculated in a highly parallel pipelined system, but assuming it delays each read by 300 instructions (a typical cache miss), a core clock of 400MHz, and 16 steps, each pixel is delayed by about 12 microseconds. The delay is highly variable due to inter-dependencies, so take that cost estimation with a pinch of salt.
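
The arithmetic behind that figure, spelled out. This sketch assumes one instruction per clock cycle and that the stalls simply add up one after another, which is the same simplification as in the estimate itself; the function name is made up.

```python
def dependent_read_delay_us(stall_instructions=300, clock_hz=400e6, steps=16):
    """Estimated per-pixel delay in microseconds.

    Assumes one instruction per clock and fully serialized stalls.
    """
    total_cycles = stall_instructions * steps  # 300 * 16 = 4800 cycles
    return total_cycles / clock_hz * 1e6


delay = dependent_read_delay_us()
```

With the defaults this comes out at roughly 12 microseconds, matching the number above.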

The first couple of programmable desktop GPUs could only prefetch textures based on coordinates from a varying (that is, an export from the vertex shader) or a hardcoded value. This is still the case for many mobile GPUs, but any desktop GPU from about 2004 onwards also supports a set of math operations on the coordinates, such as addition and multiplication, making our set of vectors completely pre-fetchable. Should the target be a mobile GPU, this limitation could be circumvented by calculating those vectors in the vertex shader and passing them as varyings.

As for using a normal Gaussian blur over an edge-aware one, it results in some of the occlusions bleeding over a bit, but is about 4x the speed of the alternative. A 2x component comes from the fact that the edge-aware blur cannot take advantage of the bilinear hardware, requiring it to do twice as many samples, and the other 2x from requiring a branch for each sample.
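
The post does not spell out how the bilinear hardware is used, so take the following as one common way to do it rather than the actual implementation: two neighbouring taps are merged into a single fetch placed between the texels, and the bilinear filter does the weighting for free. The weights below are illustrative, not a real Gaussian kernel, and all names are hypothetical.

```python
def merge_taps(offsets, weights):
    """Merge pairs of adjacent taps into single bilinear fetches.

    Returns a list of (offset, weight) pairs, half as long as the input
    when the number of taps is even.
    """
    merged = []
    for i in range(0, len(offsets) - 1, 2):
        weight = weights[i] + weights[i + 1]
        offset = (offsets[i] * weights[i]
                  + offsets[i + 1] * weights[i + 1]) / weight
        merged.append((offset, weight))
    if len(offsets) % 2:  # an odd tap at the end is left as-is
        merged.append((offsets[-1], weights[-1]))
    return merged


def linear_fetch(row, position):
    """What bilinear filtering does along one axis (position >= 0)."""
    index = int(position)
    fraction = position - index
    next_index = min(index + 1, len(row) - 1)
    return row[index] * (1.0 - fraction) + row[next_index] * fraction


row = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]   # one row of texels
offsets = [1, 2, 3, 4]                   # taps on one side of the centre
weights = [0.2, 0.12, 0.05, 0.02]        # illustrative weights only

four_reads = sum(row[o] * w for o, w in zip(offsets, weights))
taps = merge_taps(offsets, weights)
two_reads = sum(linear_fetch(row, o) * w for o, w in taps)
```

Both sums give the same result, but the second needs half the texture reads. An edge-aware blur has to decide per texel whether to include it, so it cannot merge taps this way.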

With this, we end today's infotainment. Tune back next time!


  1. Guys, will you give all karts specific properties (weight, speed, acceleration) like in the game Re-Volt? That would be awesome and it would motivate players to win challenges to gain access to these karts.

    1. That's indeed the plan. We'll probably need lots of help from the community tuning this so no kart has any significant advantage overall.

  2. I like the karts as-is, without differing characteristics, as it makes for a level playing field. I'd rather not deal with kids fighting over who gets the fastest character. Maybe some different characteristics in a story-mode or something.

  3. Every kart would have its own advantages and disadvantages - the PHP elephant, for example, would be heavy but could not be pushed to the side easily, plus it could have some special property, like protection against being slammed into the ground by the green swatter thingy.

    2nd Anonymous: Why shouldn't they be allowed to use whatever kart they want? Why not allow using multiple same karts at once?

  4. Anonymous2 here again: I didn't think more than one player could play as the same character, can they!?

    I don't really like class-based multiplayer systems, e.g. TeamFortress2 - it's annoyingly complicated for a casual old-school gamer (me). I'd like to be able to select a PHP because it looks cute without having to consider whether the track would suit its characteristics or not - arguments may ensue/players give up mid-race because a track favours one character's abilities over another. Players may get upset as they're going uphill slower than the others, or the boost doesn't make them go as fast, or they can't turn as quickly, or their weapons don't seem to affect the PHP elephant as much, or they get bumped for miles when the PHP crashes into them.

    I'm not 100% against the idea, maybe it'd be very good! This is opensource so no doubt it'll happen sometime but I'd like to have an option to use equally balanced characters still please! This game is, after all, meant to be accessible to younger players.


  5. (Anonymous1)

    Sure, this should be optional. I just think that it is more exciting to compete with different karts.

    PS: Although Re-Volt is an old game (it was released in 1999), it seems to me that it looks better than STK 0.8 for some reason (check out this video: https://www.youtube.com/watch?v=euJXzjxzEMg). I hope that the graphics in the next version will be improved. :P

  6. Anonymous2 here again: Thanks Cand for the blog post, well done! I enjoy following the progress with STK, the above was interesting & I hope these additional graphics features work well & don't slow down my old PC too much! :-)
    PS: I think I played the re-volt demo when it came out, I recognised the 1st map in the video. I remember getting much fun out of split-screen SkunnyKart [https://www.youtube.com/watch?v=j0JzpjINeE0]. Oh the memories, oh the sore eyes... x-)

  7. This would be such a nice game! Too bad the problem with the keyboard hasn't been solved for months...
    My down-key is not working (if I assign another key to the slow down-command, it doesn't work either). I've already posted a thread in the forums, but no one replied, what should I do?

  8. About the key problem, I haven't seen anything in the forum, do you have a link to the forum post?