July 26, 2013

Graphics 5-6: Anti-this, pro-that

Good evening our loyal supporters. Welcome back to the same spiderchannel, at the same spidertime.

This biweek's plan was MLAA and SSAO. In addition, some remaining issues with lighting were squashed, and some unplanned effects were added, business as usual.



Selective bloom, requested by artists, was added. Now you can mark objects as causing bloom, even if their native brightness is not enough to catch them in the normal siphoning.


The test object in this picture is not the best one; you will likely see this effect used in the green crystals of the mine, and the windows of the mansion. The aforementioned windows may get something else too, but that's a secret, so don't tell anyone.

This is done by keeping a list of all such marked objects, and rendering that list with color and depth writes off, only writing the stencil. This results in 4x-8x drawing speed, as most graphics cards have that path optimized for uses such as Z-prepass.

This stencil is then used to copy the affected pixels from the picture to the bloom texture.
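For the curious, here is a minimal CPU-side sketch of the idea in Python. It is not the engine's code: the image layout (lists of rows of RGB tuples) and the function names are assumptions made purely for illustration.

```python
# Hypothetical layout: an image is a list of rows, each pixel an (r, g, b) tuple.
BLACK = (0.0, 0.0, 0.0)

def build_stencil(width, height, marked_pixels):
    """Stencil-only pass: no color or depth is written, just a 0/1 mask."""
    stencil = [[0] * width for _ in range(height)]
    for x, y in marked_pixels:
        stencil[y][x] = 1
    return stencil

def copy_marked_to_bloom(scene, stencil, bloom):
    """Copy scene pixels into the bloom texture wherever the stencil is set."""
    for y, row in enumerate(scene):
        for x, pixel in enumerate(row):
            if stencil[y][x]:
                bloom[y][x] = pixel
    return bloom

scene = [[(0.2, 0.8, 0.3), (0.1, 0.1, 0.1)],
         [(0.0, 0.0, 0.0), (0.2, 0.9, 0.4)]]
# Pretend the marked objects cover the two greenish pixels.
stencil = build_stencil(2, 2, marked_pixels=[(0, 0), (1, 1)])
bloom = copy_marked_to_bloom(scene, stencil, [[BLACK] * 2 for _ in range(2)])
```

On the GPU the mask lives in the stencil buffer and the copy is a single screen pass with the stencil test enabled, but the logic is the same.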

The bloom threshold was made configurable per-track. As you can see in the picture, the left version is clearly not bloom over-use.


The right is the default threshold; the left was altered a bit to show how glorious a bit of bloom looks.



Lightmaps were enabled in the new system. Before, after, the map.




This is a splash of older tech, and quite laborious for the artists too. However, four tracks currently have it, so it should be used there.



MLAA, or Morphological Anti-Aliasing, is a shader-based AA technique. This means it can run in situations where traditional MSAA cannot, or catch edges such as those inside transparent textures better. It can usually reach better quality at the same speed; the quoted numbers for this particular technique are MSAA 8x quality at MSAA 2x speed, but the exact specifics obviously depend on the scene.

While no longer the state of the art, Jimenez's MLAA was used due to the author's familiarity with the technique, and the very decent results it still achieves. The most common competitor, FXAA, is a bit faster, but results in inferior quality - it blurs textures noticeably, which should not happen.


A short overview of the technique follows. First, any jagged edges are detected and marked for processing using the stencil. This greatly improves the performance of the technique, since the heavy shader for the following pass is only run on the needed pixels.
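A rough sketch of what such an edge-detection pass can look like, in Python rather than shader code. The luma weights and the 0.1 threshold are assumed values for illustration, not the ones used in the game.

```python
def luma(pixel):
    """Perceived brightness of an (r, g, b) pixel (Rec. 709 weights)."""
    r, g, b = pixel
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def detect_edges(image, threshold=0.1):
    """Per pixel, flag an edge towards the left and the top neighbour.

    A pixel with any flag set is the kind of pixel that would be marked
    in the stencil, so the later, heavier passes only run there.
    """
    height, width = len(image), len(image[0])
    edges = [[(False, False)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            here = luma(image[y][x])
            left = x > 0 and abs(here - luma(image[y][x - 1])) > threshold
            top = y > 0 and abs(here - luma(image[y - 1][x])) > threshold
            edges[y][x] = (left, top)
    return edges

# A black column next to a white column: one vertical edge.
image = [[(0, 0, 0), (1, 1, 1)],
         [(0, 0, 0), (1, 1, 1)]]
edges = detect_edges(image)
needs_processing = [[any(flags) for flags in row] for row in edges]
```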

In the second pass, each edge is classified according to a pre-generated map of edges. This map tells the following pass where to fetch information from - essentially, which pixels to combine and the weights for them.

The final pass simply overlays the combined pixels on the picture. Before - after:




Dear reader, meet JJ.


Have no fear though, the glistering brightness that fills you with awe is a bit toned-up in this picture; sadly, the version to make it in the game will be more subtle. This comes as a disappointment to fans everywhere, and there have already been mass protests of over a thousand people outside my residence.



Levels can now choose to have cloud shadows. This completely fake system only fits levels with a bright, daytime sky with clouds, but there are a few such levels, and it improves them quite a bit.




It's a fairly subtle effect, and so hard to catch from a still picture, but when moving in the wind it's quite nice.

Itching to hear more I bet? This very fake effect is done inside the sun light, as that's the only thing that should be affected by it. A straight planar top-down mapping is created based on the pixel's world-space position; it's offset by the wind, and then used for a look-up in a specially made texture.
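In Python-flavored pseudo-shader form, the look-up could be sketched like this. The scale factor, the Y-up axis convention and the nearest-neighbour sampling are all assumptions for the sake of the example; the real shader may differ.

```python
def cloud_uv(world_pos, wind_offset, scale=0.01):
    """Planar top-down mapping: only the horizontal position matters.

    Assumes Y is up, so height is ignored. `scale` (texture repeats per
    world unit) is a made-up value.
    """
    x, _height, z = world_pos
    u = (x * scale + wind_offset[0]) % 1.0
    v = (z * scale + wind_offset[1]) % 1.0
    return u, v

def sample_nearest(texture, uv):
    """Nearest-neighbour texture fetch; texture is a list of rows of floats."""
    height, width = len(texture), len(texture[0])
    tx = min(int(uv[0] * width), width - 1)
    ty = min(int(uv[1] * height), height - 1)
    return texture[ty][tx]

def shade_sun(sun_light, world_pos, wind_offset, cloud_texture):
    """Dim only the sun's contribution by the cloud texture value."""
    shadow = sample_nearest(cloud_texture, cloud_uv(world_pos, wind_offset))
    return tuple(channel * shadow for channel in sun_light)

clouds = [[1.0, 0.5],
          [0.5, 1.0]]          # 1.0 = clear sky, 0.5 = under a cloud
lit = shade_sun((1.0, 0.9, 0.8), (0.0, 10.0, 0.0), (0.0, 0.0), clouds)
shaded = shade_sun((1.0, 1.0, 1.0), (75.0, 0.0, 0.0), (0.0, 0.0), clouds)
```

Animating `wind_offset` over time is what makes the shadows drift across the level.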

The effect is quite cheap. It does come with the downside that as currently there is no shadow mapping, the cloud shadows (just like sunlight) are visible inside buildings, caves and such. This may limit the levels where this can be used a bit for now, but with the overworld and hacienda, those areas are few and having it there is not distracting.



Finally, SSAO, or Screen-Space Ambient Occlusion.



"Mama, I want contact shadows and dark creases" "Sure thing, hon"

Another fake effect with no real-world equivalent, SSAO usually enhances scenes quite a bit. It approximates global light bouncing by making approximate ray traces around a pixel, checking if those points can possibly occlude some light from reaching this pixel.

As you can probably imagine, it is expensive. Several tradeoffs were made in order to get it to run on the majority of the user base; it should still look acceptable, while the FPS hit is much less than the usual implementation.

Low - high:
The high variant merely runs at full resolution, and no tuning has been done for that one, as the majority will run on low or off.

The low option runs at a quarter of the resolution in each dimension, taking only a single sample per step, without randomness or an edge-aware blur.

Each of the listed parts lessens its quality somewhat, while increasing speed. First of all, the lower resolution prevents small creases from being detected, limiting the effect only to contact shadows. The upside is that it's 16 times less area, directly resulting in a 16x speedup over the high version.

Second, being able to only take a single sample per step, the information we have to work with is the normal plus a linearized, 8-bit depth. This results in some false occlusions in case the occluder is outside the hemisphere; the alternative however, making positions available, would double the cost of this effect.

The randomness in SSAO is usually implemented via a small normal texture. Its cost (beyond flicker that always results from randomness) is that it changes the step's texture read to a dependent one - named using great wit since the coordinates depend on another texture read.

A dependent texture read is a costly operation, since it prevents caches from pre-fetching the actual read. The exact cost cannot be calculated in a highly parallel pipelined system, but assuming it delays each read by 300 cycles (a typical cache miss), a core clock of 400MHz, and 16 steps, each pixel is delayed by about 12 microseconds. The delay is highly variable due to inter-dependencies, so take that cost estimation with a pinch of salt.
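The back-of-the-envelope figure can be reproduced in a few lines. This only restates the assumptions given above (stall per read, clock, step count); it is not a measurement.

```python
STALL_PER_READ = 300        # assumed delay of one cache miss, from the text
CORE_CLOCK_HZ = 400e6       # 400 MHz core clock, from the text
STEPS_PER_PIXEL = 16        # one dependent read per step

def delay_per_pixel_seconds(stall, steps, clock_hz):
    """Total stall per pixel if every step waits out a full cache miss."""
    return stall * steps / clock_hz

delay = delay_per_pixel_seconds(STALL_PER_READ, STEPS_PER_PIXEL, CORE_CLOCK_HZ)
print(f"{delay * 1e6:.1f} microseconds per pixel")
```

That is 300 x 16 = 4800 stalled clock ticks, divided by 400 million ticks per second.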

The first couple of programmable desktop GPUs could only prefetch textures based on coordinates from a varying (that is, an export from the vertex shader) or a hardcoded value. This is still the case for many mobile GPUs, but any desktop GPU from about 2004 onwards supports a set of math operations, such as additions and multiplications, making our set of vectors completely pre-fetchable. Should the target be a mobile GPU, this limitation could be circumvented by calculating those vectors in the vertex shader and passing them as varyings.

As for using a normal gaussian blur over an edge-aware one, it results in some of the occlusions bleeding over a bit, but is about 4x the speed of the alternative. A 2x component comes from the fact that the edge-aware blur cannot take advantage of the bilinear hardware, requiring it to do twice as many samples, and the other 2x from requiring a branch for each sample.
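To show where the "bilinear hardware" saving comes from, here is a sketch of the usual trick: two neighbouring Gaussian taps are merged into one fetch placed between the texels, and the hardware's linear filter does the weighting. The kernel weights below are example values, not the ones used in the game.

```python
def merge_taps(offsets, weights):
    """Pair up adjacent taps into single bilinear taps.

    Taps (o1, w1) and (o2, w2) become one tap with weight w1 + w2 at the
    weighted-average offset, so half as many fetches are needed.
    """
    merged = []
    for i in range(0, len(offsets) - 1, 2):
        weight = weights[i] + weights[i + 1]
        offset = (offsets[i] * weights[i] + offsets[i + 1] * weights[i + 1]) / weight
        merged.append((offset, weight))
    if len(offsets) % 2:                      # odd tap left over
        merged.append((offsets[-1], weights[-1]))
    return merged

def bilinear_fetch(row, pos):
    """What the texture unit does for a fractional coordinate (1D case)."""
    i = int(pos)
    frac = pos - i
    return row[i] * (1.0 - frac) + row[min(i + 1, len(row) - 1)] * frac

row = [0.3, 0.9, 0.1, 0.7, 0.5, 0.2]          # arbitrary texel values
offsets = [1, 2, 3, 4]                        # one side of a 9-tap kernel
weights = [0.1945946, 0.1216216, 0.0540541, 0.0162162]

four_fetches = sum(w * row[o] for o, w in zip(offsets, weights))
two_fetches = sum(w * bilinear_fetch(row, o) for o, w in merge_taps(offsets, weights))
```

Both sums give the same result. An edge-aware blur decides each sample's weight only after looking at that sample, so the weights cannot be folded into the fetch position ahead of time.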


With this, we end today's infotainment. Tune back next time!

July 10, 2013

Graphics 3-4: light me up

Evening, our loyal followers. This bulletin comes a few days early, as no new tech is expected for the rest of the week, just testing and tuning.

These weeks were about lighting, and now we have a running light prepass system. It could be described as more of a hybrid, read on to find out the butler did it.



So what is light prepass?

In traditional forward lighting, you draw every mesh, and each mesh runs a loop over all lights to see if they happen to touch it. As you can imagine, this is inefficient the moment you have more than one light.

The first good solution to this, deferred rendering, draws the mesh data to a set of temp buffers containing the position, normal, and other values. Lights are additively blended to the main image based on the values in the temp buffers.

It works, but it's inflexible in the amount of material variation it can do. Light prepass was designed to be both scalable and flexible.

In light prepass, less mesh data is saved: normal and depth only. The lights are additively blended to a light buffer, then the meshes are drawn, and they decide what to do with the gathered amount of light.



Our new system is light prepass, but by default it avoids the double draw of each mesh, blending the lights with a standard screen quad.

The moment the flexibility is needed, a mesh can draw itself and interpret the light data as it sees fit; but by default, there is only one draw call, as most meshes use the light data in a standard way.
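A crude counting model makes the difference visible. The functions and the numbers are purely illustrative assumptions (real cost depends on screen coverage, shader complexity and much more), but they show how the work scales.

```python
def forward_cost(meshes, lights):
    """Forward: each mesh is drawn once and loops over all lights."""
    return {"mesh_draws": meshes, "light_loop_iterations": meshes * lights}

def classic_light_prepass_cost(meshes, lights):
    """Classic light prepass: meshes are drawn twice, lights blended once each."""
    return {"mesh_draws": 2 * meshes, "light_blends": lights}

def hybrid_cost(meshes, lights, custom_meshes=0):
    """The hybrid described above: one draw per mesh, one screen quad to
    apply the light buffer, plus a second draw only for the meshes that
    want to interpret the light data themselves."""
    return {"mesh_draws": meshes + custom_meshes,
            "screen_quads": 1,
            "light_blends": lights}

forward = forward_cost(100, 20)
classic = classic_light_prepass_cost(100, 20)
hybrid = hybrid_cost(100, 20, custom_meshes=5)
```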




Here we have the first step, ambient light, working. Yours truly has a fascination with the color red; he thinks nothing more of it, perhaps he should.


It's a normal world. This is a new debug view, showing what a normal world looks like, on a normal day. It has already revealed issues with two karts, which would have made them react badly to light (think badgers and mushrooms).


The next frontier after ambient light, point lights. The picture is from a new old map, red light district, with kindly visualized lights.



The other topic for this biweekly session was rim lighting for the karts.


First, one needed to identify the karts, and paint them in this spring's fashionable colors. It may look like red to the untrained eye, but in reality it's a mix of Indian red, crimson, and salmon, with just a hint of blueberry and salvia.


Rim lighting in the graphical world is a trick with normals. You compare the surface's normal with the direction to the camera, and visualize the difference with suitable colors, giving each kart a bit of shiny edges.
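In code, the trick boils down to one dot product. This is a generic sketch of rim lighting, not the shader used for the karts; the `power` exponent that sharpens the rim is an assumed parameter.

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rim_factor(normal, to_camera, power=2.0):
    """0 where the surface faces the camera, 1 where it is seen edge-on."""
    facing = max(dot(normalize(normal), normalize(to_camera)), 0.0)
    return (1.0 - facing) ** power

def apply_rim(base_color, rim_color, normal, to_camera):
    """Add the rim color on top of the base color, scaled by the rim factor."""
    rim = rim_factor(normal, to_camera)
    return tuple(b + r * rim for b, r in zip(base_color, rim_color))

center = rim_factor((0.0, 0.0, 1.0), (0.0, 0.0, 1.0))   # facing the camera
edge = rim_factor((1.0, 0.0, 0.0), (0.0, 0.0, 1.0))     # silhouette
```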


That's it folks, a short post this time.

July 8, 2013

GSoC project: The network for you to play online!

Hello world! My name is Robin Nicollet, alias hilnius, and I'm a GSoC student working on your favorite open-source game ;) I'm going to explain why I'm here and what I'll try to add to SuperTuxKart.

Abstract

First of all, I'd like to thank STK's mentors for choosing me. It's a great honor to work on this project, and I'll do everything possible to make this game as good as possible. I'd also like to say that the community I found here is amazing: I feel that everybody is welcome here, and it's great to have such a good atmosphere.

Presentation

Me

I'm a French student at Télécom-Bretagne. I just finished my first year at this school (2 more and I'll have a master's degree). I've been programming in C, then C++, for about 5 years. I know other programming languages, but that's not really relevant for this project.

My project

My project is to bring to this game the main features that allow people to play together online. In short, that means:
  • Connecting people over a LAN (Local Area Network) or a WAN (Wide Area Network: all over the world).
  • Starting games and playing together.
One of the main features needed to play over a WAN is lag compensation, which I won't tackle here for timing reasons :)

If you just wanted to know what I'll do, you're done at this stage.
In the next sections, I'll detail how the connection between people will work and explain some things about networking that are necessary to understand my solution. I will talk about playing together later in the summer, once that is implemented and working ;)

Connecting people

Situation

You may or may not know this, but SuperTuxKart does not own a private server and cannot deploy a classic client-server architecture. That's why we will ask SuperTuxKart's players to host their own games. To that end, the game will provide an accessible interface that lets you start a server on your own computer to host races.

NAT issues

In this part I'll explain how NAT (Network Address Translation) works, so that you have a better understanding of IP networks and of how the connection will work.
I will use "NAT" to name the device that does the Network Address Translation.

Your local configuration

Your local configuration at home probably looks like this:

You are in the Local Network. You are behind a first NAT (it is probably a box that you either rented or bought from your ISP (Internet Service Provider)). Then your ISP has its own network, with a lot of devices, and its own NAT.

You may wonder: what is Network Address Translation?

Network Address Translation

(If you want a fully detailed answer, you can read the Wikipedia article.)
Network Address Translation is a process first used to extend the number of available IPv4 addresses. Your three computers probably have IPv4 addresses matching the formats 10.xxx.xxx.xxx, 172.16-31.xxx.xxx or 192.168.xxx.xxx.
Such an address does not identify you uniquely over the internet: thousands of people have the same one. It identifies you uniquely only in the Local Network. You should also know that each interface of a connected device has its own IP. For example, your NAT has at least two interfaces: one in the LAN, and one in the ISP network. Each one of them has its own IP address.
Each time you send a message over the internet (get a web page, connect to a server, anything...), your NAT changes the sender's IPv4 address in that message to its own. A quick drawing will help me explain that:

Request

Your computer sends a message (get the google.fr webpage) and puts its own local address in it as the sender. Then every NAT that stands in the way of the message will "translate" this address. That means it will memorize the IP of the sender, and replace it with its own.
So, for example, when the message is in the ISP network, the sender's IP address written inside the message is that of your Router/NAT. And when the message travels over the public internet, the sender's IP address written in the message is that of the ISP NAT.

Response

When the Google server receives that request, it answers to the IP written in the request (that of your ISP NAT).
But the ISP NAT knows who sent that message in the first place (10.2.3.4), so it will send the response to 10.2.3.4. The same goes for your NAT/Router, which knows that you are the one who made the request (and therefore doesn't send the response to your computers 1 and 3).
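If you prefer code to drawings, here is a toy simulation of that round trip. It is only a sketch: the `Nat` class, the packet dictionaries and all addresses except 10.2.3.4 are made up for illustration, and it already keys the table by port number, which is how real NATs tell their users apart.

```python
class Nat:
    """Toy NAT: rewrites the sender on the way out, restores it on the way in."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.table = {}                      # public port -> inner (ip, port)

    def outbound(self, packet):
        """Remember the sender and replace it with the NAT's own address."""
        sender = packet["src"]
        for port, inner in self.table.items():
            if inner == sender:              # reuse an existing entry
                break
        else:
            port = self.next_port
            self.next_port += 1
            self.table[port] = sender
        return {**packet, "src": (self.public_ip, port)}

    def inbound(self, packet):
        """Forward to the remembered sender, or drop if nobody asked."""
        ip, port = packet["dst"]
        inner = self.table.get(port)
        if ip != self.public_ip or inner is None:
            return None
        return {**packet, "dst": inner}

home = Nat("10.2.3.4")                       # your Router/NAT, ISP-side address
isp = Nat("203.0.113.7")                     # ISP NAT, public address (example)
request = {"src": ("192.168.1.2", 5000), "dst": ("198.51.100.1", 80), "data": "GET /"}

on_internet = isp.outbound(home.outbound(request))
reply = {"src": on_internet["dst"], "dst": on_internet["src"], "data": "200 OK"}
delivered = home.inbound(isp.inbound(reply))

# A message nobody asked for: the ISP NAT has no entry and drops it.
unsolicited = isp.inbound({"src": ("198.51.100.9", 80),
                           "dst": ("203.0.113.7", 12345), "data": "hi"})
```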

Conclusion ?

You can now understand that if Google sends a message to your public address (that is, the one of your ISP NAT) without you having sent a request first, the ISP NAT won't know that you're the one who should receive the message, and so cannot route it.
As a result, if you try to send a message to anybody who is not directly on the "public internet", they won't receive your message.
The solution to that is the STUN protocol, which is detailed in the next section.

The STUN Protocol

This protocol is well named, because STUN means Simple Traversal of UDP through NATs. Our goal is indeed to establish a connection between two users, let's say Alice and Bob, each of whom is behind their local NAT and their ISP NAT. You can find the complete RFC of the protocol here.

How STUN works

STUN is very simple: there are STUN servers on the public internet. You send a STUN request to one of those servers, and it will send you a STUN response telling you your public address. If you understood how address translation works, you'll notice that when the STUN server receives your request, the sender's IP address is that of your ISP NAT. In the response, that server will tell you: "you have a public IP address that is A and a public port that is P."

If you're asking yourself what a port is, you can see it as how the NATs identify you. If you send a message to the STUN server, the ISP NAT will give you a port number and put it in the message. Then the STUN server will respond to the couple (sender's IP, sender's port). The ISP NAT will know that the port has been allocated to you and will send the response to your local NAT.

As you can see, NATs use "NAT tables" to hold ip:port bindings and know where they have to forward packets.
You can see that the STUN response received by your computer contains the ISP NAT's IP address and port number.
A detail worth mentioning: some NATs, called symmetric NATs, allocate a different port for each destination, which prevents this protocol from working. It's basically a security feature. So I'm sad to say that it's very likely that you won't be able to play SuperTuxKart online in a company environment (symmetric NATs are mostly used by companies).
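To give an idea of what travels on the wire, here is a small sketch that builds a STUN binding request and reads the public address out of a response. It follows the message layout of RFC 5389, the successor of the original STUN RFC, and it is not the code that will go into the game. The `query_stun` helper and its parameters are hypothetical; you would have to supply the address of a real STUN server yourself.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_RESPONSE = 0x0101
MAPPED_ADDRESS = 0x0001
XOR_MAPPED_ADDRESS = 0x0020

def build_binding_request(transaction_id=None):
    """20-byte header: type, body length (0), magic cookie, 12-byte id."""
    tid = transaction_id or os.urandom(12)
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, tid)

def parse_binding_response(data):
    """Return (public_ip, public_port) from a binding response, or None."""
    msg_type, length, _cookie, _tid = struct.unpack("!HHI12s", data[:20])
    if msg_type != BINDING_RESPONSE:
        raise ValueError("not a binding success response")
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", data[pos:pos + 4])
        value = data[pos + 4:pos + 4 + attr_len]
        if attr_type in (MAPPED_ADDRESS, XOR_MAPPED_ADDRESS) and value[1] == 0x01:
            port, addr = struct.unpack("!HI", value[2:8])   # IPv4 only
            if attr_type == XOR_MAPPED_ADDRESS:
                port ^= MAGIC_COOKIE >> 16
                addr ^= MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)               # 4-byte padding
    return None

def query_stun(server_host, server_port, timeout=2.0):
    """Hypothetical helper: ask a STUN server (chosen by the caller) who we are."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_binding_request(), (server_host, server_port))
        data, _ = sock.recvfrom(2048)
    return parse_binding_response(data)
```

The important part is the return value of `parse_binding_response`: it is exactly the "address A and port P" pair that the server tells you about.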

Now imagine that Alice and Bob want to communicate. Alice and Bob each obtain their own public address (which is their ISP NAT's IP address). Now Alice calls Bob with her phone and says "Hi Bob, my public address is A-A and the associated port is P-A". Bob answers and says "Hey Alice, my public address is A-B and the associated port is P-B". Then if Alice sends a message to the address A-B on the port P-B, Bob will receive the message, because his ISP NAT will know that this IP:port couple is bound to Bob's local NAT/Router. And Bob's local NAT/Router will know that the message it's receiving from the ISP NAT is meant to be delivered to Bob's computer.
So Bob and Alice can now talk :)

The only difference in the way STK will work is that instead of using phone calls to exchange public IP addresses / ports, we'll use a SQL database.

The connection protocol in SuperTuxKart

So when you want to play online, you'll query a STUN server, obtain your public IP address/port, and publish that in a SQL database. After that, you will record in that database which server you want to join. You also get the server's public IP address/port from the database and start sending messages to it.
The server will regularly check in the database who wants to join it. When it sees that you want to join, it will get your public IP address/port, and send messages to it. After both you and the server have sent at least one message to each other, the NATs will have the right entries in their NAT tables to allow a UDP exchange between you and the server.

After that setup, you will be connected to the server and you will be able to play races with other players that are connected to this server.
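The whole handshake can be played out in a toy simulation. Everything here is an assumption made for illustration: a Python dictionary stands in for the SQL database, `PunchableNat` is a made-up class that only lets a peer in once you have sent something to that peer, and the addresses are examples.

```python
class PunchableNat:
    """Toy NAT: accepts packets from a peer only after sending to that peer."""

    def __init__(self, public_addr):
        self.public_addr = public_addr
        self.allowed_peers = set()

    def send_to(self, peer_addr):
        self.allowed_peers.add(peer_addr)     # creates the NAT table entry

    def accepts_from(self, peer_addr):
        return peer_addr in self.allowed_peers

database = {}                                 # stands in for the SQL database

def publish(name, nat):
    """Store the public address obtained from the STUN server."""
    database[name] = {"addr": nat.public_addr, "wants_to_join": None}

def request_join(client_name, server_name):
    """Record the join request and look up the server's public address."""
    database[client_name]["wants_to_join"] = server_name
    return database[server_name]["addr"]

def pending_clients(server_name):
    """What the server finds when it polls the database."""
    return [record["addr"] for record in database.values()
            if record["wants_to_join"] == server_name]

client_nat = PunchableNat(("203.0.113.7", 40000))
server_nat = PunchableNat(("198.51.100.20", 41000))
publish("client", client_nat)
publish("server", server_nat)

server_addr = request_join("client", "server")
client_nat.send_to(server_addr)               # first packets may still be dropped
blocked_before = not server_nat.accepts_from(client_nat.public_addr)

for addr in pending_clients("server"):        # the server's regular check
    server_nat.send_to(addr)

connected = (client_nat.accepts_from(server_nat.public_addr)
             and server_nat.accepts_from(client_nat.public_addr))
```

Until the server has sent its own message, the client's packets bounce off the server's NAT; once both sides have sent one, traffic flows both ways.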

Conclusion

I hope it wasn't too long, and that you now have a good understanding of why it is necessary to work this way, and why people cannot connect directly to each other.
The networking part is designed to allow people to set up games on real servers, so I'm glad to announce that at the end of summer you will probably be able to run your own SuperTuxKart server!!

Bye and see you online ;) !