July 7, 2024

Why average FPS and 1% low FPS are misleading

A new benchmarking mode

Until now, SuperTuxKart didn’t have an easy way for anyone to test the performance of the game on their system. There were two options, both imperfect:

  • Enabling the integrated FPS counter, which gave an estimate but required work from the user, provided limited data and suffered from repeatability issues.
  • Running a race with AI karts from the command line. This is used, for example, in the OpenBenchmarking test suite, but it also suffers from issues we will return to later.

This changes with the upcoming 1.5 release.

It will be possible, with a few clicks, to run a fast test and receive a summary of the performance at the current settings, or to run a more comprehensive test and receive recommended settings. More on this at the end.

A question I asked myself while designing this mode was: "Which metric should be used to evaluate performance?"


Average FPS and the problem of smoothness

Games create the illusion of continuous movement by presenting a rapid succession of static images. More images in the same time frame means that the game looks and feels more fluid to the player. It also means that the result of the player’s inputs is visible sooner: the game feels more responsive.

The most obvious and most common metric to quantify this is the average "Frames per Second", commonly known as average FPS. It is a metric that has been around for a long time and that is easy to intuitively understand: it tells us at what rate the CPU and GPU can produce new images of the game’s state, updating what is shown on the screen to the player.

A single number, easy to compute, easy to understand: to this day, it is by far the most common metric used to test the performance of different settings or hardware.

But as readers familiar with the topic already know, average FPS is a deeply flawed metric.

A constant, smooth 120 FPS offers a superior experience to a constant, smooth 60 FPS. But in most games, and especially on personal computers where the hardware can range from very low-end to high-end, framerates are not constant.

Instead, some frames take more time to compute than others. Perhaps the game is trying to render a more complex scene. Perhaps more data needs to be moved from memory into the CPU or GPU computational cores, while on other frames most needed data is already present in cache. In online multiplayer, the game might need to do more calculations to update the game state based on data received from the server.

This means that the same average FPS can correspond to very different realities, from a fluid experience to an unplayable experience marked by heavy stuttering.

Variation in frame duration leads to perceptible stutters

This also means that a higher average FPS may correspond to a worse player experience if frame durations are less regular.
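To make this concrete, here is a minimal Python sketch using made-up frame times (not data from an actual STK run). Both series contain 60 frames rendered in exactly one second, so their average FPS is identical, yet one of them contains three frames of more than 140 ms each.

```python
def average_fps(frame_times_ms):
    """Average FPS: number of frames divided by the total elapsed time."""
    total_seconds = sum(frame_times_ms) / 1000.0
    return len(frame_times_ms) / total_seconds

# Hypothetical data: 60 frames in one second in both cases.
smooth = [1000.0 / 60] * 60                    # every frame takes ~16.7 ms
choppy = [10.0] * 57 + [143.0, 143.0, 144.0]   # 570 ms + 430 ms = 1000 ms

print(round(average_fps(smooth), 1))  # 60.0
print(round(average_fps(choppy), 1))  # 60.0
print(max(choppy))                    # 144.0 ms: a very visible stutter
```

The average cannot tell these two series apart, even though the second one would feel far worse to play.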


1% low FPS: a partial solution

1 % low FPS is the most common solution used in benchmarking to avoid the pitfall of misleading FPS averages. Instead of looking at the average of all frames, it sorts all the frames recorded during the test, then takes the slowest 1 %, and computes the corresponding average.

If a game has very regular frame durations, the 1 % low value will be very close to the average. If, on the contrary, frame durations are all over the place, the 1 % low will be far below the average. A large number of online reviewers make use of this metric, and it provides an obvious improvement.
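A minimal Python sketch of this computation, on made-up frame times. The function name is hypothetical, and the exact definition is an assumption: here the result is the FPS corresponding to the mean duration of the slowest 1 % of frames, while some tools compute it slightly differently.

```python
def low_fps(frame_times_ms, percent=1):
    """FPS over the slowest `percent` % of frames (at least one frame)."""
    slowest_first = sorted(frame_times_ms, reverse=True)
    count = max(1, len(slowest_first) * percent // 100)
    worst = slowest_first[:count]
    return 1000.0 * count / sum(worst)

# Hypothetical data: 300 frames each, so the 1 % low covers 3 frames.
regular = [10.0] * 300                   # perfectly even frame times
irregular = [10.0] * 297 + [50.0] * 3    # three 50 ms stutters

print(low_fps(regular))    # 100.0, identical to the average FPS
print(low_fps(irregular))  # 20.0, while the average is still about 96 FPS
```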

Here is the same underlying data as in the previous chart, but with frame durations sorted and the frames used to compute the 1 % low highlighted:

Irregular frametimes get a worse 1% low result

The difference between the smooth 60FPS average and the choppy 60FPS average is correctly reflected, with a massive difference between 60FPS (1 % low) and below 20FPS (1 % low).

But while unquestionably better than the average, 1 % low is also a flawed metric when it comes to measuring the most important parameter: player experience. The determining factor for player experience is how slow the slow frames are.

Isn’t that precisely what 1 % low is about? Yes, and no! Unfortunately, the result depends on the total number of frames: an increase in the number of quick frames increases the number of frames counted among the slowest 1 %, diluting the truly slow frames with faster ones.

The following chart shows how two series, having the same number and duration of slow frames, and therefore the same stuttering and poor player experience, can receive different 1 % low scores with a different total number of frames:

The very quick frames between stutters don't improve player experience
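The same effect can be reproduced with a small Python sketch. The two series below are made up (they are not the data behind the chart): both last the same time and contain the same three 100 ms stutters, but the second one has more, quicker frames in between. The 1 % low definition is the same assumption as before: the FPS corresponding to the mean duration of the slowest 1 % of frames.

```python
def low_fps(frame_times_ms, percent=1):
    """FPS over the slowest `percent` % of frames (at least one frame)."""
    slowest_first = sorted(frame_times_ms, reverse=True)
    count = max(1, len(slowest_first) * percent // 100)
    worst = slowest_first[:count]
    return 1000.0 * count / sum(worst)

# Hypothetical series: same three stutters, same total duration (3270 ms).
series_a = [100.0] * 3 + [10.0] * 297                 # 300 frames
series_b = [100.0] * 3 + [10.0] * 197 + [5.0] * 200   # 400 frames

print(sum(series_a), sum(series_b))   # 3270.0 3270.0
print(round(low_fps(series_a), 1))    # 10.0
print(round(low_fps(series_b), 1))    # 12.9
```

With 400 frames, the slowest 1 % contains four frames instead of three, so a quick 10 ms frame is averaged in with the stutters and the score improves, although the stutters themselves are unchanged.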

A new metric

When trying to boil down a large set of datapoints into a single number, some of the information contained therein is inevitably going to be lost.

A naïve metric that wouldn’t fall into the pitfall we noticed for 1 % lows is minimum FPS: take the single slowest frame in the benchmark, and compute the corresponding FPS.

But it suffers from its own issues: minimum FPS tends to vary significantly when repeating tests, and a single frame out of thousands is also a poor representation of the overall subjective experience.

The crux of the issue with metrics such as 1 % low is that the importance of each frame is weighted based on the total number of frames. In the previous chart, the 3 slowest frames represent respectively 1 % and 0.75 % of the total number of frames, but they represent 3.46 % of the entire duration of the test in both situations.

The set of frames used to compute metrics meant to showcase how smooth a game is should be based on a fixed proportion of the total benchmark time, not on a fixed proportion of the total number of frames.

Measuring the average FPS corresponding to 3 % or 5 % of the benchmark time would be one way to do it. Because the required computations are very quick for a computer, our integrated benchmark asks two slightly different questions:

  • How much time is spent in frames slower than needed to reach a target framerate?
  • How much excess time is spent waiting?

Let’s take the following frame-time chart:


Asking the first question produces this second chart, which is much easier to analyze:


Easier to parse than a full frametime chart, it contains key information about performance levels and is robust against the distortions that additional fast frames induce in average FPS and 1 % low FPS. Here, shifting the curves towards the right and higher target FPS requires improving the slow and average frames.

Looking at excess time offers a slightly different picture. Let’s take a target FPS of 100. This translates to a target frametime of 10 milliseconds. Excess time would be, in this example, all the time spent in a frame beyond 10ms. This metric penalizes minor misses relatively less and major misses relatively more.
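Both questions can be sketched in a few lines of Python. This is an illustration under assumptions, not STK's actual code: the function names are hypothetical, the frame times are made up, and both results are expressed as a share of the total benchmark time.

```python
def slow_time_share(frame_times_ms, target_fps):
    """Share of total time spent in frames longer than the target frametime."""
    target_ms = 1000.0 / target_fps
    slow = sum(t for t in frame_times_ms if t > target_ms)
    return slow / sum(frame_times_ms)

def excess_time_share(frame_times_ms, target_fps):
    """Share of total time spent beyond the target frametime."""
    target_ms = 1000.0 / target_fps
    excess = sum(t - target_ms for t in frame_times_ms if t > target_ms)
    return excess / sum(frame_times_ms)

# Hypothetical data: mostly 8 ms frames, five minor misses, five major ones.
frames = [8.0] * 90 + [12.0] * 5 + [40.0] * 5   # 980 ms in total

print(round(100 * slow_time_share(frames, 100), 1))    # 26.5 (% of the time)
print(round(100 * excess_time_share(frames, 100), 1))  # 16.3 (% of the time)
```

In this example, the five 12 ms frames account for 60 of the 260 ms spent in slow frames, but only 10 of the 160 ms of excess time: minor misses weigh far less in the second metric.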


Conveying this information to the player

The full curves are interesting when trying to analyze the performance characteristics of the game, of particular settings or of a particular track, especially when testing code changes. But simple numbers also have their use.

The player running the benchmark needs a few numbers that can guide their choice of settings or help them compare the performance of their system with others. STK’s benchmark offers three numbers:

  • Steady FPS: Defined as the highest target FPS with less than 1 % time spent in slow frames and less than 0.1 % excess time. In the previous chart, it is at 35. This metric is a target for the player prioritizing the avoidance of any stutter.
  • Mostly Steady FPS: With less than 12 % time spent in slow frames and less than 2 % excess time, it offers a good indicator for most players wanting a mix of performance and eye-candy. In the previous chart, it is at 38.
  • Typical FPS: With less than 50 % time spent in slow frames and less than 10 % excess time, it produces values that are usually close to the average FPS, making it useful to compare performance with other games. In the previous chart it is at 40, while the average FPS is at 40.7. However, highly irregular frametimes will produce lower values than with average FPS, making it more robust.
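As an illustration, here is a minimal Python sketch of how such numbers could be derived from a list of frame times. The thresholds are the ones listed above; everything else is an assumption rather than STK's actual implementation: the function names, the made-up data, testing only integer target FPS values up to 1000, and the use of strict comparisons.

```python
def time_shares(frame_times_ms, target_fps):
    """Return (slow time share, excess time share) for a target FPS."""
    target_ms = 1000.0 / target_fps
    total = sum(frame_times_ms)
    slow = sum(t for t in frame_times_ms if t > target_ms)
    excess = sum(t - target_ms for t in frame_times_ms if t > target_ms)
    return slow / total, excess / total

def highest_target_fps(frame_times_ms, max_slow, max_excess, max_fps=1000):
    """Highest integer target FPS staying under both thresholds, or 0 if none."""
    for fps in range(max_fps, 0, -1):
        slow, excess = time_shares(frame_times_ms, fps)
        if slow < max_slow and excess < max_excess:
            return fps
    return 0

# Hypothetical data: mostly 20 ms frames, a few 25 ms ones, one 50 ms stutter.
frames = [20.0] * 95 + [25.0] * 4 + [50.0]

print(highest_target_fps(frames, 0.01, 0.001))  # Steady FPS: 20
print(highest_target_fps(frames, 0.12, 0.02))   # Mostly Steady FPS: 45
print(highest_target_fps(frames, 0.50, 0.10))   # Typical FPS: 50
```

In this made-up example, a single 50 ms frame is enough to pull Steady FPS down to 20, while Typical FPS stays at 50, close to the average of about 48.8 FPS.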

SuperTuxKart’s new benchmark mode in action

To ensure consistency and repeatability, SuperTuxKart’s integrated benchmark uses a replay and the game’s own profiler mode, which records very accurate frametimes. It then displays a basic summary screen, and allows saving a report containing more detailed results.

If the player activates the pause menu mid-benchmark, it will keep running unless the player chooses to exit the benchmark completely.

The replay allows different benchmark runs to last the same duration and to show the same scene. This makes results much more consistent than having the player drive or AIs race around, as done in the OpenBenchmarking test.

A replay doesn’t perfectly represent a real race, but it’s close enough, and the integrated benchmark uses the most demanding track in the standard release to present a worst-case scenario. Performance on any other standard track will be similar or better (often significantly so).

Three main operations determine how good STK’s performance is:

  • Parsing the scene and sending draw calls to the GPU. This is CPU-intensive.
  • Rendering the scene. This is GPU-intensive.
  • In online multiplayer, replaying game events to synchronize the local state with the server state. This is CPU-intensive, because the game needs to simulate physics and other events between the time at which the server sent a state update and the current local time.

The benchmark stresses the first two operations and is representative of single-player performance. Online multiplayer is typically more CPU-demanding.

In SuperTuxKart 1.5, it will be possible to set the integrated frame-limiter to various values, including 1000, which is, practically speaking, unlimited. This is another advantage over the methods OpenBenchmarking currently has to use, where the maximum FPS cannot exceed 120.

April 27, 2024

On the way towards 1.5... and 2.0!

Some history

SuperTuxKart 1.0 was released 5 years ago, on the 20th of April 2019. With the introduction of online multiplayer, among many other changes and improvements, it marked a pivotal moment in the history of the game.

Afterwards, it was decided to commit to a policy of version compatibility: instead of making breaking changes that would prevent online play between different 1.x versions or would make it impossible to compare records between versions, game mechanics and tracks have been kept functionally identical, with very few exceptions (such as expanding some checklines).

Instead, the focus of development has been centered on improving all the other aspects of the STK experience. Releases from 1.1 to 1.4 have featured, alongside numerous bugfixes and UI improvements, many significant enhancements.

Some of these were very visible, such as the release of new arenas and kart models, all tracks included inside mobile releases, the introduction of render resolution scaling, a basketball minimap indicator, a brand new cartoon skin, customizable font size, new respawn animations of track items, etc.

Others have been more discreet but no less important, such as support for IPv6 servers or the adoption of SDL2 (improving compatibility with gamepads, simplifying ports to new platforms and more). The overall changelog contains more details.


SuperTuxKart 1.5 is coming this summer

A year and a half after the release of version 1.4, SuperTuxKart 1.5 is finally around the corner. The list of changes that will be included is not yet finalized, but it brings some notable improvements, such as:

  • The ability to configure STK’s maximum FPS within the game’s options, instead of being limited to a maximum of 120 (30 for Android). The game’s physics, running independently from the framerate, remain at 120FPS.
  • New LoD (Level of Detail) settings to reduce "popping" for a minor performance cost. As STK is fairly easy to run on modern systems, this is an appreciable quality improvement.
  • A new benchmark mode
  • Finer control of the game’s audio levels
  • And of course the usual minor bugfixes and improvements

The new framelimiter configuration and integrated benchmark

There are always many further enhancements and fixes that we would like to see in the next release, but the STK team wants to bring all the existing improvements to the players sooner rather than later: the release of STK 1.5 is expected this summer.

But this is not the most significant announcement in this blog post.


SuperTuxKart 2.0: an ambitious project

After the release of 1.5, development efforts will officially switch towards SuperTuxKart 2.0, featuring major gameplay changes and more significant track updates than any single STK release in the last fifteen years.

A 1.6 release featuring some backported bugfixes and improvements may happen, but providing a new major release will be the project’s priority.

The 2.0 version will feature a minimum of five new high-quality standard tracks, improved versions of all existing tracks, alongside considerable changes and improvements to all the elements of gameplay.

A substantial proportion of these changes already exist in an experimental branch I created at the end of 2023, coming back from a long break, to start implementing and testing them. This branch, which will be merged into the main development codebase after the release of 1.5, already features major revisions, such as three brand new powerups and a new kart class. Future blog posts will elaborate on the various changes and features coming for 2.0.

An early showcase of one of the new powerups


On the track-making side, efforts are led by Sven Andreas Belting, our lead artist and designer of Black Forest. He is working on several new track projects (many already quite advanced), as well as updates to several tracks, be they graphical or, in coordination with myself, related to gameplay. Typhon306 is also providing significant help.

A preview of two of the new tracks


Our aim is to significantly raise the standard of visual quality for tracks included in SuperTuxKart 2.0, while also providing improved polish and gameplay.

As versions following STK 2.0 will also follow the compatibility principle, we aim for particularly high quality in everything that will remain the same in 2.1 and other 2.x releases. Beyond the kart properties, physics, powerups, and obviously standard tracks, this includes things such as a new AI, more fun to play against, and a brand new Story Mode. Nonetheless, we would also like to lift other parts of the game: for example, better performance from the engine and progress on the Vulkan renderer, improvements to non-racing modes, a refreshed GUI, some new music, or an improved addons system.


A labor of love

We are very excited about this, in the genuine sense of the word, which soulless corporate-speak sometimes makes us forget. It is difficult, however, to provide a solid estimate for when this major release will happen. The SuperTuxKart project is alive, has a clear vision and is moving towards some important milestones, but it is also a project of passion carried by a small team.

It is still moving forward because contributors (most having spent more time playing the game than we would like to admit) want it to become better. Its free software licensing allows interested people to contribute.

But this also means that, unlike commercial games, the STK project cannot pay dedicated full-time developers and artists. It often feels better to do something for free, willingly, for the sake of a work well done, than to do it for a paltry amount of money; and recruiting freelancers to do some things while key contributors investing much more time get nothing also seems wrong. Raising enough funds from the community to truly change this approach has always seemed too far-fetched.

The story of STK is that of the successes of free software games, but also of their struggles.

Hence, to make SuperTuxKart a better game and to reach the 2.0 release sooner, what we need most is new contributors, be they helping with a few specific things or with many. Artists who could contribute new textures, 3D modelling, music, icons; developers who could implement features, fix bugs, improve the code structure or work on the engine; contributors willing to improve documentation, create content advertising the game or meticulously report bugs: there is no shortage of things that could help.


Final words

This blog has featured no updates since 1.4’s release, as the activity around the project slowed down until my comeback, and other contributors to the project were not inspired to write. We want to avoid such a hiatus happening again.

I will try to provide in this blog, every one or two months, an update on SuperTuxKart’s progress: future plans, detailed presentation of a new feature, etc.