Kabounce, Optimizing for UE4
Stitch Heads needed someone to find out why their game was not running as fast as expected. I signed up for the job and they hired me. This was not the first time I had worked with the Unreal Engine, but it was the first time I looked at its graphics pipeline and profiling tools. I must say that I quite like Epic Games' systems.

Rendering Techniques

I knew that the Unreal Engine uses deferred rendering by default. Deferred rendering is usually more suitable for games because it allows for a higher number of lights in a scene. It does have a few downsides:

  • No alpha blending (transparency)
  • No hardware anti-aliasing (MSAA)
  • High memory footprint
Alpha blending can be added with an additional forward rendering pass for just the transparent objects, which is what UE4 does. Anti-aliasing can be achieved through techniques like FXAA or Temporal AA, both of which UE4 supports. The large memory footprint follows from the algorithm itself, which stores the surface attributes of every pixel in a set of screen-sized buffers (the G-buffer) before lighting is applied. There are optimizations that can reduce the footprint, but it is inherent to the technique.
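To give a feel for where that footprint comes from, here is a rough back-of-the-envelope estimate. The render target layouts below are hypothetical examples chosen for illustration; they are not UE4's actual G-buffer layout.

```python
# Rough memory estimate for a deferred G-buffer versus a forward setup.
# Both layouts are hypothetical examples, not UE4's real render targets.

def render_target_bytes(width, height, bytes_per_pixel):
    """Size of a single screen-sized render target in bytes."""
    return width * height * bytes_per_pixel

WIDTH, HEIGHT = 1920, 1080

# Assumed deferred layout: four RGBA8 attribute targets (4 bytes each),
# one RGBA16F scene color target (8 bytes) and a 32-bit depth buffer.
deferred_layout = [4, 4, 4, 4, 8, 4]
deferred_bytes = sum(render_target_bytes(WIDTH, HEIGHT, bpp) for bpp in deferred_layout)

# Assumed forward layout: one RGBA16F scene color target and a depth buffer.
forward_layout = [8, 4]
forward_bytes = sum(render_target_bytes(WIDTH, HEIGHT, bpp) for bpp in forward_layout)

MIB = 1024 * 1024
print(f"deferred: {deferred_bytes / MIB:.1f} MiB")
print(f"forward:  {forward_bytes / MIB:.1f} MiB")
```

The estimate ignores MSAA, which multiplies the size of the forward targets by the sample count, so the real difference depends on the anti-aliasing settings.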

Forward rendering is not suitable for scenes with many lights. It does, however, support alpha blending and hardware AA, and it does not suffer from the high memory footprint of deferred rendering.

After a first look at Kabounce I noticed that the scenes have only one stationary light, the skylight. With only one light in a scene there is no need for deferred rendering, and forward rendering will even be faster. Luckily for us, UE4 has supported forward rendering since 4.14, following an initial implementation in 4.13. Enabling forward rendering did break the reflection captures, but enabling High-Quality Reflections in the materials fixed this problem. Furthermore, screen space reflections (SSR) are not supported with forward rendering, but we will not need them in Kabounce.


Culling

Culling prevents invisible geometry from being rendered, saving us precious cycles. Different forms of culling exist, including frustum culling and occlusion culling. Frustum culling rejects all geometry that lies outside the view frustum. Occlusion culling rejects all geometry that is occluded by (completely behind) other geometry.
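As an illustration of the frustum test, here is a minimal sketch that checks a bounding sphere against a set of planes. It assumes each plane is stored as (nx, ny, nz, d) with the normal pointing into the visible volume; the function name and the box-shaped "frustum" are made up for this example and are not how UE4 implements it.

```python
# Minimal sphere-versus-frustum test. Planes are (nx, ny, nz, d) tuples with
# the normal pointing inward, so a point p is inside when dot(n, p) + d >= 0.

def sphere_in_frustum(center, radius, planes):
    """Return False only if the sphere lies completely outside some plane."""
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        signed_distance = nx * cx + ny * cy + nz * cz + d
        if signed_distance < -radius:
            return False  # entirely on the outside of this plane: cull it
    return True  # inside or intersecting: keep it

# A box-shaped view volume stands in for a real frustum to keep the numbers
# simple: x and y in [-10, 10], z in [1, 100].
box_planes = [
    (1, 0, 0, 10),    # x >= -10
    (-1, 0, 0, 10),   # x <= 10
    (0, 1, 0, 10),    # y >= -10
    (0, -1, 0, 10),   # y <= 10
    (0, 0, 1, -1),    # z >= 1 (near plane)
    (0, 0, -1, 100),  # z <= 100 (far plane)
]

print(sphere_in_frustum((0, 0, 50), 1.0, box_planes))   # inside the volume
print(sphere_in_frustum((20, 0, 50), 1.0, box_planes))  # far off to the side
```

The test is conservative: a sphere that only touches the volume is kept, which is the safe choice because wrongly culling a visible mesh is worse than drawing one extra.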

Culling, especially occlusion culling, can be expensive to calculate. Unreal supports two algorithms for occlusion culling: occlusion queries and Hierarchical Z-Buffer occlusion culling.

Occlusion queries work by sending the mesh to the GPU and rendering it. However, instead of showing the result on the screen, we ask the GPU how many pixels were affected by the draw call. If this number is below a certain threshold, we reject the mesh. Getting the results back to the CPU takes time, though, and waiting for them would stall the frame, which leads to dramatic FPS drops. Instead, we continue with the current frame, receive the results during the next frame and use them for culling there. This means that culling lags one frame behind, which can cause popping when the camera moves at high speed.
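The one-frame lag can be sketched with a small simulation. The query results here are plain dictionaries of made-up pixel counts standing in for what the GPU would report; the function and mesh names are hypothetical and nothing in it talks to a real graphics API.

```python
# Simulation of culling with occlusion query results that arrive one frame
# late. Each frame's dict maps a mesh name to the pixel count its query
# would report (made-up numbers for illustration).

def run_frames(query_results_per_frame, threshold=0):
    """Return, per frame, the meshes drawn when culling with last frame's results."""
    previous_results = {}
    drawn_per_frame = []
    for frame_results in query_results_per_frame:
        # Cull with LAST frame's counts. A mesh without a result is drawn.
        drawn = [mesh for mesh in frame_results
                 if previous_results.get(mesh, threshold + 1) > threshold]
        drawn_per_frame.append(drawn)
        # Queries issued this frame only become readable next frame.
        previous_results = frame_results
    return drawn_per_frame

frames = [
    {"wall": 500, "crate": 0},    # crate hidden behind the wall
    {"wall": 500, "crate": 300},  # camera moved, crate is now visible
    {"wall": 500, "crate": 300},
]
for index, drawn in enumerate(run_frames(frames)):
    print(index, drawn)
```

In the second frame the crate is visible but still culled, because the only result available is the stale one from the first frame. That single missing frame is the popping described above.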

The second algorithm is Hierarchical Z-Buffer (Hi-Z) occlusion culling (OC). Hi-Z OC uses a mipmap chain created from the depth buffer to determine whether an object is occluded. Hi-Z OC has a lower CPU and GPU cost than occlusion queries in exchange for more conservative results. The levels in Kabounce are sparse geometry-wise, so there will not be a lot of occluded geometry. Therefore, I opted for Hi-Z occlusion culling due to its relatively low cost.
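The idea behind the depth mipmap chain can be shown with a small CPU-side sketch. It assumes a square, power-of-two depth buffer in which a larger value means farther away, and it tests a screen-space rectangle rather than a projected bounding box. The function names are made up for this example; a real implementation runs on the GPU.

```python
# CPU-side sketch of Hi-Z occlusion culling.
# Assumption: larger depth value = farther from the camera.

def build_hiz(depth):
    """Build a mip chain where every texel holds the FARTHEST depth of the 2x2 texels below it."""
    mips = [depth]
    while len(mips[-1]) > 1:
        src = mips[-1]
        half = len(src) // 2
        mips.append([[max(src[2 * y][2 * x], src[2 * y][2 * x + 1],
                          src[2 * y + 1][2 * x], src[2 * y + 1][2 * x + 1])
                      for x in range(half)] for y in range(half)])
    return mips

def is_occluded(mips, x0, y0, x1, y1, nearest_depth):
    """Test an inclusive level-0 texel rectangle whose closest point is at nearest_depth."""
    level = 0
    # Climb the chain until the rectangle covers at most 2x2 texels.
    while (x1 >> level) - (x0 >> level) > 1 or (y1 >> level) - (y0 >> level) > 1:
        level += 1
    mip = mips[level]
    farthest_occluder = max(mip[y][x]
                            for y in range(y0 >> level, (y1 >> level) + 1)
                            for x in range(x0 >> level, (x1 >> level) + 1))
    # Occluded only if even the object's closest point is behind everything there.
    return nearest_depth > farthest_occluder

# 4x4 depth buffer: a near wall (0.2) on the left, empty space (1.0) on the right.
depth_buffer = [[0.2, 0.2, 1.0, 1.0] for _ in range(4)]
hiz = build_hiz(depth_buffer)

print(is_occluded(hiz, 0, 0, 1, 1, 0.5))  # behind the wall
print(is_occluded(hiz, 1, 0, 2, 1, 0.5))  # partly in front of empty space
```

Because each coarse texel stores the farthest depth of its region, a coarse lookup can only report an object as visible when it is actually hidden, never the reverse. That is where the more conservative results come from.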

Post Processing

Post-processing is done after the entire scene has been rendered and is generally used for adding screen-space effects. One of these effects is Fast Approximate Anti-Aliasing (FXAA). It is useful with deferred rendering but is not needed with forward rendering, as we can use MSAA. Bloom is also added in post-processing and is a vital part of the look and art style of Kabounce.
Screen space reflections barely influenced the scene and their effect was not visible when playing the game, so I turned them off completely.
Ambient occlusion also had little to no effect on the mostly black geometry of the game scenes, so I turned it off as well.
Turning off these post-processing options reduced the post-processing overhead considerably, with little to no visual impact on the game.

Reducing Memory Footprint

One of the things the team leads were worried about was the memory footprint of the textures they were using. I did find some unnecessarily large textures, but a quick heads-up to the artists fixed this.
I also noticed that the texture compression settings on a lot of textures were set to SRGB or R8G8B8A8. This is an uncompressed texture format generally used for mobile platforms where texture decompression is not an option. I decided to educate the team on texture compression formats and explained to them the different types of DXT compression available in UE4.
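The difference in size is easy to work out. The sketch below compares uncompressed R8G8B8A8 (4 bytes per pixel) with DXT1 and DXT5, which store each 4x4 block of pixels in 8 and 16 bytes respectively. The 2048x2048 size is just an example, and mipmaps are left out.

```python
# Memory use of one texture in R8G8B8A8 versus DXT1 and DXT5 (top mip only).

DXT1_BLOCK_BYTES = 8    # 4x4 pixels in 8 bytes, no or 1-bit alpha
DXT5_BLOCK_BYTES = 16   # 4x4 pixels in 16 bytes, with interpolated alpha

def uncompressed_bytes(width, height, bytes_per_pixel=4):
    """Size of an uncompressed texture, 4 bytes per pixel for R8G8B8A8."""
    return width * height * bytes_per_pixel

def block_compressed_bytes(width, height, bytes_per_block):
    """Size of a block-compressed texture made of 4x4 pixel blocks."""
    blocks_x = (width + 3) // 4  # round up to whole blocks
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * bytes_per_block

WIDTH = HEIGHT = 2048  # example texture size
MIB = 1024 * 1024

print("R8G8B8A8:", uncompressed_bytes(WIDTH, HEIGHT) / MIB, "MiB")
print("DXT1:    ", block_compressed_bytes(WIDTH, HEIGHT, DXT1_BLOCK_BYTES) / MIB, "MiB")
print("DXT5:    ", block_compressed_bytes(WIDTH, HEIGHT, DXT5_BLOCK_BYTES) / MIB, "MiB")
```

For any texture whose sides are multiples of four, DXT1 takes one eighth and DXT5 one quarter of the uncompressed size.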

Material Optimizations and More

I did not have enough time left to personally go through all the materials and look for expensive operations and algorithmic improvements. Thus, I decided to teach the team how to optimize their materials themselves. I prepared a simple lecture explaining ways of making code faster and afterwards gave them a cheat sheet to help them improve their materials.

One final thing I noticed on my last day there was that some meshes generated by blueprints were set to dynamic lighting. This caused unnecessary overhead, as there were no dynamic lights in the scene.
Figure 1: Unoptimized profiling data

Figure 2: Optimized profiling data