Motivation
The aim of this project was to implement a fully dynamic frustum and occlusion culling algorithm. I have recently done a Light Propagation Volume implementation and feel that various passes in this technique could benefit from culling quite heavily. It is mainly the Reflective Shadow Map passes, one for every directional light, that could benefit from occlusion culling. The pre-depth pass and the forward pass would benefit from both frustum and occlusion culling as well. Aside from the above-mentioned reasons, frustum and occlusion culling algorithms are vital parts of modern renderers. Knowledge of how to implement and use them is almost mandatory for a modern graphics programmer.
Download Source: culling_release.7z
Frustum Culling
Frustum culling is a technique used to cull geometry that lies entirely outside of the frustum, and therefore cannot be seen by the camera.
This is done by calculating the six planes of the frustum, calculating a bounding box for the mesh, and testing the 8 corner points of that box against each plane. If all 8 points lie on the outer side of the same plane, the box is entirely outside the frustum, so the mesh cannot be seen and should be culled.
Calculating the six frustum planes is done by extracting them from the MVP matrix. Extracting from the MVP matrix allows us to do the frustum-box intersection test in local space. This is perfect for us since our AABBs are already in local space.
Extracting the frustum planes
Let \(v = (x, y, z, w = 1)\) be a vertex and let \(M=(m_{ij} )\) be a 4x4 MVP matrix. Transforming \(v\) by \(M\) results in the transformed vertex \(v'=(x',y',z',w')\), which can be written in terms of the columns of \(M\) as:
\begin{equation}
v^{\prime} = (v \cdot col_{1}, v \cdot col_{2}, v \cdot col_{3}, v \cdot col_{4})
\end{equation}
After this transformation \(v'\) is in homogeneous clip space, where the frustum is an Axis Aligned Bounding Box. The size of this AABB is API specific, but we will be focusing on DX12. If \(v'\) is inside this box, \(v\) is inside the untransformed frustum.
We can test if \(v'\) is inside the frustum by checking the following inequalities.
\begin{align*}
-w^{\prime} & < x^{\prime} < w^{\prime} \\
-w^{\prime} & < y^{\prime} < w^{\prime} \\
0 & < z^{\prime} < w^{\prime} \\
\end{align*}
Now, say we would like to test if \(v'\) is on the inner side of the left plane of our frustum. \(v'\) is inside this half-space if:
\begin{equation}
-w^{\prime} < x^{\prime}
\end{equation}
We can now rewrite this inequality to:
\begin{equation}
-(v \cdot col_{4}) < (v \cdot col_{1})
\end{equation}
In turn we can rewrite this to:
\begin{equation}
0 < v \cdot (col_{1} + col_{4})
\end{equation}
Finally, we can expand this to:
\begin{equation}
0 < x(m_{14} + m_{11}) + y(m_{24} + m_{21}) + z(m_{34} + m_{31}) + (m_{44} + m_{41})
\end{equation}
The boundary of this half-space, where the right-hand side equals zero, has exactly the form of the plane equation:
\begin{equation}
ax + by + cz + d = 0
\end{equation}
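Applying the same steps to the remaining five inequalities gives one half-space per frustum side. As a worked summary, under the same row-vector convention and the DX12 depth range used above, \(v\) is inside the frustum when all of the following hold:
\begin{align*}
0 & < v \cdot (col_{4} + col_{1}) && \text{(left)} \\
0 & < v \cdot (col_{4} - col_{1}) && \text{(right)} \\
0 & < v \cdot (col_{4} + col_{2}) && \text{(bottom)} \\
0 & < v \cdot (col_{4} - col_{2}) && \text{(top)} \\
0 & < v \cdot col_{3} && \text{(near)} \\
0 & < v \cdot (col_{4} - col_{3}) && \text{(far)} \\
\end{align*}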
We can now extract all six frustum planes from the MVP matrix and use them to test the eight corner points of a bounding box. If all eight points lie on the outer side of any one plane, the box is outside the frustum and we do not render the mesh.
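The extraction and the box test can be sketched on the CPU as follows. This is a minimal illustration rather than the code from the release: `Mat4`, `Plane`, `AABB`, `extractFrustumPlanes` and `isCulled` are hypothetical names, and the matrix is assumed to be stored row-major with the row-vector convention used in the derivation, so that `m[i][j]` holds \(m_{(i+1)(j+1)}\).

```cpp
#include <array>

// Assumption: row-major storage, row-vector convention (v' = v * M).
struct Mat4  { float m[4][4]; };
struct Plane { float a, b, c, d; };          // inside when a*x + b*y + c*z + d > 0
struct AABB  { float min[3]; float max[3]; };

// Column j of the matrix, scaled by s, interpreted as plane coefficients.
static Plane column(const Mat4& M, int j, float s) {
    return { s * M.m[0][j], s * M.m[1][j], s * M.m[2][j], s * M.m[3][j] };
}

static Plane add(const Plane& p, const Plane& q) {
    return { p.a + q.a, p.b + q.b, p.c + q.c, p.d + q.d };
}

// Six planes for the DX12 clip volume: -w < x < w, -w < y < w, 0 < z < w.
std::array<Plane, 6> extractFrustumPlanes(const Mat4& mvp) {
    const Plane col4 = column(mvp, 3, 1.0f);
    return {{
        add(col4, column(mvp, 0,  1.0f)),    // left:   0 < v . (col4 + col1)
        add(col4, column(mvp, 0, -1.0f)),    // right:  0 < v . (col4 - col1)
        add(col4, column(mvp, 1,  1.0f)),    // bottom: 0 < v . (col4 + col2)
        add(col4, column(mvp, 1, -1.0f)),    // top:    0 < v . (col4 - col2)
        column(mvp, 2, 1.0f),                // near:   0 < v . col3
        add(col4, column(mvp, 2, -1.0f))     // far:    0 < v . (col4 - col3)
    }};
}

// Conservative test: cull only if all eight corners are outside one plane.
bool isCulled(const std::array<Plane, 6>& planes, const AABB& box) {
    for (const Plane& p : planes) {
        int outside = 0;
        for (int i = 0; i < 8; ++i) {
            const float x = (i & 1) ? box.max[0] : box.min[0];
            const float y = (i & 2) ? box.max[1] : box.min[1];
            const float z = (i & 4) ? box.max[2] : box.min[2];
            if (p.a * x + p.b * y + p.c * z + p.d <= 0.0f) ++outside;
        }
        if (outside == 8) return true;
    }
    return false;
}
```

Because the planes come from the full MVP matrix, the box passed to `isCulled` stays in the mesh's local space and never needs to be transformed.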
Hierarchical Z Buffer Occlusion Culling
Occlusion Culling is used to determine if geometry is entirely hidden by other geometry. If it is, we cannot see the triangles belonging to the geometry and should not spend time shading them. Visibility is determined by sampling depth data from a mipmap chain created from a depth buffer. This depth buffer could be rendered in a pre-pass or the depth buffer of the previous frame could be re-used.
Figure 1: Mip map chain of a depth buffer
Hi-Z Map construction
Regular mipmap chains are created by sampling all corresponding pixels from the previous level (N-1) and taking a weighted average of the samples. In the case of the Hi-Z Map we do not take the weighted average, but take the highest value of all the samples instead.
Figure 2: Left: weighted average. Right: maximum.
Special care must be taken when dealing with odd-sized textures. An extra sample is needed to accommodate the last pixel in the row and/or column of the buffer. We have written a compute shader that does the sampling for us, and we simply do one dispatch for every level we wish to downsample.
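The reduction itself can be illustrated with a CPU reference. This is a sketch under assumptions, not the compute shader from the release: `DepthMip`, `downsampleMax` and `buildHiZChain` are hypothetical names, depth values are assumed to be non-negative, and odd sizes are handled by widening the footprint to three source texels along the odd axis, which is one conservative way to cover the last row or column.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// One level of the Hi-Z chain, stored row-major.
struct DepthMip {
    int width;
    int height;
    std::vector<float> texels;
};

// Builds level N from level N-1 by keeping the largest (farthest) depth.
DepthMip downsampleMax(const DepthMip& src) {
    DepthMip dst;
    dst.width  = std::max(1, src.width / 2);
    dst.height = std::max(1, src.height / 2);
    dst.texels.resize(static_cast<std::size_t>(dst.width) * dst.height);

    // An odd source size needs one extra sample along that axis.
    const int spanX = (src.width  % 2 != 0) ? 3 : 2;
    const int spanY = (src.height % 2 != 0) ? 3 : 2;

    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            float farthest = 0.0f;  // assumes depth values are >= 0
            for (int dy = 0; dy < spanY; ++dy) {
                for (int dx = 0; dx < spanX; ++dx) {
                    // Clamp so the footprint never reads outside the source.
                    const int sx = std::min(2 * x + dx, src.width - 1);
                    const int sy = std::min(2 * y + dy, src.height - 1);
                    farthest = std::max(farthest, src.texels[sy * src.width + sx]);
                }
            }
            dst.texels[y * dst.width + x] = farthest;
        }
    }
    return dst;
}

// Repeats the reduction until a 1x1 level is reached (one "dispatch" per level).
std::vector<DepthMip> buildHiZChain(const DepthMip& depth) {
    std::vector<DepthMip> chain{depth};
    while (chain.back().width > 1 || chain.back().height > 1) {
        chain.push_back(downsampleMax(chain.back()));
    }
    return chain;
}
```

Taking a few more samples than strictly necessary is safe here: the maximum can only become larger, which makes the later occlusion test more conservative, never wrong.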
Using the Hi-Z Map
To use the Hi-Z Map for occlusion culling we first must calculate a bounding box around the geometry we wish to test. We transform this bounding box into NDC space and use the UV coordinates of the four corner points to sample from our mipmap. We calculate the mip level to sample from using the formula:
\begin{equation}
\left\lceil \log_{2}\left( 0.5 \cdot \max\left(\mathrm{NDCBoxDimensions}.x,\ \mathrm{NDCBoxDimensions}.y\right) \right) \right\rceil
\end{equation}
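Finally, we compare the highest value we sampled, which is the farthest depth stored in the region covered by the box, against the nearest depth of the primitive we are testing. If the sampled value is smaller than the depth of the primitive, everything in that region lies in front of it, so the primitive is occluded and should be rejected.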
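The mip selection and the final comparison can be sketched as below. The names `selectHiZMip` and `isOccluded` are hypothetical, and the sketch makes two assumptions that the text does not state explicitly: the box dimensions are measured in texels of mip 0, and depth runs from 0 at the near plane to 1 at the far plane, so a larger value means farther away.

```cpp
#include <algorithm>
#include <cmath>

// Mip level from the formula above, clamped to the levels that exist.
// Assumption: box dimensions are given in texels of mip 0.
int selectHiZMip(float boxWidth, float boxHeight, int mipCount) {
    const float longest = std::max(boxWidth, boxHeight);
    // Guard against log2 of values below 1, which would give a negative level.
    const float level = std::ceil(std::log2(std::max(longest * 0.5f, 1.0f)));
    return std::min(std::max(static_cast<int>(level), 0), mipCount - 1);
}

// samples: Hi-Z values read at the four corners of the box.
// boxNearestDepth: smallest depth of the box in the same depth range.
bool isOccluded(const float samples[4], float boxNearestDepth) {
    const float farthestOccluder =
        std::max({samples[0], samples[1], samples[2], samples[3]});
    // Occluded only if the box starts behind the farthest stored depth.
    return boxNearestDepth > farthestOccluder;
}
```

With this choice of level a box that is 8 texels wide maps to mip 2, where each texel covers 4 source texels, so four corner samples are enough to cover the box in the common case.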
Execute Indirect
Execute Indirect is a new feature in DirectX 12. It allows the user to set up a buffer with arguments that can be used for draw calls later. The data in this buffer can be altered by a compute shader, effectively allowing for the culling of geometry on the GPU.
This is a huge benefit because we do not have to stall the CPU to wait on the result of the culling pass on the GPU.
My Execute Indirect Argument Buffer is formatted like this:
struct ForwardIndirectCommand
{
    D3D12_VERTEX_BUFFER_VIEW vertexBufferView;               // vertex buffer used by this draw
    D3D12_INDEX_BUFFER_VIEW indexBufferView;                 // index buffer used by this draw
    D3D12_GPU_VIRTUAL_ADDRESS constantBufferVirtualAddres;   // per-draw constant buffer
    D3D12_DRAW_INDEXED_ARGUMENTS drawIndexedArguments;       // arguments of the draw itself
    float padding;                                           // explicit padding at the end of the record
};
The drawIndexedArguments member of this struct contains the arguments required by the DrawIndexedInstanced function that would normally be called.
typedef struct D3D12_DRAW_INDEXED_ARGUMENTS
{
    UINT IndexCountPerInstance;
    UINT InstanceCount;
    UINT StartIndexLocation;
    INT BaseVertexLocation;
    UINT StartInstanceLocation;
} D3D12_DRAW_INDEXED_ARGUMENTS;
The other members of this struct specify the vertex, index and constant buffers used by the draw call.
We can now prepare a buffer filled with all the data used by all the potential draw calls. The beauty of this buffer is that we can still change its contents before they are consumed. In our culling pass we determine whether a mesh is visible, using both frustum and occlusion culling. If the mesh is not visible we set the instance count to 0, effectively not rendering the mesh in that draw call.
The user is required to set up a command signature. This command signature is later used by the API to interpret the data in the Argument Buffer. Setting up this signature is fairly straightforward and follows the DX12 pattern of providing description structs to a create function.
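What the culling pass does to the argument buffer can be sketched on the CPU. This is an illustration only: in the real pass the write happens in a compute shader. `DrawIndexedArgs` is a stand-in with the same fields as D3D12_DRAW_INDEXED_ARGUMENTS so the example compiles without the DirectX headers, `IndirectCommand` is a hypothetical reduced record that keeps only the draw arguments, and one instance per visible mesh is assumed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in with the same fields as D3D12_DRAW_INDEXED_ARGUMENTS.
struct DrawIndexedArgs {
    std::uint32_t indexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startIndexLocation;
    std::int32_t  baseVertexLocation;
    std::uint32_t startInstanceLocation;
};

// Hypothetical reduced command record: only the draw arguments matter here.
struct IndirectCommand {
    DrawIndexedArgs draw;
};

// Writes the visibility result into the argument buffer and returns the
// number of commands that will still draw something.
std::size_t applyCulling(std::vector<IndirectCommand>& commands,
                         const std::vector<bool>& visible) {
    std::size_t drawn = 0;
    for (std::size_t i = 0; i < commands.size(); ++i) {
        // Meshes without a visibility entry are treated as culled.
        const bool keep = i < visible.size() && visible[i];
        // Assumption: one instance per mesh. Zero instances means no draw.
        commands[i].draw.instanceCount = keep ? 1u : 0u;
        if (keep) ++drawn;
    }
    return drawn;
}
```

The number of commands in the buffer never changes; a culled mesh still occupies its slot, but its draw produces no work because the instance count is zero.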