Ray Tracing

## Introduction

As a graphics programmer, I felt that this was really missing from my portfolio. The goal of this project was to build a simple ray tracer that could be ported to CUDA, eventually. In order to keep my iteration times in a decent range, some optimizations would have to be done during development. These optimizations will include BVH construction, multithreading, and storing data in a cache friendly way.
There were some optimizations to consider that would pre-calculate data. However, GPUs do not benefit from reading pre-calculated data at all. Due to this fact, I have decided to not do these optimizations.
The first thing I will talk about is the ray tracing algorithm itself. It is a straightforward algorithm without many caveats. Secondly, I will briefly discuss multithreading, and finally, I will talk about my BVH implementation.

## Whitted-style Ray Tracing

This algorithm was described by Turner Whitted and is therefore given his name. He wrote his paper in 1980 where he proposed ray tracing to simulate complex refractions and reflections.

Turner’s proposal looks something like the following pseudo code:
for(each Pixel in Viewport)
{
//create a primary ray
Ray primaryRay = GeneratePrimaryRay(Pixel.x, Pixel.y);
renderTarget[x][y] = Trace(primaryRay);
}

Color Trace(Ray ray)
{
Intersection out_primaryIntersection;
if(IntersectScene(primaryRay, out_primaryIntersection))
{//find closest intersection
//for diffuse materials we calculate direct illumination
if(out_intersection.materialType == DIFFUSE)
{
Color final;
for(each Light in Scene)
{
{//no intersections means the point is directly illuminated
final += (Light.color * out_primaryIntersection.color);
}
}
return final;
}
//for glass we recursively trace 2 new rays
else if(out_intersection.materialType == GLASS)
{
//use dielectric laws to calculate coefficients and new rays
Ray refractiveRay = GenerateRefractiveRay(refractiveIndex);
Ray reflectiveRay = GenerateReflectiveRay(refractiveIndex);

//recursion
if(!TotalInternalReflection)
return Trace(refractiveRay) + Trace(reflectiveRay);
else
return BLACK;
}
}
//no intersection, intersect skybox
}

For every pixel in the render target we generate a Ray, we find the closest primitive for that ray, and test for direct illumination if the intersected primitive is diffuse/opaque. If the intersected primitive is made of glass we recursively trace two new rays for the refracted and reflected light.

Multithreading the Whitted style ray tracing algorithm is ridiculously easy. There is no communication required between the threads whatsoever. One would simply generate a buffer with rays and have worker threads fetch rays from this buffer to trace, writing the results to the corresponding pixels in the back buffer. This setup follows a simple supplier/consumer pattern. The main thread will stall until the worker threads have finished processing all the threads in the queue.
c
Figure 1: Provider/Consumer Pattern

## Local Space Intersections

The most common way of testing triangle-ray intersections would be to generate the ray in world space, transform the vertices of the triangle to world space and test for intersection. I, however, decided to do my intersections in local space. This has multiple benefits. First, instead of having to transform every vertex in our mesh to world space (4 dot products per vertex), we only have to transform our ray to local space once. This saves us a lot of calculations. Secondly, when we add BVHs to our tracer, we can generate the BVH nodes in local space as Axis Aligned Bounding Boxes and do cheap intersections tests with them. There is no need to transform the BVH nodes AABBs to world space. Allowing us to not need to recalculate our BVH when we transform our meshes.

We do have to take care; the local spaces of two seperate objects are not guaranteed to be linearly comparable. Scaling causes the units in local spaces to be different. Therefore, when we do find an intersection with a mesh we have to transform the intersection data back into world space before comparing it with the intersection data of other meshes.

## Bounding Volume Hierarchy

A bounding volume hierarchy is an acceleration structure that we will use to reduce the number of primitive intersection tests. The main idea is to recursively split up a set of primitives over an axis until we are satisfied with the number of primitives in a single node.

#### Construction

A good BVH minimizes primitive intersection tests as well as node intersection tests. A good BVH will tightly encapsulate the geometry, reducing the change of a node being hit by a ray. From this we can conclude that the surface area of a node is proportional to the chance of a ray hitting that node. We can now calculate the cost of a split using the following formula: $$A_{left} * N_{left} + A_{right} * N_{right}$$ Where A is the surface area of a node and N is the number of primitives in a node. The lower the cost, the better the split. If the cost of a split is higher or equal than the cost of the parent node we stop splitting.
The next question we ask ourselves is, where do we split? We could consider every primitive centroid as a possible splitting plane. This would mean we would have to consider 3(N-1) configurations per level of the tree. For a large number of primitives this can be quite a big number.
An alternative would be to pick an axis at regular intervals, this is called binned splitting, and I have chosen to use this method. The number if bins to consider, per axis, is tweakable using a macro.

Figure 2: left: 32 binned split, right: median split;
Green indicates low amount of tests, red indicates high amount of tests

#### Storage and Access

We designed our BVH to be both usable on the CPU and GPU. This meant not being able to use pointers. Instead we use indices in to a node buffer. Nodes store how many vertices they hold, and either the index to their right neighbor or the index of the first primitive index. The left neighbor will always be the next node in the buffer. This results in a flat tree in our buffer which allows for very cache friendly access of our nodes.
struct BVHNode
{
AABB aabb;
uint32_t vertexCount;
union
{
uint32_t right;
uint32_t vertexStart;
};
};//32 bytes

struct BVH
{
Vertex* vertices;
uint32_t vertexCount;
BVHNode* nodes;
uint32_t nodeCount;
};//24 bytes, probably padded to 32 bytes

The resulting size of our BVHNode struct is 32 bytes, this allows two structs to be stored in the same cache line! When traversing our “tree” we check if a node is a leaf node by reading the vertexCount, if it is not 0 the node is a leaf node and we test all the triangles in it for intersection. If the vertexCount is equal to 0 we recursively test the left child(the next node in the list) and read the right child index so we can test that node, and its children, when we return.

Figure 3: BVH tree visualized, nodes are numbered according to a Recursive depth-first access pattern.

Figure 4: Flat BVH visualized, nodes are stored in memory in order of likely access.

## Dielectrics

Whitted style ray tracing was specifically proposed for light transport via other media then air. When a ray hits a glass object two new rays are generated and recursively traced. The amount of light that is reflected can be calculated using the Fresnel equation. However, the Fresnel equation is quite heavy. Luckily for us Mr. Schlick has come up with a cheaper approximation: $$R(\theta)=R_{0}+(1-R_{0})(1- cos⁡(\theta)^5$$ $$R_0=\frac{(n_1-n_2 )}{(n_1+n_2)}$$ Where theta is the angle between the surface normal of the primitive and the direction of the incoming ray. $$n_1$$ and $$n_2$$ are the refraction indices of the two materials.
Using the resulting coefficient we can scale the results of the trace of the reflected and refracted ray.
The directions of the reflected and refracted rays can be respectively calculated via the following formulas:
$$r_{reflected}=d - (d \cdot n) n$$ $$r_{refracted}= \sqrt{ 1-(\frac{n_{1}}{n_{2}})^2 (1-(cos(\theta)_{1} )^2 ) }$$