Resources


Highlights

Summary

In this group project, we studied the performance of a parallel ray tracing application, from CPU-based implementations to CUDA-accelerated ones.

Starting with a naïve CPU renderer, we progressively optimized the application using the right compiler flags, bounding volume hierarchies (BVH) and smarter thread scheduling strategies.

Then, we ported the C code to CUDA, a task that required the conversion of recursive functions and a completely different thread organization.

Finally, we optimized the CUDA code by introducing a tile-based thread scheduling and better data alignment, culminating in a renderer capable of achieving 65 FPS at 1080p, roughly 28 times faster than the best CPU implementation.

Pictures

An example of rendered scene

An example of rendered scene

A visualization of the ray-triangle intersection process

A visualization of the ray-triangle intersection process

Speedup of different compiler flags with respect to GCC’s default (CPU’s naïve version)

Speedup of different compiler flags with respect to GCC’s default (CPU’s naïve version)

Speedup of every improved version we developed with respect to the first one

Speedup of every improved version we developed with respect to the first one