Coherency gathering in ray tracing: the benefits of hardware ray tracking

Added: Over a year ago by Imagination Technologies

Despite the theoretically infinite ways to implement a modern GPU, the truly efficient ways to make one come to life in silicon tend to force the hands of those making them for real. The reality of manufacturing modern high-performance semiconductors, and the problem at hand when trying to accelerate the current view of programmable rasterisation, have uncovered trends in implementation across the GPU hardware industry.

For example, SIMD processing and fixed-function texture hardware are a cast-iron necessity in a modern GPU, to the point where not implementing a GPU with them would almost certainly mean it wasn’t commercially viable or useful outside of research. Even the wildest vision of any GPU in the last two decades didn’t abandon those core tenets. (Rest in peace, Larrabee).

Real-time ray tracing acceleration is the biggest upset to the unwritten rules of the GPU in the last 15 years. The dominant specification for how ray tracing should work on a GPU, Microsoft’s DXR, demands an execution model that doesn’t really blend in with the way GPUs like to work, giving any GPU designer that needs to support it some serious potential headaches. That’s especially true if real-time ray tracing is something they haven’t been thinking about for the last decade or so and here at Imagination, we have been.

The key ray tracing challenges

If you make your way through the DXR specification and think about what needs to be implemented in a GPU in order to provide useful acceleration, you’ll quickly tease out a handful of high-level themes that any resulting design needs to address.

First, you need a way to generate and process a set of data structures that encompass the geometry, to allow you to trace rays against that geometry in an efficient manner. Secondly, when tracing rays, there’s some explicit user-defined programmability that can happen when the GPU has tested whether a ray intersects with it or not. Thirdly, rays being traced can emit new rays! There are other things that a DXR implementation needs to take care of, but, in terms of the big picture then that trio of considerations are the most important.

Generating and consuming the acceleration structures for efficiently representing geometry that rays need to be tested against implies a potentially brand new phase of execution for the GPU to complete, and then we need to execute a brand new type of work primitive that processes those acceleration structures, test if they hit, and then do something under programmer control if they do or not. And GPUs are parallel machines, so what does processing a bunch of rays together mean? Does doing so uncover new challenges that are substantially different to those baked into the traditional parallel processing of geometry and pixels?

The answer to that last question is a resounding yes, and the differences have a profound effect on how you want to map ray tracing onto a contemporary model of GPU execution. Those GPUs have an imbalance of computational and memory resources, resulting in memory accesses being a precious commodity, and wasting these is one of the fastest ways to poor efficiency and poor performance.