Unreal Engine 5's Nanite system - explanation video

7 comments, last by a light breeze 2 years ago

New video by the architect of Epic's Nanite system for Unreal Engine 5:

  • This is heavy going and fascinating. A few key points.
  • Their goal is constant-time rendering regardless of scene complexity. That seems impossible, but it's not. The key concept is that the number of pixels on screen is fixed, and thus, in theory, so is the number of triangles you need to display. Once triangles are smaller than pixels, that's as small as you need to go. So, ideally, the total number of triangles being drawn is about 2x the number of pixels, which current hardware can handle.
  • Right now, this only works for opaque, static meshes. No deformations or rigging yet. That will come.
  • The Nanite LOD system is designed to take a big mesh and display any part of it at any level of detail. They start with a mesh with a huge number of triangles. Then they find small clusters that fit certain complicated geometric criteria. Those areas can be rendered either with all the triangles, or as a reduced version with half as many. The outer boundary of the cluster remains the same. This is done recursively upwards. (I'm not sure how they keep the edges of larger clusters from having thousands of tiny edge triangles.) At display time, the level of detail shown for each section of the mesh goes down to the level necessary to keep triangles roughly smaller than pixels.
  • With all those tiny triangles, the fill problem is completely different than it is normally. The GPU fill hardware doesn't help much when you're filling a partial pixel. So they rasterize most of those small triangles in software, in compute shaders on the GPU, rather than with the fixed-function rasterizer.
  • There are some very format-specific compression and streaming mechanisms to handle storing and fetching all that data efficiently.
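The display-time rule in the list above (go down only as far as needed to keep error under about a pixel) can be sketched as an independent per-cluster test. This is a minimal sketch under assumptions of my own: each hypothetical `Cluster` stores its own world-space simplification error and the error of the coarser level built from it, and world error is projected to pixels with a simple perspective formula. As long as errors grow monotonically up the hierarchy and parent and child are evaluated with the same bounds, each part of the mesh passes the test at only one level.

```python
import math
from dataclasses import dataclass

@dataclass
class Cluster:
    name: str
    error: float         # world-space error of this cluster (0 for the original triangles)
    parent_error: float  # world-space error of the coarser level built from it (inf at the root)
    distance: float      # distance from the camera to the cluster's bounds

def projected_error_px(world_error, distance, screen_height_px, fov_y_rad):
    """Approximate on-screen size, in pixels, of a world-space error at this distance."""
    return world_error * screen_height_px / (2.0 * distance * math.tan(fov_y_rad / 2.0))

def select_clusters(clusters, screen_height_px=1080, fov_y_rad=math.radians(90), threshold_px=1.0):
    """Draw a cluster when its own error is sub-pixel but its parent's error is not."""
    selected = []
    for c in clusters:
        own = projected_error_px(c.error, c.distance, screen_height_px, fov_y_rad)
        parent = projected_error_px(c.parent_error, c.distance, screen_height_px, fov_y_rad)
        if own <= threshold_px < parent:
            selected.append(c.name)
    return selected
```

With three levels at distance 10 (errors 0, 0.01 and 0.05), only the middle level is selected; move the camera to distance 100 and the coarse level takes over. Because each test only looks at one cluster and its parent, every cluster can be evaluated in parallel.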

A huge amount of effort went into this. It's impressive from a theoretical perspective. They've reduced a problem that is O(N) in scene complexity to an O(1) problem.
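The "about 2x the number of pixels" figure from the first post is easy to put numbers on. A back-of-envelope sketch, taking that ratio as an assumption: the per-frame triangle count depends only on resolution, not on how detailed the source meshes are. Other costs, such as walking the cluster hierarchy, still grow with the source data.

```python
def triangle_budget(width_px, height_px, triangles_per_pixel=2.0):
    """Rough upper bound on visible triangles once every triangle is about one pixel."""
    return int(width_px * height_px * triangles_per_pixel)

for name, (w, h) in {"1080p": (1920, 1080), "1440p": (2560, 1440), "4K": (3840, 2160)}.items():
    print(f"{name}: ~{triangle_budget(w, h) / 1e6:.1f} M triangles")
```

This gives roughly 4.1 M triangles at 1080p and 16.6 M at 4K, no matter whether the source meshes total millions or billions of triangles.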

Advertisement

Nagle said:
The outer boundary of the cluster remains the same. This is done recursively upwards. (I'm not sure how they keep the edges of larger clusters from having thousands of tiny edge triangles.)

That's the invention - everything else you list is just what we expect from a LOD solution, and nothing new. It's just that games rarely implemented LOD, aside from some algorithms restricted to heightmaps and discrete LOD for characters.
The game Messiah used a very similar algorithm for characters in the year 2000, probably using a precomputed tree of edge collapses. Support for skinning was given here as well.
Hugues Hoppe wrote many related papers. But GPUs were fast at brute force, and dynamic geometry required re-uploading every frame, which is slow. Thus an efficient solution has to use compute, which came later.

The key to Nanite is the fixed cluster boundary, so there is never a need for complex and slow stitching across differing levels of detail.
But it's fixed only between two certain levels. As we move up the hierarchy, the boundary edges become internal edges as child clusters become a single parent cluster. At this point they can be reduced, and high tessellation is prevented.
It's thus important to make sure the merging of clusters properly alternates and no boundary segments persist across multiple levels, and done. Pretty brilliant! : )
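The alternation described above hinges on one observation: an edge shared by two clusters is a locked boundary edge of each, but becomes an interior edge, free to be simplified, once the two are merged into one group. A minimal sketch, assuming triangles are vertex-index triples and a cluster is a plain list of triangles; `edges_of`, `boundary_edges` and `unlocked_after_merge` are hypothetical helper names.

```python
from collections import Counter

def edges_of(tri):
    """The three undirected edges of a triangle, with sorted vertex indices."""
    a, b, c = tri
    return [tuple(sorted(e)) for e in ((a, b), (b, c), (c, a))]

def boundary_edges(triangles):
    """Edges used by exactly one triangle in the set: these stay locked."""
    counts = Counter(e for t in triangles for e in edges_of(t))
    return {e for e, n in counts.items() if n == 1}

def unlocked_after_merge(cluster_a, cluster_b):
    """Edges that were locked on the A/B border but are interior in the merged group."""
    shared = boundary_edges(cluster_a) & boundary_edges(cluster_b)
    merged = boundary_edges(cluster_a + cluster_b)
    return shared - merged
```

For two clusters A = [(0, 3, 1), (1, 3, 4)] and B = [(1, 4, 2), (2, 4, 5)] side by side, the border edge (1, 4) is locked in each cluster but comes back from `unlocked_after_merge`. Choosing the next level's groups so that new borders fall elsewhere is what stops any border from staying locked for many levels.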

What the algorithm can't solve is texturing. Texture is reused for all levels, so changes in topology (closing holes or handles) will create texture seam artifacts. To minimize this to some degree, high quality UV maps are required. If we have many UV charts or differing materials on a model, they enforce fixed boundaries which cannot alternate, so we have a problem.
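Why UV charts act like extra fixed boundaries: where two triangles share an edge in position but use different UV coordinates on it, that edge is a texture seam, and collapsing across it would tear the texture, so a simplifier generally has to lock it just like a cluster border. A minimal sketch, assuming an OBJ-like layout in which every triangle corner is a (position index, UV index) pair; `uv_seam_edges` is a hypothetical name.

```python
from collections import defaultdict

def uv_seam_edges(triangles):
    """Return position edges whose adjacent triangles use different UV indices.

    Each triangle is three (position_index, uv_index) corners.
    """
    uv_by_edge = defaultdict(set)
    for tri in triangles:
        for i in range(3):
            (pa, ta), (pb, tb) = tri[i], tri[(i + 1) % 3]
            key = (pa, pb) if pa < pb else (pb, pa)
            # Store the UV indices in the same order as the sorted position key.
            uvs = (ta, tb) if pa < pb else (tb, ta)
            uv_by_edge[key].add(uvs)
    return {e for e, uvs in uv_by_edge.items() if len(uvs) > 1}
```

Every chart border on a model adds edges like these, and unlike cluster borders they are locked at every level, which is exactly the "cannot alternate" problem above.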

It's also worth mentioning that current HW raytracing is not compatible with such technology, because the BVH is a black box. Thus, imo HW RT is currently pretty useless, cough.

JoeJ said:
That's the invention … The key to Nanite is the fixed cluster boundary, so there is never a need for complex and slow stitching across differing levels of detail. But it's fixed only between two certain levels. As we move up the hierarchy, the boundary edges become internal edges as child clusters become a single parent cluster. At this point they can be reduced, and high tessellation is prevented. It's thus important to make sure the merging of clusters properly alternates and no boundary segments persist across multiple levels, and done. Pretty brilliant! : )

Right. That's the new thing. I don't really get it yet, though. I'm missing something. I'm not clear on where the long straight edges needed as you zoom out get introduced. I need to rewatch that part of the video.

Oh, now I'm starting to get it. The long interior edges introduced when you simplify a cluster become the required long perimeter edges of a cluster at the next coarser level of detail. That's where the long edges needed come from. No new vertices; all the vertices are original.

There's some kind of graph algorithm that cuts up the clusters so that the clusters at the next coarser level of detail have a limited number of edges. Controlling growth in the number of edges of each cluster is crucial to this. If the re-triangulation of a cluster were arbitrary, like just running Delaunay, this probably wouldn't work. Something in the graph processing chooses edges that make it well-behaved. That's where the graph theory comes in. I don't understand that part yet. That's what makes it all go, though.
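One way to picture the graph step: treat clusters as nodes, weight each pair by how many mesh edges they share, and group clusters across their heaviest borders, so those borders become interior and can be simplified. As I understand the talk, Nanite hands this to a proper graph partitioner (METIS) that builds larger, balanced groups; the greedy pairing below is a much simpler stand-in, and `shared_edge_counts` and `greedy_pairs` are hypothetical names.

```python
from collections import Counter
from itertools import combinations

def _edges(tri):
    a, b, c = tri
    return [tuple(sorted(e)) for e in ((a, b), (b, c), (c, a))]

def shared_edge_counts(clusters):
    """Weight between two clusters = number of mesh edges they share."""
    owner = {}
    for cid, tris in clusters.items():
        for t in tris:
            for e in _edges(t):
                owner.setdefault(e, set()).add(cid)
    weights = Counter()
    for cids in owner.values():
        for a, b in combinations(sorted(cids), 2):
            weights[(a, b)] += 1
    return weights

def greedy_pairs(clusters):
    """Greedily pair clusters across their heaviest shared borders."""
    weights = shared_edge_counts(clusters)
    grouped, groups = set(), []
    for (a, b), _ in sorted(weights.items(), key=lambda kv: (-kv[1], kv[0])):
        if a not in grouped and b not in grouped:
            groups.append((a, b))
            grouped.update((a, b))
    # Anything left over stays in a group of its own.
    groups.extend((c,) for c in sorted(clusters) if c not in grouped)
    return groups
```

Grouping by shared edge count is what keeps each group's outer border short relative to its interior, which is the "limited number of edges" property described above.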

Right, texture seams might be a problem. But if triangles on screen are always subpixel, you probably can't see them. I wonder how this works on things that have hard edges and smooth surfaces, like cars. Probably OK, because on screen, all the triangles are subpixel. Then they use a temporal anti-aliasing pass on the screen image to hide some artifacts.

Nagle said:
Right, texture seams might be a problem. But if triangles on screen are always subpixel, you probably can't see them.

When I tried UE5, I saw seams pretty often. Triangles often end up pretty large, at the latest when you get close and no higher resolution is available.
The main goal of LOD is consistent performance, not necessarily insane detail. The latter is just ideal to show a generational leap and make jaws drop.
But if you want to use the same content for a next gen and prev gen / mobile game, seams will be more noticeable on the lower end platforms. It surely becomes a problem in some cases, and tweaking content or making compromises might be necessary.
If your content is just rocks, all texture is pretty uniform in color and structure, so seams are less visible. It will depend a lot on content. And user generated content remains a harder challenge if users are unaware of, or ignore, technical limits.

A true general LOD solution has to support reduction of topology at some point. I'm not sure how much Nanite can do here, even on the geometry side. I tried to import a model with many small holes, but I'm no UE user and failed at the import.

However, relative to the current state of the art, those issues are just minor. Nanite is great. My own (unfinished) LOD solution can solve those problems, but it can't handle detailed geometry like a bicycle, which Nanite can do perfectly well.
There is no single general solution for everything. We still need to combine multiple techniques for differing requirements. Complexity only increases with time…

Nanite is really a mesh asset format. Most of the work is precomputing and compressing all that geometry, which then goes into the game asset files. When it's time to render the asset, a relatively simple and fast set of operations generates pixels.
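As one example of the kind of precomputed compression such an asset format can use: store each cluster's vertex positions as small integers relative to the cluster's bounding box. This is a generic quantization sketch, not Nanite's actual encoding; `quantize_cluster` and `dequantize` are hypothetical names. With 16 bits per axis a position takes 6 bytes instead of 12 for three 32-bit floats, and the reconstruction error per axis is at most half a quantization step.

```python
def quantize_cluster(positions, bits=16):
    """Store positions as integers relative to the cluster's bounding box."""
    lo = [min(p[i] for p in positions) for i in range(3)]
    hi = [max(p[i] for p in positions) for i in range(3)]
    levels = (1 << bits) - 1
    # Size of one quantization step per axis (1.0 for a flat axis, where every offset is 0).
    scale = [(h - l) / levels if h > l else 1.0 for l, h in zip(lo, hi)]
    q = [tuple(round((p[i] - lo[i]) / scale[i]) for i in range(3)) for p in positions]
    return lo, scale, q

def dequantize(lo, scale, q):
    """Recover approximate positions from the per-cluster origin and step sizes."""
    return [tuple(lo[i] + v[i] * scale[i] for i in range(3)) for v in q]
```

Because clusters are small, their bounding boxes are small too, so even few bits per axis give fine absolute precision.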

This has some interesting implications:

  • This may force a rethink of mesh formats and graphics APIs.
  • Once this appears in shipped games, we'll probably see modders making tools to read and write Nanite assets. Unless Epic has some strong patents around Nanite, which they don't seem to have, someone can do that. Import and export filters for Blender would be nice to have.
  • In time, there might be GPU hardware support. UE5 does a lot of the rasterization in software, in compute shaders, rather than in fixed-function hardware. Usually, the GPU's scanline rasterizers do much of the pixel-setting. Those aren't all that useful for one pixel. The speaker says they'd rather have more general purpose compute than special purpose hardware. We'll have to see what the GPU people think of that.
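What "rasterizing in software" means for tiny triangles: loop over the pixels in the triangle's bounding box, test each pixel centre against three edge functions, and write depth and triangle ID packed into one integer, so that a single atomic max resolves visibility. My understanding is that Nanite does this in compute shaders with 64-bit atomics on a visibility buffer; the serial Python below only illustrates the idea. The packing layout (32-bit depth in the high bits, ID in the low bits), reversed-Z depth and the function names are assumptions, and there is no top-left fill rule.

```python
def edge(ax, ay, bx, by, px, py):
    """Twice the signed area of triangle (a, b, p)."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def raster_triangle(vis, width, height, tri_id, v0, v1, v2):
    """Rasterize one screen-space triangle into a packed depth|ID visibility buffer.

    Vertices are (x, y, z) with z in [0, 1] and larger meaning closer (reversed-Z),
    so a plain max() keeps the nearest surface.
    """
    (x0, y0, z0), (x1, y1, z1), (x2, y2, z2) = v0, v1, v2
    area = edge(x0, y0, x1, y1, x2, y2)
    if area <= 0:
        return  # wrong winding or degenerate
    min_x, max_x = max(int(min(x0, x1, x2)), 0), min(int(max(x0, x1, x2)) + 1, width)
    min_y, max_y = max(int(min(y0, y1, y2)), 0), min(int(max(y0, y1, y2)) + 1, height)
    for y in range(min_y, max_y):
        for x in range(min_x, max_x):
            px, py = x + 0.5, y + 0.5  # sample at the pixel centre
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            if w0 < 0 or w1 < 0 or w2 < 0:
                continue  # pixel centre is outside the triangle
            z = (w0 * z0 + w1 * z1 + w2 * z2) / area  # barycentric depth
            packed = (int(z * 0xFFFFFFFF) << 32) | tri_id
            # On the GPU this would be a single 64-bit atomic max.
            vis[y * width + x] = max(vis[y * width + x], packed)
```

For a triangle that covers one or two pixels, the bounding-box loop is nearly free, which is why this beats fixed-function hardware tuned for larger triangles.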

Nagle said:
a relatively simple and fast set of operations generates pixels.

I read recently that it's not that simple in terms of compatibility. The persistent threads they use to generate rasterization jobs are not really specified to work, but so far they work on all GPUs by luck. There may be trouble with future drivers.
Sadly, compute does not progress. Still the same batch processing model we had at its release. Fine grained stuff is difficult, and GPUs generating their own work is still a dream.
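For readers unfamiliar with the term: "persistent threads" means launching a fixed pool of workers once and letting them pull jobs from a shared queue, pushing any new work they discover back onto it, instead of issuing one dispatch per batch. A CPU-side sketch with Python threads, walking a tree the way a hierarchy traversal might; `traverse` and the dict-based tree are hypothetical, and a real traversal would only push children that fail the LOD test. On a GPU the queue is typically built on atomic counters in memory, and the APIs don't generally guarantee that all workgroups run concurrently or make forward progress, which is the compatibility concern above.

```python
import queue
import threading

def traverse(tree, root, num_workers=4):
    """Persistent-worker traversal: a fixed pool pulls nodes and pushes their children.

    `tree` maps a node to its children; every visited node is recorded.
    """
    work = queue.Queue()
    visited, lock = [], threading.Lock()

    def worker():
        while True:
            node = work.get()
            if node is None:  # sentinel: shut this worker down
                work.task_done()
                return
            with lock:
                visited.append(node)
            for child in tree.get(node, []):
                work.put(child)  # new work discovered while running
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    work.put(root)
    work.join()  # children are queued before their parent is marked done
    for _ in threads:
        work.put(None)
    for t in threads:
        t.join()
    return visited
```

The tricky part, even on the CPU, is termination: an empty queue does not mean the work is finished, because a busy worker may still push more. Here that is handled by `Queue.join()` counting unfinished tasks.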

Nagle said:
We'll have to see what the GPU people think of that.

Hehe, yeah. I know what they think: ‘Forget about efficient software. Performance is our job, and you do it the way we dictate - by bruteforce, on huge, expensive discrete Toasters.’ : )

Azriku09 said:
. By the way, did you know that [redacted] account for sale

Who let the spammers in? Does this place have a “Report spam” button?

Nagle said:

Who let the spammers in? Does this place have a “Report spam” button?

Yes? Top right corner of a post, the three dots stacked on top of each other, then click the “Report” link that pops up.

This topic is closed to new replies.

Advertisement