• Breaking News

    Saturday, May 16, 2020

    Hardware support: Spatiotemporal Importance Resampling for Many-Light Ray Tracing (ReSTIR)

    Spatiotemporal Importance Resampling for Many-Light Ray Tracing (ReSTIR)

    Posted: 15 May 2020 09:00 PM PDT

    US blocks all shipments of semiconductors to Huawei

    Posted: 15 May 2020 06:05 AM PDT

    Pixels? Triangles? What’s the difference? — How (I think) Nanite renders a demo with 10¹¹ tris

    Posted: 15 May 2020 09:30 AM PDT

    The new Unreal Engine 5 demo was different. It wasn't just an illustration of what you can get from a good art team pushing hardware to its max. The demo was beautiful, but much more importantly, it was also impossible, pushing more triangles from more sources than seems fundamentally possible for the geometry hardware available.

    If you saw the demo and thought it was Mesh Shaders and the SSD, you've missed 90% of the technology they showed. Not only is the Nanite rendering engine they used not using Mesh Shaders most of the time, from the perspective of the GPU hardware it's not even rendering triangles. I think I've managed to piece together a decent understanding of the basic outline of what it is doing, which is what I'll cover here.

    The demo

    Vimeo: https://vimeo.com/417882964
    I suggest watching at 4k if possible, even if your screen resolution is only 1440p. Note that YouTube uses stronger compression, so it loses a lot of the detail that makes the demo so incredible.

    Gallery: https://www.eurogamer.net/articles/2020-05-13-epic-unreal-engine-5-screenshot-gallery
    These images end in "/EG11/resize/1920x-1". If you remove that you can access uncompressed 4k screenshots.

    Sources

    Brian Karis' old blog posts, particularly Virtual Geometry Images. Brian is a huge part of the reason this demo exists at all.

    Brian Karis' Twitter feed.

    This Eurogamer article, which includes clarifying quotes.

    Geometry Images, a paper by Xianfeng Gu, Steven J. Gortler, and Hugues Hoppe.

    Multi-grained Level of Detail Using a Hierarchical Seamless Texture Atlas, a paper by Krzysztof Niski, Budirijanto Purnomo, and Jonathan Cohen.

    A quick overview of traditional rendering

    Representation

    Traditionally, video games are made of triangular meshes. Each triangle renders some patch of your textures onto the world, with each rendered patch trying to emulate as much fine-grained detail inside it as it can. The process works something like this:

    You start with a model that you want to render, which has a 2D surface covered with triangles. Each vertex (point) is shared between several triangles, so we store the vertex information once per vertex, and use triangle strips to tell us how to join the vertices into triangles.

    Each vertex needs to know where it is in space, as in its x, y, and z positions, but the triangles also need to know what data to render inside them. It would be very inefficient for each triangle to have its own textures, so instead we use something called UV mapping, where we build a single texture for the whole surface, and each vertex maps to a specific point on that texture with something called a 'texture coordinate'.

    Because our triangles cover significantly sized areas on the screen, we also need to include textures that capture the fine-grained detail inside each triangle. These include the surface's normal map, a texture representing the angle of the surface at each pixel, and can also include a height map, representing how far each pixel should jut inward or outward of the plane of the triangle.
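
    As a concrete sketch of that layout, here is roughly what the per-vertex data and index buffer look like. The names and numbers below are purely illustrative, not from any particular engine:

        import numpy as np

        # Per-vertex data, stored once per vertex and shared between triangles.
        positions = np.array([[0.0, 0.0, 0.0],
                              [1.0, 0.0, 0.0],
                              [0.0, 1.0, 0.0],
                              [1.0, 1.0, 0.0]], dtype=np.float32)   # x, y, z in object space
        uvs       = np.array([[0.0, 0.0],
                              [1.0, 0.0],
                              [0.0, 1.0],
                              [1.0, 1.0]], dtype=np.float32)        # texture coordinates (UV mapping)

        # Index buffer: each row names the three vertices of one triangle.
        triangles = np.array([[0, 1, 2],
                              [2, 1, 3]], dtype=np.uint32)

        # Fine detail inside each triangle comes from textures sampled per pixel,
        # e.g. a normal map storing a surface direction at every texel.
        normal_map = np.zeros((256, 256, 3), dtype=np.float32)
        normal_map[..., 2] = 1.0   # all normals point straight out of the surface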

    Rendering

    When rendering the game, the GPU looks at your triangle list, which points to indices for the vertices. Those vertices are looked up to provide their positions, at which point the GPU is able to figure out where the triangle lives on screen.

    For each triangle, the GPU will figure out which pixels that triangle renders and pass the work off to the 'pixel shader'. Every triangle handled this way has its own overhead, and the GPU will also only handle pixels in groups of 4 (2x2 'quads'). The pixel shader then does most of the heavy lifting, looking up textures and computing the direction that light reflects off of it, given the surface normal.

    Because each triangle has overhead, and does at least four pixels of pixel shading work, this new engine Nanite, which claims to work with pixel-sized triangles, sounds very inefficient. That would be a lot of work going to waste, if a traditional pipeline were in use. On top of this, having so many vertices means you need to make all of these auxiliary data structures we just talked about as large as your textures, which would hugely increase memory overhead.
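
    To put a rough number on that waste, here is a back-of-envelope calculation; the figures are assumptions for illustration, not measurements:

        # Why pixel-sized triangles look wasteful in a traditional rasterizer.
        visible_pixels = 3840 * 2160            # a 4k frame
        triangles = visible_pixels              # roughly one triangle per pixel

        # The rasterizer shades pixels in 2x2 quads, so a triangle touching a
        # single pixel still pays for about four pixel shader invocations.
        shader_invocations = triangles * 4
        print(shader_invocations / visible_pixels)   # ~4x the shading work, before
                                                     # counting any per-triangle setup cost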

    Geometry images

    Representation

    A geometry image is a texture where every pixel is given a position in space, and triangles in the mesh are implicitly defined by triangles in the geometry image.

    This video visualizes how the data is stored. Each pixel is interpolated between its position in the geometry image, and the position it encodes in object space. Note that pixels next to each other in the geometry image are also next to each other in object space, and with the exception of the seam, pixels next to each other in object space are also next to each other in the geometry image.

    This is an efficient way of defining high-density triangles, since you only need the position for each vertex, and the triangles themselves are stored implicitly. However, as before, not only do you need to know where the triangles are in space, you also need to know what part of the textures they are rendering. Normally this requires storing texture coordinates in each vertex, but in this case there is a simpler solution: your geometry image is already a UV mapping, so just store texture data at the same position as geometry data.
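
    To make the 'implicit triangles' idea concrete, here is a small sketch (my own illustration, with made-up names) of how an H x W geometry image defines its triangles without any index buffer:

        import numpy as np

        # A geometry image: an H x W image whose 'pixels' are x, y, z positions.
        H, W = 4, 4
        geometry_image = np.random.rand(H, W, 3).astype(np.float32)   # placeholder positions

        def implicit_triangles(H, W):
            """Yield the (row, col) pixel coordinates of each implicit triangle:
            every 2x2 block of neighbouring pixels defines two triangles."""
            for r in range(H - 1):
                for c in range(W - 1):
                    yield ((r, c), (r, c + 1), (r + 1, c))           # upper-left triangle
                    yield ((r + 1, c), (r, c + 1), (r + 1, c + 1))   # lower-right triangle

        # Texture data can live at the same pixel coordinates, so the geometry
        # image doubles as the UV mapping and no texture coordinates are stored.
        for tri in implicit_triangles(H, W):
            corners = [geometry_image[r, c] for r, c in tri]   # three positions in space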

    Micropolygon geometry

    This method of storing triangles implicitly does not mean that your geometry image has to be the same size as your texture data. You could still use this method for storing low polygon count models, and still stretch your textures over it. Heck, you could even store high polygon models with low resolution textures, hoping that lighting up all that geometric detail makes up for the otherwise blurry image. However, this method does make storing huge polygon meshes much more efficient, for a swathe of reasons.

    The first is simple data efficiency. Each vertex in the mesh requires its x, y, and z position to be stored, and that's it, unlike for triangle soups, which need multiple sets of indices for the triangles and the texture coordinates. Further, if your geometric detail is already at the pixel level, you typically don't require your textures to have normal maps or displacement maps—it's all just geometry.

    You get more benefits. Because your geometry images are just images, and are spatially coherent (there are no discontinuous jumps except at the borders), you can use techniques traditionally reserved for textures. For example, you can perform image compression on your geometry. You can upscale and downscale it; downscaling can be done by the GPU with something called a mipmap. You can load patches of the geometry individually at precisely the wanted geometric detail, using something called virtual texturing.
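
    For example, here is what mipmapping a geometry image could look like; this is only a sketch under the simplest possible assumptions (no seam handling, power-of-two sizes):

        import numpy as np

        def geometry_mip(geometry_image):
            """One mip level down: average each 2x2 block of positions.
            Each level holds a quarter as many implicit triangles."""
            H, W, _ = geometry_image.shape
            g = geometry_image[:H - H % 2, :W - W % 2]   # crop to an even size
            return 0.25 * (g[0::2, 0::2] + g[0::2, 1::2] +
                           g[1::2, 0::2] + g[1::2, 1::2])

        base = np.random.rand(256, 256, 3).astype(np.float32)   # placeholder geometry image
        mips = [base]
        while mips[-1].shape[0] > 1:
            mips.append(geometry_mip(mips[-1]))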

    The regularity of your geometry also makes it easier to do operations on local patches of geometry. A simple example is view-dependent downsampling, where parts of the object closer to the camera are represented in higher detail than parts further away. This is common for textures, but very difficult for traditional polygon meshes. But you can also imagine how you might apply deformations locally to specific parts of the geometry, or even perform backface culling (avoiding rendering the backside of triangles) over patches of the geometry.

    Rendering (fallback path)

    If your geometry is lower resolution than your textures, the standard triangle rasterization hardware in GPUs is still going to be the best way to draw your image. Therefore you want to turn the implicit triangles in your geometry images into explicit triangles the hardware can understand.

    There are a few ways of handling this. The most efficient is using new hardware support called 'mesh shaders'. It is worth noting that mesh shaders aren't new hardware functions, but rather a new interface directly to hardware that already existed.

    Microsoft have put out a very clear explanation of mesh shaders. However, the simple description is that a mesh shader reads in a bunch of arbitrary data, and produces triangles to render. In this case, a mesh shader can be used to produce triangles from pieces of the geometry image.

    Older hardware does not have support for mesh shaders, so it needs to use different techniques. There are several possible approaches they could use (tessellation, instancing, generation in compute, etc.), but all of these have drawbacks, and I don't know how to evaluate them.

    Once you have hardware triangles, it's just a traditional renderer. Your processing will still largely happen in the pixel shader, as hardware-generated samples inside of your triangles. However, you still retain a few benefits, like mipmapping and support for virtual textures.

    Rendering (true ending)

    Let's assume that you have geometry with uniformly pixel-level detail, so each implicit triangle in your geometry image renders to exactly one screen pixel, with some overlap from objects at different depths. You do not want to incur triangle-level overheads for each pixel in the resulting image.

    This time, let's avoid rendering triangles altogether. Instead, as each pixel in your geometry image corresponds to a pixel in screenspace, you can perform pixel shader work directly on the geometry image. Then, each sample can be projected to its location on screen, and written there directly; this operation is called a 'scatter', and it's something GPUs are very good at nowadays.

    This has a number of huge advantages. You aren't just operating on pixels within one triangle in parallel, but on small patches of pixel-sized triangles at the same time. The overhead is therefore much, much lower. Furthermore, this does very good things for memory: pixels next to each other in the geometry image map to neighboring (or overlapping) pixels on the screen, and access neighboring pixels from the textures, all of this on top of the overall lower memory usage anyway!

    There are a few main difficulties that were skipped.

    Rendering needs to be depth-tested. That is, writing a pixel out to the image should not replace a pixel already written to the image, if that pixel is closer to the camera. I do not know how this is done, but it's certainly not rocket science. It is possible that specific hardware features were added to support this, but it's also totally possible that the developers figured out another trick.
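
    One plausible trick, and I stress this is my speculation rather than anything Epic has confirmed, is to pack the depth into the high bits of an integer and the shaded colour into the low bits, then take a per-pixel minimum, so the depth test and the write happen as one operation. A rough CPU-side sketch of the scatter-plus-depth-test idea, with every name hypothetical:

        import numpy as np

        WIDTH, HEIGHT = 1280, 720
        # One 64-bit word per screen pixel, initialised to 'infinitely far away'.
        framebuffer = np.full(WIDTH * HEIGHT, np.iinfo(np.uint64).max, dtype=np.uint64)

        def scatter(screen_xy, depth, colour_rgba8):
            """screen_xy: (N, 2) integer pixel coordinates; depth: (N,) floats in [0, 1);
            colour_rgba8: (N,) packed 32-bit colours from the 'pixel shader' step."""
            flat = screen_xy[:, 1].astype(np.int64) * WIDTH + screen_xy[:, 0]
            depth_bits = (depth * (2 ** 31)).astype(np.uint64) << np.uint64(32)
            packed = depth_bits | colour_rgba8.astype(np.uint64)
            # np.minimum.at applies the minimum even when indices repeat, which
            # plays the role of the GPU's atomic min: the nearest sample wins.
            np.minimum.at(framebuffer, flat, packed)

        # After all patches are scattered, the low 32 bits hold the final colours.
        colour_out = (framebuffer & np.uint64(0xFFFFFFFF)).astype(np.uint32)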

    Of course, any real renderer is going to need to handle cases that don't map 1:1 with pixels on the screen, for example if the geometry data is not of sufficient resolution, or you're using a lower resolution mipmap, or your triangles are simply not perfectly uniformly spaced. In this case, you will need to iterate over each pixel each triangle contains, and render that. Note that this is only efficient if most triangles in the rendered patch are about the same size on screen, which is a property that you only get from this sort of texture representation. Note that each point you render can interpolate its position between multiple surrounding points, and not necessarily even linearly, which is something you simply can't do with traditional renderers.

    Other details are of course important, like antialiasing. Unfortunately this is about at the limits of my ability to speculate.

    Nanite uses two software renderers; my initial guess is that these are specialized for different triangle sizes.

    That's impressive, but is it really that simple?

    No.

    What's up with the SSD? Is the PS5 special?

    There are two questions here: does the Nanite technology require PS5-tier SSD speeds, and does this demo require PS5-tier SSD speeds?

    The first question is simple: no. As shown above, Nanite is first and foremost a way of storing and rendering geometry. Despite first appearances, this is an especially data-efficient way of storing geometric detail, and should always look more detailed than a normal-map approach at similar file sizes.

    For the second question, you should understand how much data this demo had. The demo had "hundreds of billions" of 'virtual triangles', so even if after compression they are spending only a couple of bytes per triangle, and some geometry was reused, that's still way in excess of a hundred gigabytes of data, being scanned over in under ten minutes of gameplay. This is so much data that despite the PS5's insane SSD speed, I actually think there was at least one disguised loading screen, most visibly at 7:20-7:40, where the outside is whited out while I assume that portion of the level was loaded in. An Xbox Series X's SSD would have taken twice as long to load this, and likely otherwise been slightly less timely with loading in geometric detail.
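
    As a rough sanity check on those numbers, with every input below being my own guess rather than a figure from Epic:

        # Back-of-envelope check of the data volume.
        virtual_triangles = 200e9          # "hundreds of billions"
        bytes_per_triangle = 2             # a couple of bytes after compression
        reuse_factor = 3                   # assume each stored triangle is instanced ~3 times
        stored_bytes = virtual_triangles * bytes_per_triangle / reuse_factor
        print(stored_bytes / 1e9)          # ~133 GB of unique geometry

        demo_seconds = 10 * 60
        print(stored_bytes / 1e9 / demo_seconds)   # over 0.2 GB/s sustained, just for geometry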

    However, you should remember that this is a pathological case. Unless we're about to get shipped terabyte file sizes, shipped games will not be quite this geometrically dense. That's not to say they won't look nearly this good; 8k textures are excessive, and a mix of 2-4k textures is still incredibly detailed, given that each pixel represents geometry.

    Relation to Texture-Space Shading

    When I wrote my Hardware Futurism post, I discussed how Sampler Feedback can be used to support texture-space shading, where shading work is done in textures, and triangle rasterization is only used to move these to the screen. Because Nanite merges the triangle space into the texture-space, it can be considered a form of texture-space shading, although it's not clear whether they are making use of Sampler Feedback.

    This means that Nanite inherits a few advantages of texture-space shading, although not all of them. One example is the ability to do more accurate blur effects.

    Consider a cube with one white side and five red sides, with the white side facing towards you, such that none of the red sides are visible. If you were in space, the red light on the sides of the cube would be impossible to see. However, imagine the cube is in a foggy room. The red light from the side of the cube is able to reflect off of the fog, so will be visible as a halo around the visible white cube face.

    This sort of blur effect is typically very challenging without some sort of ray tracing or light transport, as a simple blur of the rendered pixels will not add any red colour. However, in texture space, the red colour can still be sampled.

    Culling, deformations, physics, interpolation

    I have already mentioned this, but it's worth repeating. Nanite increases the number of triangles, which means there is more work to do when modifying meshes dynamically, but because the mesh is represented in local patches, algorithms can potentially operate much more efficiently on the mesh.

    For example, an irregular cut through a mesh held in a triangle strip will produce a huge number of disconnected strips that then have to be rejoined. However, a cut through a geometry image just produces two new partially occupied geometry images. Rather than fuss over restitching these into regular squares again, you could simply use mipmaps and virtual texturing to store the split efficiently. The resulting sub-geometry can still be rendered quickly.

    Another example is culling, which could be done using a mipmapped hierarchy of normal cones (that is, cones containing every normal vector in that patch of geometry). If the cone faces away from the screen, none of the triangles in that patch need to be rendered.
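
    A sketch of what that cull test could look like, simplified to a single view direction and with all names my own:

        import numpy as np

        def normal_cone(normals):
            """Build a cone (axis, spread angle) containing every normal in a patch.
            'normals' is an (H, W, 3) block from one mip level of the normal data."""
            n = normals.reshape(-1, 3)
            n = n / np.linalg.norm(n, axis=-1, keepdims=True)
            axis = n.mean(axis=0)
            axis = axis / np.linalg.norm(axis)
            worst_cos = np.min(n @ axis)                 # most deviant normal in the patch
            return axis, np.arccos(np.clip(worst_cos, -1.0, 1.0))

        def patch_faces_away(axis, spread, view_dir):
            """True if every normal in the cone points away from the viewer, so the
            whole patch can be culled without touching its individual triangles."""
            angle_to_viewer = np.arccos(np.clip(np.dot(axis, -view_dir), -1.0, 1.0))
            return angle_to_viewer - spread > np.pi / 2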

    Subdividing geometry is normally nontrivial, since you need to interpolate between vertices that aren't colocated in memory. Subdividing geometry images only requires simple image interpolation; in fact, you could even imagine using some kind of AI upscaling.
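
    Here is what that could look like with plain bilinear interpolation; again this is only a sketch, ignoring seams and fancier filters:

        import numpy as np

        def subdivide(geometry_image):
            """Double the resolution of a geometry image; each new vertex falls
            halfway between its neighbours, giving ~4x the implicit triangles."""
            H, W, C = geometry_image.shape
            rows = np.empty((2 * H - 1, W, C), dtype=geometry_image.dtype)
            rows[0::2] = geometry_image
            rows[1::2] = 0.5 * (geometry_image[:-1] + geometry_image[1:])
            out = np.empty((2 * H - 1, 2 * W - 1, C), dtype=rows.dtype)
            out[:, 0::2] = rows
            out[:, 1::2] = 0.5 * (rows[:, :-1] + rows[:, 1:])
            return out

        patch = np.random.rand(16, 16, 3).astype(np.float32)
        finer = subdivide(patch)   # a 31 x 31 grid of positions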

    Things you missed in the trailer

    I have already mentioned that I suspect there was at least one disguised loading screen, most visibly at 7:20-7:40. Hands up if you spotted that too. The fact that it was subtle enough to be disguised as camera overexposure was very impressive.

    At 2:05, there is a section showing the triangles rendered by Nanite. A few effects are visible here.

    You can see the warped grid layout on the triangles in the lower resolution areas, with each mesh looking like it is made up of quads of roughly equal size. These are the implicit triangles stored in the geometry image, which is intrinsically grid-shaped.

    Secondly, note how, especially in the top-left quarter of the screen, the resolution of the triangle mesh rapidly changes to meet the wanted resolution, both increasing and decreasing, in density steps of a factor of 4. This is mipmapping.

    This screen also seems to show a fairly sharp division between triangle-mode rendering and pixel-mode rendering. This might be an artefact of video compression, but it's also possible this is when Nanite switches between its triangle-optimized compute shader and its pixel-optimized compute shader. Perhaps the largest triangles in this frame might even be generated from a mesh shader.

    In this screenshot, from 7:47, you can see blockiness artefacts from distant textured terrain, like the foliage. This results in very uniform pastel-like colours, whereas closer objects end up with much more texture. This, I'm fairly sure, comes from the mipmap-based downsampling they are using, combined with getting shadow detail dynamically. Capturing all the shadow detail more accurately would require modelling all of these internal facets of the geometry, which at this distance are sub-pixel. Don't get me wrong, though, it's still a hundred times better than traditional rendering approaches.

    1440p30

    Given the expectations for next-gen to run at 4k60, people are understandably worried that this demo ran at 1440p30. Some are even dismissing the technology outright; if you're lowering the resolution and frame rate so much, what's the point?

    Frankly, I have no sympathy. I would much, much rather run games that look like this even at 1080p60 upscaled, than a traditionally-rendered one at 4k60, and this is coming from someone who thinks 1800p on a laptop is too few pixels. If you don't think it looks that good, sorry, but you're wrong.

    However, one can make a few inferences about performance scaling. For example, people are talking about turning asset quality down, and rendering fewer triangles. However, because of the way this is being mipmapped and rendered, it's not actually clear that this would efficiently save computation time. I would not be surprised if Nanite, when rendering dense triangles in compute shaders, is closer to fixed cost for a given resolution for rasterization (though shading should still scale). Thus you shouldn't expect this demo to run at 4k just by optimizing the assets.

    A lot of the rendering cost is going to come from the lighting and shadowing. Although you can easily talk about turning that down, Brian Karis says, "we render virtualized shadow maps with Nanite to get those super detailed shadows. Without it is sometimes hard to tell the difference between real high poly geo and normal maps. Detailed shadows are important!" Perhaps instead shadows can be updated less frequently, trading shadow latency for frame times. I don't know.

    Daniel Wright says they are working on rendering at 60fps at the same resolution. If they get there, all the better!

    Lingering questions

    Some burning questions about the technology still remain.

    We saw a lot of highly microtextured objects like rocks, showing off the polygons and diffuse shadows. However, we didn't see anything shiny or flat. The Geometry Images paper shows that this sort of geometry can create artificial sharpness from aliasing artefacts. Is this solved in Nanite, or will people have to resort to mixed strategies for smooth and shiny objects? Certainly I could imagine mitigation strategies, like adding back a redundant normal map to correct for artefacts in the implicit normals, or calculating the normals not from the triangle directly, but as some kind of bilinear sample.

    The rendering technique I described does not include provisions for antialiasing or other supersampling techniques. It does, however, seem totally compatible with pixel-thin objects like hair and grass, with only reasonable modifications, since it is still fundamentally triangle-based. It will be interesting to see how these are approached; it is, after all, not explicitly said anywhere that the character in the demo was rendered purely from Nanite, and mixed techniques are plausible.

    There is a similar question with reflections, which sound very challenging for such a fine grain of detail. It will be interesting to see how this is handled; again, mixed-mode rendering is a possibility.

    So... geometry is now a solved problem? What next?

    Yes, I honestly, truly think this is the future of rendering. I don't think it's a gimmick, or just a demo turned to max. As far as I can tell, this is real.

    Once you have a triangle per pixel, where can you go?

    First, performance. From a baseline of 1440p30, DLSS should take us to 4k30, and a 4x GPU performance boost should get us to 4k120. That's that done.

    Then, well, I guess you have to find other stuff to do. Full ray tracing, subsurface scattering, better physics, contact deformation. There are still branches to pick. But, really, what would we even do with 200 teraflops of compute? I guess that's for the future to figure out.

    Oh, and a quick point about VRAM: it's clear we have enough now. SSDs remove a huge amount of memory pressure, so what we've got seems enough for universal photorealism. Sure, ray tracing or other techniques might turn out to be memory hungry enough to push at the edges, but we're talking maybe factors of two, not 10.

    An aside: Euclideon

    Some people might have heard of a company called Euclideon, pushing an alternative rendering approach that also offers very high fidelity graphics. In all likelihood, Euclideon was using something called sparse voxel octrees.

    Although sparse voxel octrees are real technology with a lot of the advantages Euclideon said they have, they have some fundamental disadvantages. Most importantly, the data they hold is not stored coherently; it is spread around, which makes it much harder to do certain things with them, and has significant memory costs.

    In contrast, Nanite is still representing the surface of the triangle meshes, it's just doing it with much greater geometric density, and in particular it's a very regular representation. There is some sparsity from virtual textures, but it is not of a sort that prevents a lot of the operations a graphics pipeline wants to do.

    Craig Perko gives a very fair discussion of sparse voxel octrees.

    An aside: Lighting

    Lumen is at least as crucial to the picture as Nanite, but I'm less able to say anything illuminating about it, whereas I've hopefully wrapped my head around Nanite. Lumen uses a hierarchical approach to lighting, and I'll defer the detailed look to Alex Battaglia from Digital Foundry.


    Please comment with any typos or corrections; I haven't done a thorough editing pass.

    submitted by /u/Veedrac

    TSMC Announces Intention to Build and Operate an Advanced Semiconductor Fab in the United States

    Posted: 15 May 2020 06:18 AM PDT

    Benchmarking Amazon's Graviton2 Performance With 64 Neoverse N1 Cores Against Intel Xeon, AMD EPYC

    Posted: 15 May 2020 11:26 AM PDT

    Asetek Rad Card - Industry’s 1st Slot-in PCIe Radiator Card

    Posted: 15 May 2020 07:54 AM PDT

    Questions Regarding the AOC 24G2 and ASUS VP249HE

    Posted: 16 May 2020 02:03 AM PDT

    Hi! I'm currently looking into purchasing the AOC 24G2 for my laptop (and eventually a desktop in a couple years). My laptop only has an HDMI port, so I was wondering if I'm able to set the refresh rate of the monitor to either 75hz or 120hz (or anywhere in between) even while using the HDMI 1.4 port.

    AOC 24G2

    I was also wondering if anyone knows about the ASUS VP249HE as I'm also looking into buying it as my alternative option, which would be to invest in something temporary for the next couple years until I get a desktop and then invest into something more powerful. I wanted to know about reliability, how gaming with it might be, and if there would be any noticeable input lag.

    ASUS VP249HE

    submitted by /u/OceanDescendant

    Tiger Lake U spotted with 4C/8T and 2.8 GHz base clock

    Posted: 15 May 2020 05:51 AM PDT

    Chinese Pre-Built PC Review: ZhaoXin CPU + Knock-Off Windows OS, ft. NeoKylin

    Posted: 15 May 2020 07:33 PM PDT

    NUVIA: The Tesla of Silicon? (+ Interview with NUVIA's Jon Masters)

    Posted: 15 May 2020 05:11 PM PDT

    Finally here: Z490 motherboard roundup

    Posted: 15 May 2020 08:05 AM PDT
