• Breaking News

    Monday, April 6, 2020

    Hardware support: How Lisa Su Turned Around AMD

    How Lisa Su Turned Around AMD

    Posted: 05 Apr 2020 05:02 PM PDT

    How DLSS 2.0 works (for gamers)

    Posted: 05 Apr 2020 09:22 AM PDT

    TLDR: DLSS 2.0 is the world's best TAA implementation. It really is an incredible technology and can offer huge performance uplifts (+20-120%) by rendering the game at a lower internal resolution and then upscaling it. It does this while avoiding many of the problems that TAA usually exhibits, like ghosting, smearing, and shimmering. While it doesn't require per-game training, it does require some work from the game developer to implement. If they are already using TAA, the effort is relatively small. Due to its AI architecture and fixed per-frame overhead, its benefits are limited at higher fps, and it's more useful at higher resolutions. However, at low fps the performance uplift can be enormous: from 34 to 68 fps in Wolfenstein at 4K+RTX on a 2060.

     

    Nvidia put out an excellent video explaining how DLSS 2.0 works. If you find this subject interesting, I'd encourage you to watch it for yourself. Here, I will try to summarize their work for a nontechnical (gamer) audience.

    Nvidia video

    The underlying goal of DLSS is to render the game at a lower internal resolution and then upscale the result. By rendering at a lower resolution, you gain significant performance. The problem is that upscaling with a naive algorithm, like bicubic, produces visual artifacts called aliasing. These frequently appear as jagged edges and shimmering patterns, and they are caused by rendering the game at too low a resolution to capture enough detail. Anti-aliasing tries to remove these artifacts.

    DLSS 1.0 tried to upscale each frame individually, using deep learning to solve anti-aliasing. While this could be effective, it required the model to be retrained for every game and had a high performance cost. The model was trained by minimizing the total error between a high-resolution ground-truth image and its upscaled output. This means the model could average out sharp edges to minimize the error on both sides, leading to a blurry image. This blurring, together with the high performance cost, made DLSS 1.0, in practice, only slightly better than conventional upscaling.
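
    A tiny toy calculation (my own illustration, not Nvidia's training code) shows why minimizing average error favors blur: when an edge pixel could equally well be black or white, the single prediction with the lowest expected squared error is mid-gray.

        # Two equally likely ground truths for one ambiguous edge pixel: black or white.
        truths = [0.0, 255.0]

        def expected_mse(prediction):
            # Average squared error over the possible ground truths.
            return sum((t - prediction) ** 2 for t in truths) / len(truths)

        print(expected_mse(0.0), expected_mse(255.0))   # committing to either side: 32512.5
        print(expected_mse(sum(truths) / len(truths)))  # predicting gray (127.5): 16256.25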

    DLSS 2.0 takes a completely different approach. Instead of using deep learning to solve anti-aliasing, it uses the Temporal Anti-Aliasing (TAA) framework and then has deep learning solve the TAA history problem. To understand how DLSS2 works, you must understand how TAA works. The best way to solve anti-aliasing is to take multiple samples per pixel and average them. This is called supersampling. Think of each pixel as a box. The game determines the color of a sample at multiple different positions inside each box and then averages them. If there is an edge inside the pixel, these multiple samples capture what fraction of the pixel is covered and produce a smooth edge, avoiding jagged aliasing. Supersampling produces excellent image quality and is the gold standard for anti-aliasing. The problem is that it must determine the color of every pixel multiple times to get the average and therefore carries an enormous performance cost.

    To improve performance, you can limit the multiple samples to only the pixels on the edges of geometry. This is called MSAA (multisample anti-aliasing). It produces a high quality image with minimal aliasing but still carries a high performance cost, and it provides no improvement for transparency or internal texture detail, as those are not on the edge of a triangle.
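
    Here is a toy sketch of the supersampling idea for a single pixel crossed by an edge (illustrative only, not a real renderer):

        import random

        # Hypothetical scene for one pixel: everything left of x = 0.3 is white (1.0),
        # the rest is black (0.0), so the true coverage of this pixel is 30% white.
        def shade(x, y):
            return 1.0 if x < 0.3 else 0.0

        def supersample_pixel(samples):
            # Average many samples taken at random positions inside the 1x1 pixel box.
            return sum(shade(random.random(), random.random()) for _ in range(samples)) / samples

        print(supersample_pixel(1))     # 1 sample: either 0.0 or 1.0 -> hard, jagged edge
        print(supersample_pixel(4096))  # many samples: ~0.3 -> smooth edge coverage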

    TAA works by converting the spatial averaging of supersampling into a temporal average. Each frame in TAA renders only 1 sample per pixel. However, for each frame the center of each pixel is shifted, or jittered, just like the multiple sample positions in supersampling. The result is then saved, and the next frame is rendered with a new, different jitter. Over multiple frames the accumulated result approaches supersampling, but with a much lower per-frame cost, since each frame only has to render 1 sample instead of several. The game only needs to save the previous few frames and do a simple average to get the visual quality of supersampling without the performance cost. This approach works great as long as nothing in the image changes. When TAA fails, it is because this static-image assumption has been violated.
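
    A toy sketch of that temporal accumulation for one static pixel (illustrative only; real TAA reprojects and blends full frames):

        import random

        def shade(x, y):
            return 1.0 if x < 0.3 else 0.0  # same hypothetical edge pixel as above

        history = None
        for frame in range(300):
            jitter = (random.random(), random.random())  # new sub-pixel offset every frame
            sample = shade(*jitter)                       # only 1 sample per pixel per frame
            # Blend the new sample into the saved history (an exponential moving average).
            history = sample if history is None else 0.9 * history + 0.1 * sample

        print(history)  # hovers around ~0.3 (the supersampled answer), refreshed a little each frame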

    Normally, each consecutive frame samples at a slightly different location in each pixel and the results are averaged. If an object moves, the old samples become useless. If the game averages in the old frame anyway, it will produce ghosting around the moving objects. The game needs a way to determine when an object moves and remove these old values to prevent ghosting. In addition, if the lighting or material properties change, this also breaks the static assumption of TAA. The game needs a way to determine when a pixel has changed. This is the TAA history problem, and it is very difficult to solve. Many methods, called heuristics, have been created to solve it, but they all have weaknesses.

    The reason TAA implementations vary so much in quality comes down mostly to how well they solve this problem. While a simple approach would be to track each object's motion, the lighting and shadow on any pixel can be affected by objects moving on the other side of the frame, so simple rules usually fail in modern games with complex lighting. One of the most common solutions is neighborhood clamping. Neighborhood clamping looks at every pixel's neighborhood to determine the nearby colors. If the color in the new frame is too far from this neighborhood of colors in the previous frame, the game recognizes that the pixel has changed and removes it from the history. This works well for moving objects. The problem is that a pixel's color may also change sharply at a static hard edge or contain sub-pixel detail. Neighborhood clamping struggles to distinguish true motion from sharp edges, which is why even good TAA implementations cause some blurring of the image.
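
    Here is a toy, single-value sketch of the clamping idea (real implementations work per color channel across whole frames; this is just the gist):

        def clamp_history(history_color, neighbors):
            # Clamp the saved history into the min/max range of the current frame's
            # local neighborhood; history far outside that range is treated as stale.
            lo, hi = min(neighbors), max(neighbors)
            return max(lo, min(hi, history_color))

        # Static region: history agrees with the neighborhood and is kept as-is.
        print(clamp_history(0.30, neighbors=[0.28, 0.30, 0.33]))  # 0.30

        # A dark object just moved in: the bright, stale history gets pulled down to the
        # new neighborhood, which suppresses ghosting.
        print(clamp_history(0.90, neighbors=[0.08, 0.10, 0.12]))  # 0.12

        # A legitimate bright sub-pixel detail next to a dark edge looks identical to
        # the heuristic, so real detail is clamped away too -- hence the blur.
        print(clamp_history(0.90, neighbors=[0.05, 0.07, 0.10]))  # 0.10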

    DLSS2 says fuck these heuristics, just let deep learning solve the problem. The AI model uses the magic of deep learning to figure out the difference between a moving object, a sharp edge, and changing lighting. It leverages the massive computing power of the RTX GPU's tensor cores to process each frame with a fixed overhead. At lower frame rates, that fixed cost is a smaller fraction of the frame time, so the gains from rendering at a lower resolution can exceed 100%. This solves TAA's biggest problem and produces an image with minimal aliasing that is free of ghosting and retains surprising detail.
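
    To get a feel for why the fixed cost matters more at high frame rates, here is a back-of-the-envelope sketch. The ~2.2x render speedup and the 1.5 ms fixed upscaling cost are assumed placeholder numbers, not official Nvidia figures:

        def fps_with_upscaling(base_fps, render_speedup=2.2, fixed_cost_ms=1.5):
            low_res_ms = 1000.0 / base_fps / render_speedup  # cheaper low-res render
            return 1000.0 / (low_res_ms + fixed_cost_ms)     # plus the fixed upscale cost

        print(fps_with_upscaling(34))   # ~67 fps: at low fps the fixed cost barely matters
        print(fps_with_upscaling(200))  # ~265 fps: the same 1.5 ms eats much of the gain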

    If you want to see the results, here is a link to Alex from Digital Foundry showing off the technology in Control. It really is amazing how DLSS can take a 1080p image and upscale it to 4K without aliasing and get a result that looks as good as native 4K. My only concern is that DLSS2 has a tendency to over-sharpen the image and produces subtle ringing around hard edges, especially text.

    Digital Foundry

    To implement DLSS2, a game developer will need to use Nvidia's library in place of their native TAA. This library requires as input: the lower resolution rendered frame, the motion vectors, the depth buffer, and the jitter for each frame. It feeds these into the deep learning algorithm and returns a higher resolution image. The game engine will also need to change the jitter of the lower resolution render each frame and use high resolution textures. Finally, the game's post-processing effects, like depth of field and motion blur, will need to be scaled up to run on the higher resolution output from DLSS. These changes are relatively small, especially for a game already using TAA or dynamic resolution, but they require work from the developer and cannot be implemented by Nvidia alone. Furthermore, DLSS2 is an Nvidia-specific black box and only works on their newest graphics cards, which could limit adoption.
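
    As a purely hypothetical sketch of the per-frame data flow just described (the names and the upscale() call are mine, not Nvidia's actual API):

        from dataclasses import dataclass
        from typing import Any, Tuple

        @dataclass
        class UpscalerFrameInputs:
            color: Any                    # the low-resolution rendered frame
            motion_vectors: Any           # per-pixel motion relative to the previous frame
            depth: Any                    # depth buffer
            jitter: Tuple[float, float]   # sub-pixel camera offset used for this frame

        # Conceptually, the engine swaps its own TAA resolve for something like:
        #   high_res_frame = dlss_library.upscale(UpscalerFrameInputs(...))
        # and then runs post-processing (depth of field, motion blur) at the output resolution.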

    For the next generation Nintendo Switch, where Nintendo could require every developer to use DLSS2, this could be a total game changer, allowing a low-power handheld console to produce images that look as good as native 4K while internally rendering at only 1080p. For AMD, if DLSS adoption becomes widespread, they would face a huge technical challenge. DLSS2 requires a highly sophisticated deep learning model. AMD has shown little machine learning research in the past, while Nvidia is the industry leader in the field. Finally, DLSS depends on the massive compute power provided by Nvidia's tensor cores. No AMD GPUs have this capability, and it's unclear whether they have the compute power necessary to implement this approach without sacrificing image quality.

    submitted by /u/yellowstone6

    Deriving max theoretical Wi-Fi data rates from (near-)1st principles

    Posted: 05 Apr 2020 11:19 PM PDT

    Disclaimer: I am neither an electrical nor signals engineer. I'm actually a failed electrical engineer. If I have any of this wrong or you have better explanations please by all means add to the discussion.

    Earlier today I was trying to find the official (i.e. not some Verge editor) max theoretical data rate for Wi-Fi 6. My efforts were frustrated by many data sources apparently having only a partial grasp of the problem, sometimes with typos.

    A few notes:

    • (Pure) 802.11ac is 5 GHz only. Dual band 802.11ac routers use n/ac in the 2.4 GHz band, with corresponding performance
    • When calculating max router performance, calculate performance for both bands separately and then add them
    • Per the Wi-Fi Alliance, all certified Wi-Fi 6 devices must support 160 MHz channels, MU-MIMO, 1024-QAM, and OFDMA
    • What follows is aimed at computing maximum data rate. For other data rates or data rates dependent on client type, see the various sources, especially this one.

    I wound up finding the equation, but that's actually the easiest part:

    DataRate = (MaxDataSubcarriers * MaxModulation * MaxCoding * SpatialStreams)/(MinTotalSymbolTime)

    Where:

    802.11 Standard | Band (GHz) | Maximum Data Subcarriers
    a/g             | 2.4        | 48
    n/ac            | 2.4        | 108
    n/ac            | 5          | 108
    ac Wave 2       | 5          | 468
    ax              | 2.4        | 468
    ax              | 5          | 1960

    - Source Parent, Source

    • MaxModulation = The solution to the equation n = log2(N) for the highest N-QAM modulation that standard supports

    - Source (See Data Rate calculation example for the 2^n trick)

    Highest Modulation Scheme | Maximum Modulation | Standard
    64-QAM                    | 6                  | a/g & n/ac
    256-QAM                   | 8                  | ac Wave 2
    1024-QAM                  | 10                 | ax

    - Source Parent, Source

    • MaxCoding = the forward error correction code rate (explanation). Each standard has a maximum coding rate:
    Standard            | Maximum Coding Rate
    a/g                 | 3/4
    n/ac, ac Wave 2, ax | 5/6
    • SpatialStreams = The n in nxn radio specs

    • MinTotalSymbolTime = The sum of 2 properties measured in microseconds, OFDM Symbol Time and Minimum Guard Interval. Those are kinda sorta explained here in a very hand-wavy fashion

    Standard             | OFDM Symbol Time (μs) | Minimum Guard Interval (μs) | Minimum Total Symbol Time (μs)
    a/g, n/ac, ac Wave 2 | 3.2                   | 0.4                         | 3.6
    ax                   | 12.8                  | 0.8                         | 13.6

    - Source (PDF & registration wall warning) (See Table 1-2. Note that the .04 and .08 values in the Guard interval (μs) row appear to be typos and should be 0.4 and 0.8, respectively)

    How this information helps you

    Being able to calculate the max data rate of a standard prevents you from falling prey to wild networking OEM marketing numbers. As long as you know the router/AP's spatial streams and wireless standard, you can compute its theoretical max speed completely independently of manufacturer claims.
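
    Here is a small calculator for the equation above (my own sketch, not an official tool); it reproduces the examples that follow:

        # Inputs come straight from the tables in this post; the output is Mb/s because the
        # numerator is bits per symbol and the denominator is the symbol time in microseconds.
        def wifi_max_rate_mbps(data_subcarriers, modulation_bits, coding_rate,
                               spatial_streams, total_symbol_time_us):
            return (data_subcarriers * modulation_bits * coding_rate * spatial_streams
                    / total_symbol_time_us)

        # Example 1 below: 4x4 802.11ac Wave 2 in the 5 GHz band
        print(wifi_max_rate_mbps(468, 8, 5/6, 4, 3.6))     # ~3466.7 Mb/s

        # Example 2 below: Meraki MR56, 2.4 GHz (4x4 ax) and 5 GHz (8x8 ax)
        print(wifi_max_rate_mbps(468, 10, 5/6, 4, 13.6))   # ~1147.1 Mb/s
        print(wifi_max_rate_mbps(1960, 10, 5/6, 8, 13.6))  # ~9607.8 Mb/s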

    Example 1

    Let's calculate the max 5 GHz data rate of a 4x4 802.11ac Wave 2 AP:

    MaxDataSubcarriers = 468

    MaxModulation = 8

    MaxCoding = 5/6

    SpatialStreams = 4

    MinTotalSymbolTime = 3.6

    Putting that into our equation gives 3466 Mb/s, which is what Cisco lists as the physical link rate for the Wave 2 APs.

    Example 2

    Let's see if we can reproduce Cisco's numbers for the Meraki MR56. Cisco claims 4804 Mb/s in the 5 GHz band (8x8) and 1147 Mb/s in the 2.4 GHz band (4x4).

    For the 2.4 GHz band:

    MaxDataSubcarriers = 468

    MaxModulation = 10

    MaxCoding = 5/6

    SpatialStreams = 4

    MinTotalSymbolTime = 13.6

    Putting that into our equation gives 1147 Mb/s. So far so good.

    For the 5 GHz band:

    MaxDataSubcarriers = 1960

    MaxModulation = 10

    MaxCoding = 5/6

    SpatialStreams = 8

    MinTotalSymbolTime = 13.6

    Putting that into our equation gives 9607 Mb/s. Wait, that's 2X what Cisco quoted. What's going on? It appears at least 2 things are at work:

    1. The 5 GHz radio is running in low power mode so it fits within the PoE envelope
    2. The AP has a 5GBASE-T port, limiting backhaul bandwidth (but at the same time ensuring it can be fully saturated)

    Conclusion

    Learned a lot doing this little (well, I'm kinda nearly 6 hours into it at this point) exercise. Stay safe, wear masks, and keep leveling up in isolation :P

    submitted by /u/jdrch

    What stops say, Nvidia to just fit more cuda cores in a GPU to make it more powerful? What limits GPU speeds nowadays?

    Posted: 05 Apr 2020 03:27 AM PDT

    Title basically. I've been wondering why it's so hard for AMD to produce a competing high-end card; apparently it's not so easy?

    Edit: Thanks for all the elaborate answers. I've read them all, and even though some of the technical terms aren't easy for me to understand as just a hobby gamer, I've learned something today. Thanks!

    submitted by /u/UsernameAlre3dyTaken

    What is the next big 'revolution' in hardware tech that we should see in the next 5-10 years?

    Posted: 05 Apr 2020 11:28 AM PDT

    I'm curious what will be the next big technological leap, like the jump from HDDs to SSDs.

    submitted by /u/lddiamond

    A Look at Intel Lakefield: A 3D-Stacked Single-ISA Heterogeneous Penta-Core SoC

    Posted: 05 Apr 2020 10:18 AM PDT

    Chip sales are unlikely to cushion Samsung’s profit

    Posted: 05 Apr 2020 11:02 PM PDT

    I did some measurements of modern cores and made a chart. Also includes transistor count estimates.

    Posted: 05 Apr 2020 10:41 AM PDT

    The Chart

    Spreadsheet
    Was bored, did this for fun.

    Important caveat for transistor counts: This is based on average SoC density. Different structures on the SoC have different densities so these estimates will be quite inaccurate. Also yes I didn't truncate the decimals. I'm special.

    As for Zen 2 4MB, I just cut the size down to a quarter, as Renoir die shots are not available to take a measurement.

    submitted by /u/CatMerc

    AVX2/512 throttling

    Posted: 05 Apr 2020 10:04 PM PDT

    Does a Symbolic Link from SSD to SSD affect speed?

    Posted: 05 Apr 2020 08:58 PM PDT

    I'm planning to buy a bigger SSD for my games, and I'm planning to use a symlink for some of the documents. Would it affect the loading speed of games? They're both SSDs, though.

    submitted by /u/Samonji
