• Breaking News

    Saturday, August 29, 2020

    RTX3090 BW, core count, and frequency rumours seem quite consistent at this point. What do they imply about IPC, and how does it get there?

    Posted: 28 Aug 2020 06:48 PM PDT

    At this point, it seems all but confirmed that the 3090 will have a memory bandwidth of 936 GB/s, 5248 CUDA cores, and clocks quite similar to a 2080 Ti.

    This is interesting to me, since it means that the raw cores*clocks increase is just around 21%, while the memory bandwidth increase is 52%. NVidia has, at least in recent memory, basically never:

    • released a card that has much more bandwidth than it actually needs, or
    • regressed in effective utilization of bandwidth across generations.

    In fact, they usually get more final performance out of a given amount of bandwidth with each new architecture (thanks to improved caches, compression, or both).

    Even if we assume that the final point has reached an engineering/algorithmic limit and the bandwidth efficiency is merely on par with Turing, that still means a ~25% IPC increase is required for the bandwidth provisioning to make sense. We have heard a lot about how the memory is expensive, makes the board more expensive, and increases power consumption, so they're not putting it on there for no reason.
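    As a back-of-the-envelope check (my own sketch, using the commonly cited 2080 Ti figures of 616 GB/s and 4352 CUDA cores as the baseline, and assuming equal clocks):

        # Rough sanity check of the implied per-core throughput ("IPC") increase.
        # Baseline = commonly cited RTX 2080 Ti specs; 3090 numbers are the rumoured ones above.
        baseline_bw, baseline_cores = 616e9, 4352    # 2080 Ti: ~616 GB/s, 4352 cores
        rumoured_bw, rumoured_cores = 936e9, 5248    # 3090 rumour: 936 GB/s, 5248 cores

        bw_gain = rumoured_bw / baseline_bw                # ~1.52
        raw_gain = rumoured_cores / baseline_cores         # ~1.21 (cores * equal clocks)

        # If BW efficiency only matches Turing, per-core throughput has to rise by
        # roughly this factor for the extra bandwidth to be worth paying for:
        implied_ipc_gain = bw_gain / raw_gain              # ~1.26, i.e. ~25%+
        print(f"BW +{(bw_gain - 1):.0%}, cores*clocks +{(raw_gain - 1):.0%}, "
              f"implied IPC +{(implied_ipc_gain - 1):.0%}")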

    A >25% IPC increase in a single generation is huge, and not something which is likely to just happen with incremental refinement. So what is going on? I can only see a few possible scenarios:

    • the workloads they envision for the future are more BW intensive than current ones. This would be a direct reversal of the trend over many years where workloads got generally more compute-intensive. It could of course be a function of raytracing, but I think that's, if anything, more cache/latency heavy. Tensor operations are essentially dense MMULs, so even if they pump out a ton of FLOPs I still don't think they would be external-BW limited too severely, since there's a lot of reuse potential (O(N²) mem traffic for O(N³) ops; see the sketch after this list).
    • NV totally messed up the balance on their entire new product line. I just don't think that's likely.
    • There's actually something to the vague "2xFP32" rumour.
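    To make that reuse argument concrete, here is a minimal illustration (my own, not from any leak) of the arithmetic intensity of a dense N×N matrix multiply, assuming each matrix crosses the external bus exactly once (i.e. idealised on-chip reuse):

        # Arithmetic intensity of a dense NxN * NxN matrix multiply (FP16 operands).
        # Traffic model: read A, read B, write C from/to DRAM exactly once each.
        def matmul_arithmetic_intensity(n: int, bytes_per_element: int = 2) -> float:
            flops = 2 * n**3                            # one multiply + one add per inner step
            bytes_moved = 3 * n**2 * bytes_per_element  # A + B in, C out
            return flops / bytes_moved                  # FLOPs per byte of external traffic

        for n in (256, 1024, 4096):
            print(n, round(matmul_arithmetic_intensity(n), 1), "FLOP/byte")

    The intensity grows linearly with N, so even with imperfect reuse, large dense MMULs tend to run out of ALUs long before they run out of external bandwidth.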

    Now, the third is what I want to discuss. I initially dismissed this completely (you can find my post about it) for a few reasons, primarily (i) that it's a huge departure from the incarnation of Ampere we already know and (ii) it seems silly to make a CUDA core superscalar when you could instead simply put more of them on the card. These are very simple cores after all, right?

    The first one is still a valid point I believe, but I'm not so sure about the second one any more.

    • CUDA cores have gotten a lot more complicated in terms of control logic. In the early days they were little more than glorified SIMD lanes, but now there is some potential for independent scheduling, all the per-warp magic stuff, and much more. So maybe replicating them or making them wider isn't actually as good a tradeoff as it seems anymore.
    • Turing actually made each CUDA core superscalar already by introducing an independent INT ALU, and that was very successful in some workloads (and seems to be getting incrementally more successful with more compute-focused and modern engines). It's the reason we see Turing pull ahead much more in some scenarios than others, even outside of RT or any other new architecture features.
    • It might not be a full-functionality FP32 duplication, but just some subset of instructions. I don't know enough about hardware, die sizes per instruction type etc. to know if this actually makes sense, but I feel like NV would have sufficient data about the workload composition of shaders in games to determine that. (Which is also important since this entire thing presumes there are, on average, enough independent FP ops in each SIMT instruction stream that you can actually make meaningful use of a superscalar CUDA core; a toy illustration of what that means follows below.)
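    As a toy illustration of that last caveat (entirely my own sketch, nothing like a real scheduler): model an instruction stream as (destination, sources) pairs and greedily count how many adjacent pairs are free of read-after-write dependencies and could therefore dual-issue.

        # Toy model: pair adjacent FP ops that have no read-after-write dependency,
        # as a stand-in for dual-issue opportunity. Deliberately simplistic.
        def dual_issue_pairs(stream):
            pairs, i = 0, 0
            while i + 1 < len(stream):
                (dst_a, _), (_, srcs_b) = stream[i], stream[i + 1]
                if dst_a not in srcs_b:   # second op doesn't consume the first's result
                    pairs += 1
                    i += 2                # both ops issue together
                else:
                    i += 1                # dependent chain: only one issues
            return pairs

        # Plenty of independent work: every pair can dual-issue.
        independent = [("r0", ["a", "b"]), ("r1", ["c", "d"]),
                       ("r2", ["e", "f"]), ("r3", ["g", "h"])]
        # A pure dependency chain: nothing can be paired.
        chain = [("r0", ["a", "b"]), ("r1", ["r0", "c"]),
                 ("r2", ["r1", "d"]), ("r3", ["r2", "e"])]

        print(dual_issue_pairs(independent), dual_issue_pairs(chain))  # 2 vs 0

    If typical game shaders look more like the first stream than the second, a second FP32 pipe per core earns its silicon; if not, it mostly sits idle.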

    So, what do you think?
    Am I reading too much into a few rumoured numbers? (probably)
    Is it just a change in workloads? (maybe)
    Did they get the IPC improvement in another way? (ideas?)
    Did NV simply mess up?
    Am I missing something obvious?

    PS: I should note that I'm not a hardware engineer in any way, just someone who has been doing low-level GPU programming and optimization for a long time and has a basic CS education on HW aspects. So anything that relates to the nitty gritty of hardware might be way off and I appreciate corrections.

    submitted by /u/DuranteA

    [Videocardz] GAINWARD GeForce RTX 3090 and RTX 3080 Phoenix leaked, specs confirmed

    Posted: 29 Aug 2020 01:36 AM PDT

    RTX 3090 and other cards Specifications Leaked

    Posted: 28 Aug 2020 05:21 AM PDT

    [Videocardz] Zotac RTX 3090 Trinity pictured and 3000 lineup renders

    Posted: 28 Aug 2020 12:12 PM PDT

    AMD RX 5300

    Posted: 28 Aug 2020 05:17 PM PDT

    Speculation on Ampere's pricing and highlighting the challenge they face with Pascal.

    Posted: 28 Aug 2020 10:20 AM PDT

    (TLDR at the bottom, this is a long one)

    Lately, I've seen a lot of people on this sub expecting extremely high prices for the new Ampere lineup. I wanted to discuss this in more detail because I think Turing's pricing was a strategy to foster this exact sentiment. Let me explain.

    First, let's look at the pricing of the XX60, XX70, and XX80 lineup from the 400 series to the 1000 series. I've omitted the Ti hardware (aside from the 560 Ti) so I can focus on the most common cards, which represent the largest userbase.

    Gen XX60 XX70 XX80
    400 $229 $349 $499
    500 $249 (ti) $349 $499
    600 $229 $399 $499
    700 $249 $399 $649
    900 $229 $329 $549
    1000 $249 $379 $599

    You should see a pretty obvious pattern here:

    • XX60 cards fall between $200 and $300
    • XX70 cards fall between $300 and $400
    • XX80 cards fall between $400 and $600 (despite the 780 anomaly)

    Then came Turing:

    Gen XX60 XX70 XX80
    2000 $349 $499 $699

    Suddenly, every single card shifted up to the next price bracket, with the XX80 cards coming in around where XX80 Ti cards used to be. This was no mistake, and Nvidia's reports confirmed it was an intentional move. Look at the 'Majority Buying Up' card: it implies that going from a 1060 to a 2060 isn't a same-tier generational upgrade, but rather a generational upgrade and a series upgrade at the same time.
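    To put rough numbers on that shift (my own arithmetic, using the launch MSRPs listed above):

        # Launch-MSRP jump from Pascal (10-series) to Turing (20-series), per tier,
        # using the prices from the tables above.
        pascal = {"XX60": 249, "XX70": 379, "XX80": 599}
        turing = {"XX60": 349, "XX70": 499, "XX80": 699}

        for tier in pascal:
            increase = turing[tier] / pascal[tier] - 1
            print(f"{tier}: ${pascal[tier]} -> ${turing[tier]} (+{increase:.0%})")

    That works out to roughly +40%, +32%, and +17% for the XX60, XX70, and XX80 tiers respectively, in a single generation.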

    I firmly believe this was caused by AMD's loss of market share after Maxwell 2.0 brought AMD to its knees and Pascal finished it off. After that happened, Nvidia had an opportunity to sell a differentiating technology that would push AMD even further behind, at the cost of increased die sizes. That's where the Tensor and RT cores come into play. While AMD was down, Nvidia leveraged its market dominance and increased die sizes while also increasing prices, both to cover those die sizes and to please investors. Revenue went up, but I believe total unit sales went down. That was Turing's one trick, imo: the ability to boost revenue despite poor adoption. I don't think that will happen again.

    Now let's look at the GPU market today:

    Steam hardware surveys show that 25% of gamers are using a Pascal card, with over 10% of all GPUs being GTX 1060s for the past 4 years. That's unprecedented. No other card has ever hit that kind of market share, and I think it's obvious why: there was no upgrade path for the XX60 users that felt like a true upgrade. Nvidia's 2060 was priced like a 2070 should have been, so those users either had to suck it up and spend $150 more than before for an XX60 GPU, or eventually opt for a GTX 1660, which brought performance gains at the XX60 price point but no new differentiating features. Even XX70 and XX80 owners saw Turing as an upgrade that wasn't worthwhile.

    And so those Pascal owners didn't buy anything; they just waited. This was true of many people who owned Pascal, and the hardware survey illustrates it. Turing simply wasn't a worthwhile price/perf upgrade, regardless of the new tensor cores or other new hardware features.

    But so what? Nvidia still has market dominance, and the Pascal users are even more desperate for an upgrade than they were when Turing launched, so why not just raise the prices again and make out like a bandit? Well, because they're fighting a battle on two fronts. Pascal is still difficult to beat, and AMD is looking pretty scary right now. Just look at how quickly AMD's market perception changed after Ryzen released. On a fraction of Intel's R&D budget, AMD hammered Intel's lineup. Now look at how Turing disappointed users and left many of them hungry for an upgrade.

    Is Nvidia Intel? No. Is Nvidia resting on their laurels? Not at all. But that's not what matters; market perception is what matters. For people eager to upgrade their graphics cards back when Turing was preparing to release, nothing AMD had came even remotely close in terms of rasterization performance or shiny new features.

    Now realize that many of those people who are disgruntled with Turing's performance, in the meantime, have upgraded their CPUs to AMD from Intel. AMD is driving many computers that used to be Intel, and users are very pleased with the performance uplift.

    The value of that market perception is huge.

    On top of that, the consoles are starting to leak information about RDNA2. It's a fuzzy target, hard to pin down, but ultimately it appears that there's much to be excited about with AMD's new architecture. The fact that no one knows the real performance is big. AMD hasn't revealed their hand. It might be a garbage hand, but it might also be a royal flush.

    So, I believe, this leaves Nvidia in an interesting situation for Ampere pricing out of the gate, because Nvidia has a trap card.

    There are a lot of Pascal owners wanting a new card, and at this point they're willing to spend more money on one now that Turing has raised the bar for GPU pricing. The rumors are that Ampere is on Samsung 8nm because TSMC couldn't be haggled down. I think that's probably true, but it's also plausible that Samsung cut Nvidia a sweet deal in terms of price per die area, which gives them their ultimate competitive advantage over AMD: lowering prices on Ampere relative to Turing.

    If Ampere can launch the 3060 for $300, the 3070 for $450, and the 3080 for $650, all while increasing rasterization performance (even modestly) over Turing, Nvidia will have almost fully secured that upgrade-hungry Pascal market without even returning to the previous pricing scheme, while simultaneously looking like the good guy for "LoWeRiNg PrIcEs" because Turing's prices are still fresh in people's minds. Nvidia could rake in profit and make it very difficult for AMD to seriously compete with RDNA2, because the well of users looking to upgrade would be dry.

    TLDR: Nvidia has to compete with both AMD and the perception that they've been price gouging. They could easily beat both of those enemies by "lowering prices" by ~50-100 bucks relative to Turing on launch day. Such a move would greatly boost their market perception and also shrink the pool of upgrade-hungry users who might otherwise have been tempted to wait for RDNA2.

    submitted by /u/Samura1_I3

    NVIDIA Ampere GA102 “RTX 3090/3080” GPU pictured [Videocardz]

    Posted: 28 Aug 2020 12:16 PM PDT

    TSMC announces its first 3nm AI chip customer – neither Apple nor Huawei (it's Graphcore)

    Posted: 28 Aug 2020 03:28 PM PDT

    Sabrent Rocket 4.0 2 TB NVMe M.2 SSD

    Posted: 29 Aug 2020 02:28 AM PDT

    Building an invisible PC [DIY Perks, 27:06]

    Posted: 28 Aug 2020 11:59 AM PDT

    MSI Modern 14 (B4Mx) review – the Ryzen processors are the stars of the show here

    Posted: 28 Aug 2020 05:55 PM PDT

    Marvell Refocuses Thunder Server Platforms Towards Custom Silicon Business

    Posted: 28 Aug 2020 01:54 PM PDT

    [Level1Techs] Dual Xeon Platinums - For Less Than $1K! Amazon's E-Waste! (Intel offloading server CPUs to keep hyperscaler customers)

    Posted: 28 Aug 2020 08:02 AM PDT

    TSMC and Graphcore Prepare for AI Acceleration on 3nm

    Posted: 28 Aug 2020 08:00 AM PDT

    Overclocking a 2080 Ti to 2970 MHz by Teclab

    Posted: 28 Aug 2020 10:44 AM PDT

    RTX 3000 Power Connector, Great TSMC 5nm Yields, Nvidia's First PCIe 4.0 GPU

    Posted: 28 Aug 2020 04:22 AM PDT

    Cryorig teases a 'New Generation' of CPU cooler

    Posted: 28 Aug 2020 02:30 AM PDT

    AMD’s B550 at the maximum: Gigabyte B550 Vision D review - a bit special, but also pleasantly different | igor'sLAB

    Posted: 28 Aug 2020 04:21 AM PDT
