AMD Demonstrates Stacked 3D V-Cache Technology: 192 MB at 2 TB/sec

1 : Anonymous2021/06/01 12:02 ID: npsdrv
AMD Demonstrates Stacked 3D V-Cache Technology: 192 MB at 2 TB/sec
2 : Anonymous2021/06/01 16:22 ID: h07ieud

This may be why they waited to put RDNA in APUs, so they could have infinity cache.

ID: h07rxd1

Yeah, it's a shame though. I really wanted to get a 5000 series APU, thinking it would have RDNA graphics, but I'll wait for the 6000 series APUs.

ID: h07tz9j

Samsung Exynos will have RDNA2 in phones.

ID: h08qyvn

Isn't Rembrandt only supposed to have 12 CUs though?

I would think if they were planning this all along, their roadmap would have planned for the APUs to have a little bit more GPU horsepower than that.

ID: h08vv9y

There's also die area to consider. If they're going to make a die comparable in size to a dGPU, might as well just make a dGPU.

3 : Anonymous2021/06/01 17:38 ID: h07t0jx

The end-note updates in the article provide answers to almost every question I had about this new tech:

- This technology will be productized with 7nm Zen 3-based Ryzen processors. Nothing was said about EPYC.
- Those processors will start production at the end of the year. No comment on availability, although Q1 2022 would fit into AMD's regular cadence.
- The V-Cache chiplet is 64 MB of additional L3, with no stepped penalty on latency.
- The V-Cache is address-striped with the normal L3 and can be powered down when not in use. It sits on the same power plane as the regular L3.
- The processor with V-Cache is the same z-height as current Zen 3 products: both the core chiplet and the V-Cache are thinned to have an equal z-height as the IOD die for seamless integration.
- As the V-Cache is built over the L3 cache on the main CCX, it doesn't sit over any of the hotspots created by the cores, so thermal considerations are less of an issue. The support silicon above the cores is designed to be thermally efficient.
- The V-Cache is a single 64 MB die, and is relatively denser than the normal L3 because it uses SRAM-optimized libraries of TSMC's 7nm process.
- AMD knows that TSMC can do multiple stacked dies; however, AMD is only talking about a 1-high stack at this time, which it will bring to market.
ID: h081ilw

Also, they weren't kidding about it being on high-end chips. At 36 mm², the extra cache will be almost half the size of an entire Zen 3 CCX, so don't expect this across the entire product stack, as it's going to require a lot of extra wafers.

ID: h08dwzx

Actually, they will get about 1,600 dies per wafer, which means they could outfit 800 5950X packages per wafer. By contrast, you only get about 750 full Zen 3 CCDs per wafer.

So on a performance-per-silicon basis, you are getting an extra 15% IPC for about 50% more silicon (which has higher yields than a CCD), and maybe an even greater multithreaded performance uplift. It would at most increase manufacturing costs by about 50%. We can also assume 7nm costs are going down a little at the same time.
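The wafer arithmetic above can be sanity-checked with a quick sketch. The die sizes and the gross-die formula here are assumptions for illustration, not AMD figures (the formula is a common pre-yield estimate, so it comes out somewhat above the poster's ~1,600 / ~750 numbers, which presumably bake in yield loss):

```python
import math

WAFER_DIAMETER_MM = 300
WAFER_AREA = math.pi * (WAFER_DIAMETER_MM / 2) ** 2  # ~70,686 mm^2

def gross_dies(die_area_mm2: float) -> int:
    """Crude gross-die estimate: wafer area over die area, minus an edge-loss term."""
    dies = WAFER_AREA / die_area_mm2
    edge_loss = (math.pi * WAFER_DIAMETER_MM) / math.sqrt(2 * die_area_mm2)
    return int(dies - edge_loss)

# 36 mm^2 V-Cache die (from the article); ~80.7 mm^2 Zen 3 CCD (assumed).
vcache_per_wafer = gross_dies(36.0)
ccd_per_wafer = gross_dies(80.7)

# A 5950X uses two CCDs, so it needs two V-Cache dies per package.
packages_per_vcache_wafer = vcache_per_wafer // 2

# Extra silicon per CCD: 36 mm^2 of cache stacked on an ~80.7 mm^2 die.
extra_silicon = 36.0 / 80.7

print(f"V-Cache dies/wafer (gross) ~ {vcache_per_wafer}")
print(f"CCDs/wafer (gross)         ~ {ccd_per_wafer}")
print(f"extra silicon per CCD      ~ {extra_silicon:.0%}")  # roughly +45%
```

The key ratio survives whatever yield model you pick: a V-Cache die is under half a CCD, so each cache wafer stretches further than a CCD wafer.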

ID: h08coev

The processor with V-Cache is the same z-height as current Zen 3 products - both the core chiplet and the V-Cache are thinned to have an equal z-height as the IOD die for seamless integration

That's not what they said... it is stacked with dummy silicon next to it, over the cores.

They didn't say it would be the same height as current CPUs at all, only that it would be level.

ID: h08wqce

I'm afraid you have it wrong, according to AMD's response to the AnandTech article. The CCDs are made thinner to accommodate the new V-Cache, and the IOD is the same height as it's always been. So CCD + V-Cache = IOD height.

The dummy silicon, like you said, is for levelling the parts of the CCD that aren't stacked with V-Cache, so that heat from the logic part of the CCD transfers efficiently to the IHS.

4 : Anonymous2021/06/01 20:51 ID: h08iu9b

Could you imagine, say, 1 GB of VRAM literally stacked on top of the APU that could be accessed much like a shared cache for the CCD/GPU to use as needed? With this level of performance (I mean, 2 TB/s is kind of insane) the APU wouldn't be starved nearly as badly.
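To put "wouldn't be starved" in numbers: a quick comparison against dual-channel DDR4-3200, a plausible baseline for what an APU's iGPU normally feeds from (the baseline choice is an assumption; both figures are nominal peaks, not sustained rates):

```python
# Dual-channel DDR4-3200: 3200 MT/s x 8 bytes/transfer x 2 channels.
ddr4_3200_dual = 3200e6 * 8 * 2 / 1e9   # -> GB/s
vcache_bw = 2000.0                      # 2 TB/s quoted in the article, in GB/s

print(f"DDR4-3200 dual-channel: {ddr4_3200_dual:.1f} GB/s")   # 51.2 GB/s
print(f"V-Cache:                {vcache_bw:.0f} GB/s")
print(f"ratio: ~{vcache_bw / ddr4_3200_dual:.0f}x")           # ~39x
```

Roughly a 39x gap in peak bandwidth, which is why even a modest stacked pool could change the picture for an iGPU.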

5 : Anonymous2021/06/01 12:46 ID: h06rb6k

Told you so.

This is probably not going to stay at this stage. I'd expect them to incorporate more than simple SRAM, and while they're at it, why not on both CCX chiplets?

This is what is probably going to make dGPU system in notebooks largely obsolete.

ID: h07a6ur

If you read the article, you'd see it says they put the conventional CCD on the demo CPU for comparison purposes.

The CPU they used for benchmarks has the extra cache on both CCDs.

ID: h070896

why not on both CCX halves?

What do you mean by "CCX halves"?

ID: h079u46

Sadly, they're not going in that direction: the new APUs still use Vega, and that APU design is also used for mobile. Not sure when they will start to use a chiplet design for their APUs.

6 : Anonymous2021/06/01 14:35 ID: h0743rw

I'm worried about the height difference with the bigger L3 chip.

The Vega cards had issues with the GPU die being taller than the memory modules (or the other way around, I forget), which led to bad temperatures.

ID: h07dazf

They explicitly talked about how they shaved down the old package to retain the height.

ID: h07exj0

My bad, didn't catch that.

At least they learned from the past mistake.

But I'll wait for the GM test to see if they really did a good job.

7 : Anonymous2021/06/01 22:54 ID: h08ycn5

I'm getting Apple M1 vibes of quicker-access shared resources.

My laptop is from 2018, and while it runs great, the performance increases since then haven't been enough to justify spending a whole lot of cash.

This looks to be that extra edge, especially in the APU/GPU space. I know they gave gamer benchmarks, but I'm a content creator, so I'd be hopeful they'd utilise this tech moving forward (although I doubt Adobe ever would).

8 : Anonymous2021/06/01 23:06 ID: h08zq4i

...Does it make a difference that the two SRAMs are different sizes, the Zen 3 CCD being 32 MB and the V-Cache die being 64 MB? Wouldn't that cause a scenario like running an 8 GB stick of DDR4 with a 4 GB one in dual-channel? Or like the Xbox Series S and Series X?

Would using two or three 32 MB SRAM dies and stacking them in layers like HBM have been better?

ID: h092njt

Not really. I'm not sure exactly how they expose it or how they handle the accesses, but generally all caches are divided up into cache lines (among a few other things), so if a cache line is, say, 64 bytes, then the bigger V-Cache will just have more lines.

The actual management of the cache tags and so on will probably be handled behind the scenes.
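The "more lines, not bigger lines" point can be sketched in a few lines of code. The parameters here (64-byte lines, 16-way set-associative) are typical Zen 3 L3 values used as assumptions; AMD hasn't detailed how the combined 96 MB is actually organized or striped:

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
WAYS = 16        # associativity (assumed unchanged with V-Cache)

def num_sets(cache_bytes: int) -> int:
    """Number of sets in a conventional set-associative cache."""
    return cache_bytes // (LINE_SIZE * WAYS)

def set_index(addr: int, cache_bytes: int) -> int:
    """Which set a physical address maps to: line number modulo set count."""
    return (addr // LINE_SIZE) % num_sets(cache_bytes)

base = 32 * 1024 * 1024     # stock 32 MB L3 per CCD
stacked = 96 * 1024 * 1024  # 32 MB + 64 MB V-Cache

print(num_sets(base), num_sets(stacked))   # 32768 vs 98304 sets
# Same address lands in a different set once the cache grows,
# but the line size and lookup mechanics are unchanged:
addr = 0x1234_5678
print(set_index(addr, base), set_index(addr, stacked))
```

So unlike mismatched DIMMs, where the memory controller has to interleave unequal channels, the cache controller just sees a larger, uniform pool of lines; there's no "dual-channel imbalance" analogue.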

