RDNA2 Navi 22 annotation (from FritzchensFritz and Nemez)

1 ： Anonymous：2021/11/25 20:16 ID: r25ew2: RDNA2 Navi 22 annotation (from FritzchensFritz and Nemez)
2 ： Anonymous：2021/11/25 20:40 ID: hm2rc36: 4MB blocks of Infinity Cache are just ~2.35mm² each, so almost twice as dense as the L3 on a Zen 3 chiplet. The 96MB here takes up only ~56mm².
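
For anyone who wants to check the arithmetic in that comment, here is a rough sketch. It uses only the figures quoted above (4MB per block, ~2.35mm² per block, 96MB total), which are eyeballed from the die shot rather than official numbers; the variable names are just for illustration.

```python
# Approximate figures quoted in the comment (read off the annotated die shot).
BLOCK_MB = 4            # capacity of one Infinity Cache block
BLOCK_AREA_MM2 = 2.35   # estimated area of one block
TOTAL_MB = 96           # total Infinity Cache on Navi 22

# Number of 4MB blocks needed to reach the total capacity.
blocks = TOTAL_MB // BLOCK_MB

# Total area covered by those blocks, and the resulting density.
total_area_mm2 = blocks * BLOCK_AREA_MM2
density_mb_per_mm2 = BLOCK_MB / BLOCK_AREA_MM2

print(f"{blocks} blocks, ~{total_area_mm2:.1f} mm^2, ~{density_mb_per_mm2:.2f} MB/mm^2")
```

That works out to 24 blocks and roughly 56mm², which matches the comment.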

ID: hm30089

About in line in terms of density with V-Cache dies based off of everything we know.

Not a surprise at all at the end of the day. Regular Zen3's L3 contains:

The cache tags and logic to handle far more cache than is actually possible to get on die.

All of the components for AMD's core-to-core interconnect, etc.

Remove both of those and you can save a tonne of space.

But still, really cool to see, especially because of how simple it clearly was to implement. It's the same IP utilised multiple times in the same die, with two connected to every IMC from the looks of it. Simple, but rather effective.

ID: hm5dn9m

Not only that, but GPUs don't need to clock as high as CPUs. This allows for a denser cache design as well.

ID: hm5dj7y

Also, the V-Cache uses denser libraries than the Zen 3 CCD, which used an HP library (to enable higher clocks by spacing the transistors further apart, which also lowers thermal density in the core).

GPUs don't clock anywhere near as high as CPUs, so AMD might have used denser libraries as well.
3 ： Anonymous：2021/11/26 01:15 ID: hm3ojoy: Amazing how little space the actual compute takes up.

We definitely need major innovations in memory and cache.

ID: hm46c12

What's most crazy to me is how little space the ALU takes up in CPUs. Even within the core itself, the FPU takes up like 8x the space and is used far less for most workloads. The instruction cache, data cache, scheduler, decode, load/store, and of course branch predictor are all far larger than the heart of the CPU. And then of course those cores in total take up only as much space as the L3 cache.

ID: hm545rp

Feeding data efficiently to the number-crunching machine is harder than the actual number crunching, and from the looks of it, by a large margin.

ID: hm6vkfv

Is there a picture where they did the same thing to CPUs to represent what you're talking about?

ID: hm4nfgo

Why do you think AMD is doing V-Cache? Eventually cache, I/O, etc. will be on different dies from compute.

ID: hm4s210

I doubt that AMD will split the cache to a different die, but I/O I can imagine.

ID: hm5nw9x

Makes sense though. The compute units can perform trillions of operations per second but data cannot be moved in and out fast enough.

ID: hm5vo4y

Is it, though? Looks like it's around 30% of the die to me. 2 Shader Engines containing 20 WGPs total. Only 4 highlighted here.
4 ： Anonymous：2021/11/25 20:17 ID: hm2o57d: The image source for these is from Fritzchens Fritz:

https://twitter.com/FritzchensFritz

and the annotation itself is from Nemez:

https://twitter.com/GPUsAreMagic

ID: hm30tff

Cool, thanks!
5 ： Anonymous：2021/11/26 07:40 ID: hm4rk8l: No wonder they're trying to move the large cache off the main chiplet/process, it takes up a fuckton of room.

ID: hm4tmpf

On the bright side - Caches tend to be tough so while it adds to the die size, those regions are generally rarely defective. Well, more rarely that is.
6 ： Anonymous：2021/11/26 06:51 ID: hm4nj36: Lmao L3 cash

ID: hm5mgoe

Yeah it's a common notation in the industry, using $ for cache

ID: hm4tn6q

Game Cache* 😛
7 ： Anonymous：2021/11/26 04:27 ID: hm4a1em: looks like a factorio megafactory
8 ： Anonymous：2021/11/26 12:44 ID: hm5ds7x: I like how they use $ instead of cache, funny
9 ： Anonymous：2021/11/26 03:57 ID: hm46rej: Crazy to me how much fixed function/specialized logic there is here. In CPU die shots everything is either interconnect or supporting the ALU and FPU, whereas here you have geometry processors, media processors, rasterizers, etc. I wonder what it looked like in the days before programmable shaders.
10 ： Anonymous：2021/11/26 05:28 ID: hm4g2ja: Maybe I don't understand architecture, but why does this not align with this? (Rotate that 90 degrees counterclockwise)

Those tiny things labeled WGP are the "work groups" here? They make up like half the die in the other picture.

ID: hm53s6v

It actually does line up very well. This diagram is more detailed in what it outlines and also shows the internal grouping:
- Each Navi22 die has 2 Shader Engines
- Each Shader Engine has 2 Shader Arrays (4 in total)
- Each Shader Array has 5 WGPs (20 in total)
- Each WGP counts as 2 CUs (40 in total)
- Each WGP is made up of a number of smaller components (TMU, RA, Cache, SP, CU, etc...)
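
As a quick sanity check, the counts in the breakdown above multiply out like this. It is only a sketch built from the numbers in this thread (including the "only 4 highlighted" observation from other comments); the constant names are made up for illustration.

```python
# Per-level counts as given in the breakdown above.
SHADER_ENGINES = 2       # per Navi22 die
ARRAYS_PER_ENGINE = 2    # Shader Arrays per Shader Engine
WGPS_PER_ARRAY = 5       # WGPs per Shader Array
CUS_PER_WGP = 2          # CUs per WGP

# Multiply down the hierarchy to get die-wide totals.
shader_arrays = SHADER_ENGINES * ARRAYS_PER_ENGINE
wgps = shader_arrays * WGPS_PER_ARRAY
cus = wgps * CUS_PER_WGP

# Other comments in this thread note that only 4 WGPs are outlined in the annotation.
HIGHLIGHTED_WGPS = 4
highlighted_fraction = HIGHLIGHTED_WGPS / wgps

print(f"{shader_arrays} shader arrays, {wgps} WGPs, {cus} CUs")
print(f"Annotation outlines {highlighted_fraction:.0%} of the WGPs")
```

So the small labeled WGP area is about a fifth of the WGPs on the die, which is why it looks tiny next to the other picture.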

ID: hm5s4g1

Oh, I think I misunderstood the diagram. It's not saying only the purple is the shader array, it's showing all the internals of that shader array. Going form it's breaking it up more and more. I thought the WGP area was really small, but it's only showing a couple of them.

ID: hm4n3ff

Those tiny things labeled WGP are the "work groups" here?

Yes. Work group processor.

ID: hm52ozs

[deleted]

ID: hm5swzy

I think it's pretty accurate now that I know how to read this post. It's only showing 4 work groups out of 20. I thought that area seemed too small, but that's because they are also found in the shader array, which is in turn found in the shader engine. So it's really just digging in deeper, and separating the parts out more the further down you go. Seems like a 1:1 picture now knowing that.
11 ： Anonymous：2021/11/26 14:56 ID: hm5rcbf: What are those 2 unmarked spots on the left?

Source: https://www.reddit.com/r/Amd/comments/r25ew2/rdna2_navi_22_annotation_from_fritzchensfritz_and/