I'd love to learn a bit more about it if you have any references! I've always been interested in graphics programming but have spent most of my time on the CPU side of things
This is indeed a fascinating topic! I learned most of what I know from this very well-written blog series:
Rys looks at our PowerVR graphics architecture and describes how Tile-Based Rendering (TBR) works in practice. (www.imaginationtech.com)
The deferred rendering technique in PowerVR GPUs takes the information generated by the tiler to defer the rendering of subsequently generated pixels. (www.imaginationtech.com)
It is worth noting that Apple likely adapted their TBDR hardware from Imagination directly (Apple's shading backend is entirely different though).
@mr_roboto already summed it up in a very neat way. I would only add that tile-based rendering as most modern GPUs do it is a caching optimization (by processing triangles in close proximity to each other, it's more likely that you will fetch texture data that is closely laid out in memory), but triangles are still rasterized and shaded immediately. The big difference with TBDR GPUs (currently only Apple and IMG) is that they will first rasterize every single triangle in the tile and only then do shading. This means that every visible pixel is shaded exactly once. While the result is the same, the underlying implications are actually very significant: TBDR GPUs can dispatch shading work in regular grids of 32x32 pixels (always doing 1024 pixels per shader invocation), while immediate renderers have to dispatch shaders for each triangle individually (they usually shade 8x4 groups of pixels at once, which creates inefficiencies at triangle edges). Also, the TBDR model offers guarantees that the immediate model simply cannot. For example (there's a small sketch after this list):
- with TBDR you know that you "own" the pixel, as no other shader invocation will be computing the value of that pixel at the same time (in immediate rendering there can be data races, e.g. two overlapping triangles may be shaded simultaneously). This allows you to do things like race-free framebuffer reads, which are key to programmable blending and other advanced techniques
- with TBDR you know that all other triangles in the same tile have already been rasterized, which allows you to deterministically track the state of multiple pixels at once. This again allowed Apple to implement some really cool stuff like a persistent cache between shader invocations (multiple shaders can work on the data in sequence, something that no other GPU supports), which is the key to performing some advanced effects without ever leaving the GPU cache
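To make the "every visible pixel is shaded exactly once" point concrete, here is a minimal CPU-side sketch in plain C++. Everything in it (the Triangle struct, the 4x4 "tile", the shade counters) is a made-up toy model, not any vendor's actual pipeline; it just contrasts immediate shading (shade every fragment that passes the depth test as triangles arrive) with per-tile deferred shading (resolve visibility for the whole tile first, then shade each surviving pixel once).

```cpp
#include <array>
#include <cstdio>
#include <vector>

// Toy model of one small tile. Each "triangle" is just a list of covered
// pixel indices plus a depth value; shading is simulated by a counter.
struct Triangle {
    std::vector<int> covered;  // pixel indices inside the tile
    float depth;               // smaller = closer to the camera
};

constexpr int kTilePixels = 16;

// Immediate-mode renderer (IMR): shade as soon as a fragment passes the depth test.
int shadeImmediate(const std::vector<Triangle>& tris) {
    std::array<float, kTilePixels> depthBuf;
    depthBuf.fill(1.0f);
    int shaderInvocations = 0;
    for (const Triangle& t : tris) {
        for (int px : t.covered) {
            if (t.depth < depthBuf[px]) {
                depthBuf[px] = t.depth;
                ++shaderInvocations;  // fragment shaded now, maybe overwritten later
            }
        }
    }
    return shaderInvocations;
}

// TBDR-style renderer: first resolve visibility for the whole tile,
// then shade each visible pixel exactly once.
int shadeDeferred(const std::vector<Triangle>& tris) {
    std::array<int, kTilePixels> winner;   // front-most triangle per pixel
    winner.fill(-1);
    std::array<float, kTilePixels> depthBuf;
    depthBuf.fill(1.0f);
    for (int i = 0; i < static_cast<int>(tris.size()); ++i) {
        for (int px : tris[i].covered) {
            if (tris[i].depth < depthBuf[px]) {
                depthBuf[px] = tris[i].depth;
                winner[px] = i;  // remember who owns the pixel, don't shade yet
            }
        }
    }
    int shaderInvocations = 0;
    for (int px = 0; px < kTilePixels; ++px) {
        if (winner[px] != -1) ++shaderInvocations;  // one shade per visible pixel
    }
    return shaderInvocations;
}

int main() {
    // Two opaque triangles covering the same 8 pixels, far one submitted first.
    std::vector<Triangle> tris = {
        {{0, 1, 2, 3, 4, 5, 6, 7}, 0.8f},
        {{0, 1, 2, 3, 4, 5, 6, 7}, 0.2f},
    };
    std::printf("immediate: %d shades, deferred: %d shades\n",
                shadeImmediate(tris), shadeDeferred(tris));  // prints 16 vs 8
    return 0;
}
```

A real GPU does this per 32x32 tile in fixed-function hardware with on-chip tile memory, of course; the sketch only shows where the "exactly once" guarantee comes from, and why the occluded fragments never cost any shading work or texture bandwidth.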
I'd say it's really about both. A lot of the memory bandwidth savings occur because TBDR never shades fully occluded pixels, meaning it doesn't have to fetch texture data for that pixel. But... it also doesn't have to run the shader program against that pixel, so it's saving both at the same time.
You are right of course. I was stressing the bandwidth issue since Apple GPUs are much more limited in this area. TBDR allows them to fetch only what is needed, saving precious bandwidth.
Of course, the flip side of that is that TBDR is weak when rendering transparency effects, especially if application devs don't take care to optimize their rendering pipeline to handle transparency well on TBDR GPUs.
I don't think it's weak per se, it just won't be more efficient (it will have to shade transparent triangles immediately, just like an IMR would). There is still some potential for efficiency wins (e.g. if you have multiple non-overlapping transparent triangles in a tile). I suspect that the GPU will flush the tile when a newly rasterized transparent pixel hits a not-yet-shaded transparent pixel. So basically, if you have a lot of transparent pixels with plenty of overdraw, TBDR performance will tank due to constant flushing. But so will IMR performance.
Transparency is a big problem anyway, and if I remember correctly the optimization tips for TBDR and IMR are identical: draw transparent objects last (you have to draw them last and sorted anyway if you want correct results). And then there is order-independent transparency, which again can be done more efficiently on Apple GPUs.
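As a concrete illustration of that "opaque first, then transparent sorted back-to-front" advice (the same ordering helps both TBDR and IMR), here is a minimal sketch. The Draw record and buildSubmissionOrder function are invented for the example; the point is just the split and the sort keys.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical draw record: distance from the camera plus an opacity flag.
struct Draw {
    float viewDepth;   // larger = farther from the camera
    bool transparent;
    // ... mesh, pipeline state, etc. would live here
};

// Returns the submission order: all opaque draws first, then transparent
// draws sorted back-to-front so "over" blending composites correctly.
std::vector<Draw> buildSubmissionOrder(std::vector<Draw> draws) {
    std::vector<Draw> opaque, transparent;
    for (const Draw& d : draws) {
        if (d.transparent) transparent.push_back(d);
        else opaque.push_back(d);
    }

    // Opaque: front-to-back maximizes early depth rejection (and, on TBDR,
    // lets hidden-surface removal discard occluded fragments before shading).
    std::sort(opaque.begin(), opaque.end(),
              [](const Draw& a, const Draw& b) { return a.viewDepth < b.viewDepth; });

    // Transparent: back-to-front is required for correct blending results.
    std::sort(transparent.begin(), transparent.end(),
              [](const Draw& a, const Draw& b) { return a.viewDepth > b.viewDepth; });

    opaque.insert(opaque.end(), transparent.begin(), transparent.end());
    return opaque;
}
```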
Nvidia's GPUs since Maxwell do use tile-based rendering. As of now, pretty much every architecture out there seems to be using tile-based rasterization. ARM Mali, Qualcomm Adreno, AMD, Intel, and Nvidia all use a tile-based renderer lol.
Not sure about others, but Nvidia's approach has been close to TBDR since Maxwell.
It is not close, since it's not deferred. Saying that they are close because they both use tiles is like saying that a hybrid is basically the same as a full electric car
Using tiles makes sure that a modern GPU will on average fetch texture data from similar locations, which massively improves caching and is key to the large performance and efficiency increases of Maxwell and Navi. But true TBDR is much more difficult to achieve because there are just so many edge cases... Imagination was basically the only company to ever do it, and Apple bought the tech from them.
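To see why binning alone already helps the caches, here is a toy CPU simulation, again plain C++ with entirely made-up numbers (a 128-line LRU "texture cache", 16 texels per cache line, triangles modeled as 4x4 quads sampling the texel under each pixel). It only illustrates the locality argument, not any real GPU: processing the same quads tile by tile touches cache lines in bursts instead of scattering accesses across the whole texture, so the miss count drops noticeably.

```cpp
#include <algorithm>
#include <cstdio>
#include <list>
#include <random>
#include <unordered_set>
#include <utility>
#include <vector>

constexpr int kScreen = 256, kQuad = 4, kTile = 32;
constexpr int kLineTexels = 16, kCacheLines = 128;

// Tiny LRU cache of texture cache lines; counts misses.
struct LruCache {
    std::list<int> order;                 // most recently used at the front
    std::unordered_set<int> present;
    long misses = 0;
    void access(int line) {
        if (present.count(line)) {
            order.remove(line);           // O(n), fine for a toy model
        } else {
            ++misses;
            present.insert(line);
            if (static_cast<int>(present.size()) > kCacheLines) {
                present.erase(order.back());
                order.pop_back();
            }
        }
        order.push_front(line);
    }
};

// Each quad samples the row-major texture 1:1 under its 4x4 pixel footprint.
long countMisses(const std::vector<std::pair<int, int>>& quads) {
    LruCache cache;
    for (auto [qx, qy] : quads)
        for (int y = qy; y < qy + kQuad; ++y)
            for (int x = qx; x < qx + kQuad; ++x)
                cache.access((y * kScreen + x) / kLineTexels);
    return cache.misses;
}

int main() {
    // 4096 quads at random screen positions, i.e. submission order is scattered.
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> pos(0, kScreen - kQuad);
    std::vector<std::pair<int, int>> quads(4096);
    for (auto& q : quads) q = {pos(rng), pos(rng)};

    long scattered = countMisses(quads);

    // Bin the same quads by 32x32 tile and process them tile by tile.
    std::vector<std::pair<int, int>> binned = quads;
    std::sort(binned.begin(), binned.end(), [](auto a, auto b) {
        return std::make_pair(a.second / kTile, a.first / kTile) <
               std::make_pair(b.second / kTile, b.first / kTile);
    });
    long tiled = countMisses(binned);

    std::printf("texture cache misses, submission order: %ld, tiled order: %ld\n",
                scattered, tiled);
    return 0;
}
```

With these toy parameters the tiled order produces several times fewer misses than the scattered submission order, which is the caching win; note that nothing here is deferred, every quad is still "shaded" immediately, which is exactly the distinction being made above.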
And regarding most mobile GPUs (Mali, Adreno), these are simply atrocious. They cut a lot of corners in order to achieve higher efficiency (e.g. shader splitting) which results in inconsistent programming models. They don't scale with complex geometries since they don't use deferred shading. But it's good enough for lower quality games on mobile that don't try to do anything ambitious.