Basic physics… The more transistors you switch in your GPU, the more watts you pull and the higher the TFLOPS numbers you can hit for both FP16 and FP32. This is independent of memory power consumption (which is already very low for GDDR6) and of the SIMD architecture.
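To put rough numbers on that, here is a minimal sketch using the usual first-order textbook models: peak throughput as lanes × FLOPs per cycle × clock, and dynamic power as activity × switched capacitance × voltage² × frequency. The function names and every input value are made up for illustration and do not describe any real GPU.

```python
def peak_tflops(fp32_lanes: int, clock_ghz: float, flops_per_lane_per_cycle: int = 2) -> float:
    """Theoretical peak throughput in TFLOPS.

    The default of 2 FLOPs per lane per cycle assumes one fused
    multiply-add per cycle (an assumption, not a measured figure).
    """
    return fp32_lanes * flops_per_lane_per_cycle * clock_ghz * 1e9 / 1e12


def dynamic_power_watts(activity: float, switched_capacitance_f: float,
                        voltage_v: float, clock_ghz: float) -> float:
    """First-order dynamic (switching) power: P = a * C * V^2 * f."""
    return activity * switched_capacitance_f * voltage_v ** 2 * clock_ghz * 1e9


if __name__ == "__main__":
    # Hypothetical chips: the second one switches twice as many transistors
    # (twice the lanes, twice the switched capacitance) at the same clock.
    small = (peak_tflops(1024, 1.0), dynamic_power_watts(0.2, 100e-9, 0.9, 1.0))
    large = (peak_tflops(2048, 1.0), dynamic_power_watts(0.2, 200e-9, 0.9, 1.0))
    print("small:", small)
    print("large:", large)
```

In this simple model, doubling the switched transistors at a fixed clock and voltage doubles both the peak TFLOPS and the dynamic power. It ignores leakage, memory power and how well real workloads keep the lanes busy.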
- What do you mean by "Apple has a process advantage", and which processes exactly are you talking about?
- Both AMD and NVIDIA have more experience than Apple when it comes to developing high-performance GPU architectures, and their power consumption is tied to their maximum performance and node size. Their GPUs are designed to deliver the highest FP32 performance in the industry, along with superior ray-tracing performance (which Apple lacks), GPU-accelerated Tensor cores for ML applications (DLSS, for example), INT4/INT8 operations, and Mesh Shading. All of these features are actually accessible to developers, while Apple is not capable of any of these techniques yet…
Ray-tracing is supported in Metal, but maybe it's not as performant as what AMD and Nvidia offer.
Metal for Accelerating Ray Tracing

Accelerating ray tracing using Metal | Apple Developer Documentation
Implement ray-traced rendering using GPU-based parallel processing.