You sound like you know what you're talking about, so I will only reply to you and ignore the Apple fanatics.
The fabbing process is, by far, the most important part, and it is also the hardest.
Can you explain why designing is as hard as, or even as important as, fabbing?
It's not hard to determine the transistor density and the various energy consumptions of a given SoC design on a particular fabbing process. The problem then boils down to a cost-benefit analysis (how big do you want to make your CPU/GPU cores before your yield rates plummet?). A good fabbing process gives the non-fabbing 'designer' more options to play with, but at the end of the day the non-fabbing customer is a cost-benefit decider, not someone pushing the technology the way the fabbing company is.
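To put rough numbers on that yield trade-off, here's a toy sketch in C using the simple Poisson defect model Y = exp(-A * D0). The defect density is a made-up illustrative figure, not any real process number:

```c
#include <math.h>
#include <stdio.h>

/* Toy yield model: Poisson yield Y = exp(-A * D0), where A is die area
 * in cm^2 and D0 is defect density in defects/cm^2. D0 here is an
 * invented illustrative number, not any real process figure. */
int main(void) {
    const double d0 = 0.2;                          /* defects per cm^2 (assumed) */
    const double areas[] = { 1.0, 2.0, 4.0, 8.0 };  /* die sizes in cm^2 */
    for (int i = 0; i < 4; i++) {
        double yield = exp(-areas[i] * d0);
        printf("die area %.1f cm^2 -> yield %.1f%%\n",
               areas[i], 100.0 * yield);
    }
    return 0;
}
```

Even with this crude model, doubling the die area three times drops yield from roughly 82% to roughly 20%, which is the cost-benefit cliff being described.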
This is absolute insanity.
In 2006, Intel’s fabs crushed AMD’s. Yet Opteron blew away Intel’s products. Why?
The trick in processor design is not “determining the transistor density and the various energy consumptions.” It’s figuring out what to do with more than a billion transistors, figuring out what size and shape each of them should have, where each of them should go, and how they should be connected to one another. It’s figuring out the path each wire should take, the dimensions of each wire, and which layers to use for each wire.
It takes two and a half to three years to design a high-end microprocessor. For Opteron we had to design the entire 64-bit instruction set, which involved looking closely at operating systems and applications to try to predict where bottlenecks - for software that didn’t exist yet - would occur.
Then we had to figure out a top-level architecture - how will we support multiple cores down the line, how will the pipelining work, how big will the reservation stations be, what will the branch prediction algorithm be, what will the load store units look like, how wide will the instruction issue be, how big should the caches be, etc. We had to floorplan the chip, figuring out how to position and size the top-level blocks, without yet knowing exactly what circuits would be in them.
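To give a flavor of just one of those decisions, here's the simplest textbook branch predictor - a table of 2-bit saturating counters - sketched in C. The table size and indexing are arbitrary illustrative choices, not anything from Opteron:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Textbook branch predictor: a table of 2-bit saturating counters.
 * Counter values 0,1 = predict not-taken; 2,3 = predict taken.
 * Table size and PC indexing are arbitrary illustrative choices. */
#define TABLE_BITS 12
static uint8_t counters[1u << TABLE_BITS];  /* zero = strongly not-taken */

static size_t idx(uint64_t pc) { return (pc >> 2) & ((1u << TABLE_BITS) - 1); }

static bool predict(uint64_t pc) { return counters[idx(pc)] >= 2; }

static void update(uint64_t pc, bool taken) {
    uint8_t *c = &counters[idx(pc)];
    if (taken  && *c < 3) (*c)++;   /* saturate at strongly-taken */
    if (!taken && *c > 0) (*c)--;   /* saturate at strongly-not-taken */
}

int main(void) {
    uint64_t pc = 0x400123;
    for (int i = 0; i < 4; i++) {   /* train on a repeatedly taken branch */
        printf("predict %s, actual taken\n", predict(pc) ? "taken" : "not-taken");
        update(pc, true);
    }
    return 0;
}
```

The hard part is not writing this loop; it's choosing the algorithm, sizing the table against area and cycle time, and proving the choice against real workloads.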
We had to work out how to avoid cross-coupling by using power and ground planes, while at the same time designing our interconnect structure in a way that we would have enough routing planes. We had to design the standard cell architecture, determining how tall each cell would be, where the power/ground taps would be, etc. in an effort to optimize for density while, at the same time, allowing for sufficient bypass capacitance and thermal spreading, while obeying the design rules.
We had to work with the fab to engineer the transistor and interconnect performance we needed, and to develop SPICE models and parasitic models for the transistors and wires. We had to develop an architectural model for the design, and verify that it successfully ran thousands upon thousands of instruction traces - which we first had to develop, because the new ISA had no collection of pre-captured traces. We had to work with operating system vendors and internally to get OS support. For each top-level block, we had to break it down into circuits, determining the location of each transistor, its size, and the interconnections between them.
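The trace-checking part of that flow boils down, in spirit, to replaying records against a golden model and flagging the first divergence. A minimal sketch with an invented trace-record layout and a toy stand-in model (the real infrastructure was vastly larger):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical trace record: one retired instruction and the result the
 * golden reference says it must produce. The layout is invented. */
struct trace_rec {
    uint64_t pc;        /* instruction address */
    uint64_t expected;  /* expected destination-register value */
};

/* Stand-in for the architectural model; a real one simulates the ISA.
 * This toy "model" just echoes pc+1 so the example runs. */
static uint64_t model_execute(uint64_t pc) { return pc + 1; }

/* Replay a trace through the model and flag the first divergence. */
static int check_trace(const struct trace_rec *t, size_t n) {
    for (size_t i = 0; i < n; i++) {
        uint64_t got = model_execute(t[i].pc);
        if (got != t[i].expected) {
            fprintf(stderr, "mismatch at pc=%#llx: got %#llx, want %#llx\n",
                    (unsigned long long)t[i].pc,
                    (unsigned long long)got,
                    (unsigned long long)t[i].expected);
            return -1;
        }
    }
    return 0;  /* every instruction matched the golden result */
}

int main(void) {
    struct trace_rec trace[] = { { 0x1000, 0x1001 }, { 0x1004, 0x1005 } };
    return check_trace(trace, sizeof trace / sizeof trace[0]);
}
```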
We had to determine which metal layers each wire would use, and the route each wire would take. We had to determine how to route the clock wires, how many clock gates to use, and where to put them. We had to determine where to put repeaters, how many to use, and what size they should be. We had to design tools to determine the speed of each critical path, and to figure out what happens to that speed as we make design changes. We had to verify that the circuits we designed were mathematically equivalent to the architectural model. We had to test to see if we had caused any race conditions that would cause failure.
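At its core, a timing tool like that is a longest-path computation over the gate graph. Here's a stripped-down sketch that assumes the nodes are already in topological order, with invented picosecond delays:

```c
#include <stdio.h>

/* Minimal static-timing kernel: the arrival time at each node is the max
 * over its fanins of (fanin arrival + gate/wire delay). Nodes are assumed
 * numbered in topological order; all delays are invented picosecond values. */
#define N 5
int main(void) {
    /* delay[i][j] > 0 means an edge i -> j with that delay in ps; 0 = no edge */
    int delay[N][N] = {
        {0, 40, 35,  0,  0},
        {0,  0,  0, 60,  0},
        {0,  0,  0, 25, 50},
        {0,  0,  0,  0, 30},
        {0,  0,  0,  0,  0},
    };
    int arrival[N] = {0};
    for (int j = 0; j < N; j++)
        for (int i = 0; i < j; i++)
            if (delay[i][j] && arrival[i] + delay[i][j] > arrival[j])
                arrival[j] = arrival[i] + delay[i][j];
    printf("worst-case arrival at output: %d ps\n", arrival[N - 1]);
    return 0;
}
```

The production problem is everything around this kernel: extracting real delays from the layout, and re-running it after every one of those design changes.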
We had to develop a system to determine if cross-coupling between wires would cause any functional or performance problems, and, if so, we had to figure out how to re-route the wires to compensate. We had to repeat the “analyze-move transistors-move wires” process hundreds of times. We had to determine where the abutment pins between blocks go. If someone had to move a pin on one block, then the neighboring block had to go and adjust a bunch of wires, analyze, potentially move transistors, etc. We had to design each standard cell, both schematically and physically. We had to design custom macro blocks like PLLs and memory structures.
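A first-pass version of that cross-coupling screen can be as crude as the common Miller approximation - count coupling capacitance double when a neighbor can switch the opposite way - and flag any net over its load budget. All the names and numbers below are invented:

```c
#include <stdio.h>

/* First-pass crosstalk screen using the common Miller approximation:
 * coupling capacitance counts double when a neighbor can switch in the
 * opposite direction. Net names, capacitances, and the budget are all
 * invented for illustration. */
struct net {
    const char *name;
    double c_ground_ff;  /* capacitance to ground/power planes, fF */
    double c_couple_ff;  /* capacitance to switching neighbors, fF */
};

int main(void) {
    struct net nets[] = {
        { "result_bus[17]", 42.0, 31.0 },
        { "byp_mux_sel",    18.0,  4.0 },
    };
    const double budget_ff = 80.0;  /* assumed per-net load budget */

    for (size_t i = 0; i < sizeof nets / sizeof nets[0]; i++) {
        double ceff = nets[i].c_ground_ff + 2.0 * nets[i].c_couple_ff;
        if (ceff > budget_ff)
            printf("%s: worst-case Ceff %.1f fF over budget -> re-route or shield\n",
                   nets[i].name, ceff);
    }
    return 0;
}
```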
We had to develop FIFO structures for handling communications between clock domains. We had to design a clock deskew scheme. We had to analyze for clock skew and feed that back into our timing simulations, and, if necessary, adjust the circuits again. We had to analyze for electromigration issues, and adjust the circuits again. We had to calculate IR drop on the power rails, and move circuitry around accordingly.
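The standard trick for those cross-domain FIFOs is to pass the read and write pointers across the clock boundary in Gray code, so only one bit changes per increment and a mis-sampled pointer is off by at most one slot. A sketch of the conversions - the general technique, not our specific circuit:

```c
#include <assert.h>
#include <stdint.h>

/* Gray-code pointer helpers for a clock-domain-crossing FIFO. Only one
 * bit of a Gray-coded pointer changes per increment, so a pointer
 * sampled mid-transition in the other clock domain is either the old
 * value or the new one - never garbage. */
static uint32_t bin_to_gray(uint32_t b) { return b ^ (b >> 1); }

static uint32_t gray_to_bin(uint32_t g) {
    for (uint32_t s = 1; s < 32; s <<= 1)
        g ^= g >> s;
    return g;
}

int main(void) {
    for (uint32_t i = 0; i < 1024; i++) {
        assert(gray_to_bin(bin_to_gray(i)) == i);   /* round trip holds */
        uint32_t diff = bin_to_gray(i) ^ bin_to_gray(i + 1);
        assert(diff && (diff & (diff - 1)) == 0);   /* exactly one bit flips */
    }
    return 0;
}
```

Empty detection is then just equality of the synchronized Gray pointers; full detection compares them with the top bits inverted, in the usual scheme.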
We had to analyze for sufficient bypass capacitance, insert bypass capacitors, and move circuits around accordingly. And every time you move something, it causes a ripple that affects thousands of other wires, all of which have to be re-analyzed, and which usually means you have to repeat the cycle a dozen more times. When determining the circuits, you start with the architect saying “A=B+C,” and you have to design a circuit that takes two 64-bit 2’s complement numbers, adds them to produce a 2’s complement result, and does so within one clock cycle, within a certain power budget, and within an allotted number of square microns on the chip. And you are doing that exercise thousands of times, once for each simple line of architectural code.
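Here's what the functional side of that “A=B+C” exercise looks like as a C reference model - the behavior the actual adder circuit has to match, though the circuit itself is nothing like this code:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Behavioral reference for the architect's "A = B + C" on 64-bit two's
 * complement values. Signed overflow occurs exactly when both inputs
 * share a sign and the result's sign differs. The real datapath is a
 * carry-lookahead/carry-select circuit meeting a one-cycle budget; this
 * is only the functional specification it must match. */
static uint64_t add64(uint64_t b, uint64_t c, bool *overflow) {
    uint64_t a = b + c;                              /* wraps mod 2^64 */
    *overflow = ((~(b ^ c) & (b ^ a)) >> 63) != 0;   /* sign-based test */
    return a;
}

int main(void) {
    bool ov;
    uint64_t a = add64(INT64_MAX, 1, &ov);           /* should overflow */
    printf("result=%#llx overflow=%d\n", (unsigned long long)a, ov);
    return 0;
}
```

Turning those three lines of behavior into transistors that meet the cycle, power, and area budgets is the part that takes the thousands of iterations described above.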
And, in the end, this process, which doesn’t “push technology,” results, every time, in dozens of patents, several academic or conference papers, etc. I was, myself, published in the IEEE Journal of Solid-State Circuits, and it was not because what we were doing “wasn’t hard” and “wasn’t pushing the technology.”
It would be a good idea to try and design a real product before commenting on what is hard and what is not.