The race is still on. It's just changed.
The transistors are like capacitors. E = energy, C = capacitance, V = voltage.
E = 1/2 * C * V^2
Every toggle costs that much energy, so as you toggle these transistors faster, the power goes up in proportion to the switching rate.
But to go faster you also need more voltage, and power depends on the square of the voltage. So doubling the voltage causes power to go up by a factor of 4!
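To put numbers on that, here is a rough sketch of the capacitor model above. The function name and the sample values (1 fF, 1 V, 2 GHz) are made up for illustration, and it ignores leakage and everything else a real chip does; it only shows the V^2 and switching-rate scaling.

```python
def switching_power(capacitance_f, voltage_v, toggles_per_second):
    """Power in watts for one capacitor-like transistor.

    Uses E = 1/2 * C * V^2 per toggle, times the number of toggles
    per second. Simplified model: no leakage, no activity factor.
    """
    energy_per_toggle = 0.5 * capacitance_f * voltage_v ** 2
    return energy_per_toggle * toggles_per_second


# Illustrative values only, not measurements from any real CPU.
base = switching_power(1e-15, 1.0, 2e9)
double_rate = switching_power(1e-15, 1.0, 4e9)
double_voltage = switching_power(1e-15, 2.0, 2e9)

print("2x switching rate ->", double_rate / base, "x power")
print("2x voltage        ->", double_voltage / base, "x power")
```

Doubling the rate doubles the power; doubling the voltage quadruples it. If you have to raise both to hit a higher clock, the two effects multiply.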
The die shrinks, from 130 to 90 to 65 to 45 ..., cause the C factor to drop. So going from 130nm to 65nm, the "C" (capacitance) should in theory drop by half. Not totally true, but close enough. But shrinking the die also means the total die area is smaller, so power density goes up, and this is a problem. For example, say a 130nm CPU has 2 in^2 of area and dissipates 100 watts; that's 50 watts per square inch. But now take a 65nm CPU with 1 in^2 of area, still at 100 watts. Now the power density is 100 watts per square inch. Double!
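The power density arithmetic from that example, as a tiny sketch. The helper name is hypothetical and the numbers are just the made-up ones from the example, not real chip data.

```python
def power_density(watts, area_sq_in):
    """Watts dissipated per square inch of die area."""
    return watts / area_sq_in


old_chip = power_density(100, 2)  # 130nm example: 100 W over 2 in^2
new_chip = power_density(100, 1)  # 65nm example: 100 W over 1 in^2

print(old_chip)             # 50.0
print(new_chip)             # 100.0
print(new_chip / old_chip)  # 2.0
```

Same total watts, half the area, twice the heat per square inch to get rid of.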
Now back a few years, they could reasonably extrapolate speed and power for these CMOS CPUs. Drop the size by 2 (130nm to 65nm) and power dropped by a factor of 2, since C is halved in the equation above. But when they neared the 90nm mark the transistors started leaking more, and this drop by a factor of 2 didn't hold anymore.
Basically it gets hugely expensive to go faster now. So now they try to do more in parallel, hence dual cores & quad cores and more.
But it's still about getting more work done in less time.