A lot of the underlying ideas I agree with here, but not some of the actual arguments used. So, at the risk of being labeled a troll, here are some comments (and don't take these personally - they're comments thrown to all the readers):
Originally posted by alex_ant
The point still remains that Apple is asking developers to re-write however much of their code
Much is the wrong word. The rewrite - ie away from plain C (or whatever) to something that gets benefit from AltiVec - is for the purposes of performance. It is extremely rare for performance bottlenecks to be anything other than hotspots - the old 90% of the time in 10% of the code adage. There's not much to do for a word processor, other than leaning heavily on the class libraries that Apple has spent lots of time working on (one hopes). But anything that could be considered to be performing data processing (think anything with audio, video, graphics, FP data sets) is likely to be extremely amenable to optimisations in small, key locations, as in the sketch below.
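To make that concrete, here's a rough sketch of my own (not from anything Alex wrote) of what one of those small, key locations might look like - a scale-and-add loop over float data, first in plain C, then with AltiVec intrinsics. The function names are mine, the vector literal syntax varies between compilers (Apple's -faltivec vs GCC's -maltivec), and it assumes 16-byte-aligned data whose length is a multiple of 4:

#include <altivec.h>   /* needed for GCC -maltivec; Apple -faltivec builds it in */

/* scalar version: y[i] += a * x[i] */
void saxpy_scalar(float a, const float *x, float *y, int n)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* AltiVec version: four floats per iteration via fused multiply-add */
void saxpy_altivec(float a, const float *x, float *y, int n)
{
    vector float va = {a, a, a, a};   /* splat the scalar into all four lanes */
    int i;
    for (i = 0; i < n; i += 4) {
        vector float vx = vec_ld(0, &x[i]);   /* aligned 16-byte loads */
        vector float vy = vec_ld(0, &y[i]);
        vec_st(vec_madd(va, vx, vy), 0, &y[i]);
    }
}

A dozen lines of this kind in the right place is what "re-write" usually amounts to, not a wholesale port.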
So I agree with the sentiment that it's a shame that you have to recode some things, but I think the balance isn't quite right here.
for this oddball AltiVec thing that nobody else uses.
It's probably as oddball as MMX, etc. In fact it's not at all oddball if you consider some of the processors outside the "general processor" category. Large register, split carry, repeated operation.... TI had a graphics co-proc doing this a good decade ago (340 series I think). And it's not that odd if you venture into the world of DSPs, differing word sizes, true Harvard architecture, etc. I used an Analog Devices SHARC DSP a few years ago - data items in registers manipulated as 16, 32, or 48 bits, with data memory banked into units where an increment of 1 moved up by one of these units (ie there was simply no concept of byte addressing). Oh, and that was VLIW too.
If you've just poured months into a large program, and have some respect for what you're doing, then
1) you should know where the bottlenecks are.
2) you should already have tried to isolate the key routines to make later optimisations easy (see the sketch after this list).
3) the chance to achieve a 10x speed gain is almost irresistible
4) unless you're a VB coder who doesn't know what an interrupt or scalable algorithm is, you'll find ways to optimise if they exist.
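On point 2, isolation can be as simple as keeping the hot routine behind a single function pointer so a tuned version can be dropped in later without touching the callers. A rough sketch, with names invented purely for illustration:

#include <stddef.h>

/* portable scalar fallback for the hot routine */
static void mix_scalar(float *dst, const float *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] += src[i];
}

/* callers only ever go through mix_samples; swap in an AltiVec version
   at startup once you've checked the CPU actually has the vector unit */
typedef void (*mix_fn)(float *dst, const float *src, size_t n);
mix_fn mix_samples = mix_scalar;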
If it were so easy, all Mac developers would have done it by now.
Here we start getting close to the real nub - is it really a case of being easy versus difficult, or actually that AltiVec won't make much difference? For comparatively simple operations over large data sets, there are benefits from AltiVec (and the like). But more general computing (let's say a Java VM, just to choose a random example) is probably more data-bandwidth bound than anything else. In such situations there will be no AltiVec optimisations, not because they're too difficult, but because they're not going to contribute much.
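For a flavour of what I mean, consider a loop like this (purely illustrative, again my own example) - an interpreter or VM spends its life chasing pointers, and each iteration stalls on a dependent load rather than on arithmetic, so there's nothing for a vector unit to chew on:

struct node {
    struct node *next;
    int value;
};

int sum_list(const struct node *p)
{
    int total = 0;
    while (p != NULL) {     /* each step waits on the previous load */
        total += p->value;
        p = p->next;
    }
    return total;
}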
Until all desktop processors have similarly functioning vector processors onboard, or until whichever OS X compiler is capable of useful auto-vectorization, developers - if they want to involve themselves with the Mac - are forced into doing something they shouldn't have to do.
So I don't quite agree with this.
...
But that means less when the x86 world is much faster by default than the Mac world. A developer today has a choice: 1) writing standard cross-platform code that will run pretty well by default on the platform that 95% of the desktop market uses,
This 95% figure is interesting. We've been reading it quite happily for a while now, but maybe it's worth considering what it means.
Does Intel have 95% market share? No - AMD ensures that it doesn't.
Does Microsoft have 95% market share? Yes (let's assume this, at least for the sake of the argument).
Okay - what has Microsoft got 95% market share with? Their operating systems (note the plural). Even this 95% doesn't mean any one of 95, 98, 2000, XP, etc, has 95% market share.
Applications - let's guess IE is the most used piece of code MS produces. This still probably has less than 95% share - not all MS users are necessarily using the 'net, and some (not many!) are using alternative browsers.
Office is probably MS's biggest-selling application. This sure as hell hasn't got 100% of this 95% to itself.
In other words, this is classic lies, damned lies and statistics. The potential market share for a PC developer may be 95%, but this isn't a realistic figure for even MS, let alone more "lowly" developers.
It all comes down to market segmentation. Adobe may say that PCs represent about 20 times the market of Macs for Photoshop, but I imagine that the Mac community is more focused, so I'd expect a smaller sales ratio than 20:1 in reality. (Can anyone be bothered to trawl Adobe's SEC filings and pull out whatever figures they give?)
If we start talking "esoteric" applications, such as professional design (print, video, etc), then this 95% is probably prime MS FUD. Sales to Mac users might actually be higher!
and 2) writing the same code and then combing through however much of it by hand to optimize it
the "by hand" bit is probably wrong. If you don't already know where its slow (eg writing a video application - not difficult to guess!) there are tools that will make this a fairly trivial process.
If Apple continues what they're doing now, they will only fall further and further behind the rest of the world. Although improving the compiler is great, and it may be their only short-term option, they need faster chips in order to stay competitive, and those chips have to be able to achieve good performance without significant developer effort.
Alex
This is where I really get closest to agreeing. But, as that wouldn't be providing troll fodder (not my intention, honest), consider this.
Let's assume we've factored out where AltiVec is relevant (possibly including compilers, but let's not go there).
A lot of the messages here recently seem to have focused very much on clock rates. Some have cited Power4 and derivatives, but almost always in an off-hand fashion. But ask the question: what actually needs to be faster?
Much as I find it strange to say this, I think the BareFeats numbers might be relevant here. Combine the apparent oddities of those figures with the white paper about G4 upgrades and cache performance, and I think there's something lurking here (not everyone is going to be surprised by this). I think the G4 at present is not MHz bound but MB/s bound - ie improving memory bandwidth might have a lot more effect than increasing clock rates or achieving higher IPC/superscalar performance.
Keeping everything about a G4 the same apart from doubling the speed of the memory bus (and rippling the timing changes through) might well be a considerably cheaper way of getting better performance than things like Power4Lite.
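Anyone who wants to poke at the "MB/s bound" hypothesis on their own machine can do it with a few lines of C. A back-of-envelope, STREAM-style copy loop (sizes picked arbitrarily to blow past a G4's caches; this is a rough sketch, not a proper benchmark) tracks the memory bus, not the core clock:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (8 * 1024 * 1024)   /* 32 MB per array - well past any G4 cache */

int main(void)
{
    float *a = malloc(N * sizeof(float));
    float *b = malloc(N * sizeof(float));
    size_t i;
    clock_t start;
    double secs;

    for (i = 0; i < N; i++)   /* touch the pages first */
        a[i] = 1.0f;

    start = clock();
    for (i = 0; i < N; i++)   /* pure copy: limited by memory bandwidth */
        b[i] = a[i];
    secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    printf("copy: %.1f MB/s\n",
           (2.0 * N * sizeof(float) / (1024.0 * 1024.0)) / secs);
    free(a);
    free(b);
    return 0;
}

If that number barely moves between a 500MHz and an 800MHz G4 on the same board, the clock rate isn't the problem.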
But if this hypothesis is true, why hasn't it happened? I've no answer to that, other than it's far more in Motorola's (and maybe IBM's) hands than Apple's.
Phew - rant over.