It's extremely rare in the big data world to have data stored locally rather than in the cloud. What workload are you referring to? Examples?

If you have any cloud setup, you usually work with data that is already stored in the cloud, e.g. via AWS S3 or some kind of database.

Science simulation is one area where I believe it is extremely cost-prohibitive to do serious work locally. The forum members here are talking about spending $40k to get a 1TB Mac Pro... You can rent an AWS server with 24TB of RAM and 448 CPU cores for a fraction of the cost by the hour.

It makes zero sense to buy a $40k Mac Pro to do science simulation on. Period. No one here has come out and claimed they use 1TB of RAM on a Mac Pro to do real science simulations. I don't expect anyone to.
I am talking about chip design. Terabytes of data. If you want to keep all this data in the cloud, it'll cost a lot and all your design flows must be in the cloud. It may happen some day, but it's not practical or cost-efficient now.
Claiming that cloud is cheaper than buying hardware without any specifics is pointless. If you know that you have enough computing needs to load up a given number of computers without complicated load management (a lot of work one day and not so much the next), owning hardware will always cost less; otherwise cloud vendors would not make any profit.
 
This is what I've been arguing for the whole time.

It makes little economic sense to buy a 1TB RAM Mac Pro for local work that can be done faster and cheaper via the cloud.
How did you manage to interpret it like that?

My point was that there is a clear productivity advantage from using a workstation instead of the cloud. However, due to Intel's and AMD's product segmentation, there is a high fixed cost in buying a workstation. But if you buy one, adding more RAM is cheap. Cloud pricing is the opposite: fixed costs are low, but adding RAM is expensive.

Effectively, that means that a 512 GB Mac Pro is expensive, while a 1 TB Mac Pro is cheap. If the natural size of your target environment is 512 GB, using the cloud makes economic sense, at least in the kind of work I do. But if the natural size is 1 TB, buying a workstation could be cost-effective.
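
To make the fixed-vs-marginal shape of that argument concrete, here's a toy back-of-the-envelope sketch in Python. Every number in it is a made-up placeholder (not a real Apple, Intel, or AWS price); the only point is the structure: extra RAM in a workstation you already own is a small one-time increment, while extra RAM in a rented instance compounds month after month.

Code:
# Toy cost model; all prices are hypothetical placeholders.
WORKSTATION_BASE = 12_000        # assumed high fixed cost to get into workstation territory
WORKSTATION_RAM_PER_GB = 7       # assumed cheap marginal RAM once you own the box

CLOUD_BASE_PER_HOUR = 0.50       # assumed low fixed cost for a modest instance
CLOUD_RAM_PER_GB_HOUR = 0.01     # assumed marginal RAM cost, billed by the hour
HOURS_PER_MONTH = 730

def workstation_price(ram_gb: int) -> int:
    """One-time purchase price for a workstation with ram_gb of memory."""
    return WORKSTATION_BASE + WORKSTATION_RAM_PER_GB * ram_gb

def cloud_price_per_month(ram_gb: int) -> float:
    """Monthly rental for an always-on instance with ram_gb of memory."""
    return (CLOUD_BASE_PER_HOUR + CLOUD_RAM_PER_GB_HOUR * ram_gb) * HOURS_PER_MONTH

# Going from 512 GB to 1 TB is a small step locally but a big one in the cloud.
print("workstation, 512 GB -> 1 TB:",
      workstation_price(1024) - workstation_price(512))                   # +3584, once
print("cloud, 512 GB -> 1 TB, per year:",
      12 * (cloud_price_per_month(1024) - cloud_price_per_month(512)))    # about +44850 every year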

High-memory workstations are useful in niche applications, but their cost-effectiveness depends on arbitrary decisions made by various businesses. Some time ago, AMD still made Threadrippers, which made the fixed costs for workstations with up to 256 GB RAM lower than they are today. But today that intermediate option between consumer and workstation chips no longer exists.
 
Science simulation is one area where I believe it is extremely cost-prohibitive to do serious work locally. The forum members here are talking about spending $40k to get a 1TB Mac Pro...
The 1 TB Mac Pro is more like $15k to $20k, depending on the other components. Nobody pays workstation manufacturer's list prices for RAM at this scale. They either get a steep discount or install third-party modules.
 
I am talking about chip design. Terabytes of data. If you want to keep all this data in the cloud, it'll cost a lot and all your design flows must be in the cloud. It may happen some day, but it's not practical or cost-efficient now.
Claiming that cloud is cheaper than buying hardware without any specifics is pointless. If you know that you have enough computing needs to load up a given number of computers without complicated load management (a lot of work one day and not so much the next), owning hardware will always cost less; otherwise cloud vendors would not make any profit.
We're moving really far away from the original goalpost, which is "it makes no sense to buy a 1TB Mac Pro to do science simulations".

So far, I'm correct. No one here has come out supporting it. Everyone who has tried has either moved the goalpost or simply stated something like "just because you don't do science simulations on a 1TB Mac Pro, doesn't mean others don't".
 
The 1 TB Mac Pro is more like $15k to $20k, depending on the other components. Nobody pays workstation manufacturer's list prices for RAM at this scale. They either get a steep discount or install third-party modules.
Companies that actually have a use for 1TB workstations will not buy 1TB of RAM from Amazon for $7000. They'll pay the manufacturer or, in this case, likely middlemen like CDW. So yes, it's likely that companies will pay the full or near-full price for a workstation like this.
 
I am talking about chip design. Terabytes of data. If you want to keep all this data in the cloud, it'll cost a lot and all your design flows must be in the cloud. It may happen some day, but it's not practical or cost-efficient now.
Claiming that cloud is cheaper than buying hardware without any specifics is pointless. If you know that you have enough computing needs to load up a given number of computers without complicated load management (a lot of work one day and not so much the next), owning hardware will always cost less; otherwise cloud vendors would not make any profit.
It costs very little to store terabytes of data in the cloud. On Amazon, the basic S3 tier (without any discounts) costs $23/TB/month.
 
We're moving really far away from the original goalpost, which is "it makes no sense to buy a 1TB Mac Pro to do science simulations".

So far, I'm correct. No one here has come out supporting it. Everyone who has tried has either moved the goalpost or simply stated something like "just because you don't do science simulations on a 1TB Mac Pro, doesn't mean others don't".
Your suggestion to use Linux or Windows machines in the cloud instead of Macs just proves that Macs are inadequate for serious computing. I guess the fact that people stopped using Mac Pros in favor of other platforms confirms it.
 
Central I/O die with all four edges as interconnects for four SoCs, so a cross-formation...

This leaves three sides of each SoC open for connections, but...

Only one side of each SoC has a fully exposed edge, where the other two sides share a quadrant with a neighboring SoC...
Yes, exactly, a cross, but that doesn't leave three sides open! Two sides of each are taken by RAM (remember, this isn't an EPYC chip, and the RAM isn't hanging off the I/O die - or at least the first 384GB isn't). And the other sides are for I/O. Just like the current Ultras.
 
Central I/O die with all four edges as interconnects for four SoCs, so a cross-formation...

This leaves three sides of each SoC open for connections, but...

Only one side of each SoC has a fully exposed edge, where the other two sides share a quadrant with a neighboring SoC...
Yes, exactly, a cross, but that doesn't leave three sides open! Two sides of each are taken by RAM (remember, this isn't an EPYC chip, and the RAM isn't hanging off the I/O die - or at least the first 384GB isn't). And the other sides are for I/O. Just like the current Ultras.

Three sides open for things like RAM or system I/O is what I mean...

The outermost edges of the dies in this cross formation are wide open, but the inner edges are at right angles to each other, therefore they are sharing space in the same quadrant, so less room for whatever is connected there...

Current M1 Ultra SoC has all outward facing edges for connections, cross formation would not...
 
It costs very little to store terabytes of data in the cloud. On Amazon, the basic S3 tier (without any discounts) costs $23/TB/month.
This means that you are paying as much per year as buying an SSD of that capacity - very expensive storage year on year.
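
For what it's worth, the arithmetic behind that comparison is trivial to check. The $23/TB/month figure is from the post above; the SSD price below is just an assumed round number, not a quote for any particular drive.

Code:
S3_STANDARD_PER_TB_MONTH = 23     # USD, base S3 tier as cited above
ASSUMED_SSD_PRICE_PER_TB = 250    # USD, hypothetical price for a decent 1 TB SSD

annual_s3 = S3_STANDARD_PER_TB_MONTH * 12
print(f"S3 standard tier: ${annual_s3} per TB per year")                 # $276 per TB per year
print(f"Months of S3 to match the SSD price: "
      f"{ASSUMED_SSD_PRICE_PER_TB / S3_STANDARD_PER_TB_MONTH:.1f}")      # about 10.9 months

Whether a bare SSD is even the right thing to compare against is a separate question.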
 
This means that you are paying as much per year as buying an SSD of that capacity - very expensive storage year on year.
You are comparing the price of an SSD to the price of data stored in a data center with proper backup infrastructure. I don't think it works like that. Well, if the data is deemed safe enough to be stored on a local SSD, I guess it's not important enough to be worked on in a production environment?
 
Three sides open for things like RAM or system I/O is what I mean...

The outermost edges of the dies in this cross formation are wide open, but the inner edges are at right angles to each other, therefore they are sharing space in the same quadrant, so less room for whatever is connected there...

Current M1 Ultra SoC has all outward facing edges for connections, cross formation would not...
Oh, I see what you mean. Yes, obviously, and this would definitely be an issue they'd need to engineer around. Possibly even a very difficult issue, that would require relayout to put RAM traces at the outsides of the cross. (Or even one that would render this particular solution a nonstarter. Though I doubt it would be that bad.)

This illustrates what I've been saying all along - there are lots of potential solutions, but all of them involve either substantial engineering and/or tradeoffs.
 
Your suggestion to use Linux or Windows machines in the cloud instead of Macs just proves that Macs are inadequate for serious computing. I guess the fact that people stopped using Mac Pros in favor of other platforms confirms it.
Macs aren't, and never have been, adequate for "serious computing". High-compute, high-memory, or high-availability workloads have been done on Linux and in the cloud for at least 15 years now.

Also, I never suggested using Windows.
 
It makes little economic sense to buy a 1TB RAM Mac Pro for local work that can be done faster and cheaper via the cloud.

I'm going to mark you as someone who agrees with me.
You are responding to somebody saying this:
sometimes it's cheaper or more productive to do things locally
So no, they are not agreeing with you.
 
Companies that actually have a use for 1TB workstations will not buy 1TB of RAM from Amazon for $7000. They'll pay the manufacturer or, in this case, likely middlemen like CDW. So yes, it's likely that companies will pay the full or near-full price for a workstation like this.
I've never worked in an organization with excess money like that. In every place I've worked, IT took advantage of such trivial ways of saving nontrivial money.

And in many cases, though probably not with Apple, only ignorant people pay list prices. For large purchases, it's more common to negotiate the price. You end up paying competitive prices for normal products (like a bunch of cluster nodes), and then you get a steep discount for special products (like individual high-memory nodes). I have forgotten the numbers, but I remember being surprised how cheap a 1.5 TB server was back in 2011.
 
Care to explain more? Please don't just give one or a few niche edge cases and then use them to justify the main argument that Apple should use AMD + Nvidia chips for the Mac Pro.
You're mixing people up. I have not advocated for Apple using AMD or NVidia GPUs in the Apple Silicon Mac Pro.

Why would someone run science simulations on a $40k Mac Pro with 1TB instead of renting a cloud cluster? Or renting a government data center?
Take your pick:

- they're doing compute on large locally-generated datasets, and it is expensive and slow to upload them to the cloud (I have personally seen this)

- they use lots of cloud compute and storage, get the bills, and over time realize that buying their own hardware would have saved them lots of money (I have personally seen this)

- science runs on grant money, and when they get a lot of cash to spend on a research program, lots of scientists love to buy themselves a flashy computer to run their simulations on, and what's flashier than Apple hardware? (I have personally seen this)

There are many other reasons. Some of them have nothing to do with costs. Are you processing legally sensitive data, e.g. patient X-rays? Cloud is probably a no-go.

This is what I've been arguing for the whole time.

It makes little economic sense to buy a 1TB RAM Mac Pro for local work that can be done faster and cheaper via the cloud.
You're out of touch if you think cloud is guaranteed cheaper.

Cloud compute is something which makes less sense the more you use it. The baseline cloud business model which kicked everything off was to serve customers who wanted to run services like internet storefronts, but didn't want to pay the full price of a colocated web server. Since the machine resources for this kind of thing amount to a fraction of a modern server, it makes sense to use virtualization to rent out fractions of a machine. (For the same reason, there are IT-for-hire companies out there which effectively rent fractions of their full-time IT staff to operate cloud-hosted server instances.)

Customers who need only fractions of a computer do actually pay far less than they would to buy and support their own hardware. That doesn't mean they're paying a fair price! Cloud service providers take a lot of profit. AWS is somewhere around 75% of Amazon's profits despite being a tiny fraction of their revenue. As you start using more and more cloud compute, you will rapidly approach and pass the threshold where it is cheaper to own your hardware, even if you have to hire a full-time IT department. Cloud costs can be shocking to those who have completely bought into cloud hype, as it seems you have.
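
Here's a minimal break-even sketch for that threshold claim, with entirely made-up numbers (none of these are real AWS or hardware prices); the only point is that the crossover can arrive at surprisingly modest utilization.

Code:
# Crude break-even model; every figure is a hypothetical placeholder.
CLOUD_RATE_PER_HOUR = 8.0        # assumed on-demand rate for a big-memory instance
SERVER_PURCHASE = 35_000         # assumed price of a comparable owned server
IT_OVERHEAD_PER_YEAR = 20_000    # assumed share of admin time, power, hosting

owned_cost_per_year = SERVER_PURCHASE / 3 + IT_OVERHEAD_PER_YEAR   # 3-year amortization
breakeven_hours = owned_cost_per_year / CLOUD_RATE_PER_HOUR

print(f"Owning costs roughly ${owned_cost_per_year:,.0f} per year")
print(f"Renting is cheaper only below ~{breakeven_hours:,.0f} hours/year "
      f"({breakeven_hours / 8760:.0%} utilization)")

With these placeholder figures the crossover sits around 45% utilization; plug in your own numbers and the shape of the result stays the same.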
 
From the side-by-side picture Apple displayed in the M2 Pro/Max product introduction video, the M2 Max is substantially bigger than the M1 Max. Similar bloat to the M1 -> M2 step, just on a die that's 3x bigger. Apple has gone in the complete opposite direction of a die shrink here.

The M1 Max was a "too chunky" chiplet. This is even worse (in a scale-past-2-dies context). It makes sense for the laptops (and the Mini and the iPad-on-a-stick iMac); they are monolithic-only solutions. But if that is the level of "chunky" they were trying to do with the 2-die and 4-die solutions, then it is not surprising the 4-die one was a "miss" and got canceled. However, it also really doesn't make much sense to over-couple the upper "half" of the desktop lineup to a laptop-optimized die either.

The M2 Max die shot didn't have an UltraFusion connector in it at all. Apple has also photoshopped the UltraFusion connector off the M1 Max in the side-by-side photo. They could be introducing the notion that there is a laptop "Max" that doesn't have an UltraFusion connector. (That, too, makes lots of sense if laptop "Max" Mac sales are much greater than desktop "Max" sales; otherwise it's potentially millions of connectors that don't connect to anything, even more so when the die is in "bloat" mode anyway. Take that UltraFusion connector space and allocate it all to on-die logic.) Or it is another "delight and surprise" move with photoshopped images.

The M2 and M1 Pro have the same memory bandwidth and max capacity. [Perhaps supply chain problems.]

The M2 Max has the same memory bandwidth, but does get a 4 * 24 GB (96GB) max, though only if you buy the largest GPU core count ($200 GPU core bump + $800 jump to 96GB). [Again, perhaps supply chain problems. The more expensive it is, the fewer people buy it.]
Yeah, in a nutshell it looks like a stopgap upgrade. I mean, it's really nice for those who wanted a new computer TODAY. I'm better off just replacing the battery that's at 88% capacity on my M1 Pro.

Definitely looking forward to the M3 Pro and Max, and even more so if OLED screens are bundled in.
 
Macs aren't, and never have been, adequate for "serious computing".
I recall Apple revolutionising the serious computing market in the '80s... It certainly changed serious industries when we were suddenly able to do prepress and desktop publishing work digitally instead of dealing with the notorious International Typographical Union workers who ruled everything that needed to go to print. The union existed from 1852 to 1986. They only survived two years after the Apple Macintosh was released with support for PostScript.
 
I was going to ask here if Apple Silicon uses micro-ops, but I found this informative article for anyone else that was curious.

This is a very fraught issue that leads to a lot of confusion.

ARM instructions are built to accomplish as much as practicable in each op. There are situations where one ARM op performs what would be the equivalent of as many as six x86 instructions, and does it in a single clock cycle, with one μop. This is real-world useful functionality. In some cases, an ARM operation is simply a multifunction instruction with part of its effect bypassed – compare, for instance, is just subtract with its result sent to an inactive register, and register-move is just a shift operation with a shift count of zero (though, unlike x86, register-move is a rarely-needed instruction).

The reality is that most modern compilers generate optimal code, which means that the lion's share of x86 object code resolves to a single μop per instruction. Many of the multi-μop instructions are rarely used, which is also the case with most ARM code. But there are very few ARM instructions that would actually have to be split into μops – a handful of "atomics", which are special operations that get occasional use.
 
There are situations where one ARM op performs what would be the equivalent of as many as six x86 instructions, and does it in a single clock cycle, with one μop.
Can you give an example? I thought it was the other way around.
 
> Can you give an example? I thought it was the other way around.

There's a perfect example of @Xiao_Xi's point (not @Sydde's point) in x86, used in simdjson. An integer instruction searches for some inner set of zeroes, surrounded by two one bits. On the x86 "complex" instruction set it's much faster, but on ARM it takes numerous "simple" instructions. Apple also has humongous instruction caches (192 KB) because ARM instructions have greater binary size.

One benefit of ARM, though: it's easier to make it out-of-order. All instructions are 4 bytes, so you can scan a massive reorder buffer without having to pseudo-sequentially check where one instruction starts and another ends. On their GPU, instructions are variable-length, often 8 bytes, and the instruction cache is 12 KB. Much smaller, but the GPU doesn't have to perform out-of-order as much (only dual-dispatching, which I haven't confirmed). Also, many instructions can get quite large, such as ICMPSEL32, which has 5 operands. It's the only way to perform 2 integer instructions per clock per ALU, just like FMA does for floats. The instruction is 10 bytes.

Code:
ICMPSEL, FCMPSEL (1 cycle, 80-bit instruction)

Input operands: A, B, X, Y
Output operands: D

D = select(X, Y, A > B); // other compare ops valid too

FFMA (1 cycle, 64-bit instruction)

Input operands: A, B, C
Output operands: D

D = A * B + C;
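
To make the fixed- vs variable-width decode point above concrete, here's a toy Python sketch. It doesn't model any real decoder (Apple's or anyone else's); it only shows why 4-byte instructions make every boundary trivially computable in parallel, while variable-length encodings force a serial walk.

Code:
from typing import Callable

FIXED_WIDTH = 4  # bytes per AArch64 instruction

def fixed_width_offsets(code: bytes) -> list[int]:
    """Instruction k starts at 4*k: every boundary is known up front,
    so many decoders can attack the buffer in parallel."""
    return [FIXED_WIDTH * k for k in range(len(code) // FIXED_WIDTH)]

def variable_width_offsets(code: bytes,
                           length_of: Callable[[bytes, int], int]) -> list[int]:
    """With variable-length encodings, instruction k's start is unknown until
    the lengths of instructions 0..k-1 have been decoded: a serial chain."""
    offsets, pos = [], 0
    while pos < len(code):
        offsets.append(pos)
        pos += length_of(code, pos)
    return offsets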
 
Can you give an example? I thought it was the other way around.
One example would be something like
EOR R8, R9, R7, ASR#34

To fully replicate that operation on x86-64, you would have to do something like this (using rdi to stand in for ARM's R7, since x86 has no "r7" register):
push rdi          ; save the source, since the shift below destroys its operand
sar  rdi, 34      ; x86's arithmetic shift right is SAR
mov  r8, r9
xor  r8, rdi      ; x86's XOR is the equivalent of ARM's EOR
pop  rdi          ; restore the clobbered source

This kind of difference pervades the ARM instruction set, in part because all ARM instructions are based on the a = b + c principle, while x86 is based on a = a + b. This means that while register-move instruction forms do exist in ARM, they are very lightly used because they are rarely needed: you put your result right where you need it, and the behavior of 30 general registers and 32 FP/Vector registers is identical (there are only 2 special purpose registers).

The operation of the above instruction is 1 cycle. The shift and the exclusive-or flow through the math unit together, and the result writeback is a natural consequence of the operation. There must be a fair fraction of code that bypasses the full capability of a given instruction (e.g., setting the above shift count to zero), but the cost of doing that is negligible. By contrast, the structure of an x86 instruction is discrete: you either use the whole instruction or you do something else; there is typically not a practical way to make partial use of an instruction.

Now, the instruction scheduler for an x86 device may well be sophisticated enough to combine a sequence of instructions like the one shown above into the functionality of the single ARM instruction, and to obfuscate away the push and pop through the use of rename registers so that the net result is the same, but that is a profoundly elaborate logic structure that makes the computer faster at the expense of power and heat. The ARM design accomplishes quite a lot for less of that power cost.
 
There are ARM server CPUs that are on par with Intel Xeon and AMD EPYC high-end server CPUs. ARM wasn't originally designed as a "mobile chip"; it was designed as a "power-efficient multi-purpose chip", including for desktops and servers. It just happened to find its niche in mobile use for decades.
ARM was originally designed in the early '80s as a desktop CPU to replace the 6502 used in Acorn's computers at the time. The first implementation was used as a co-processor for the 6502-based BBC Micro. Acorn researched the 16-bit and 32-bit CPUs in development or on the market at that time and thought they were too expensive and too slow. They were probably not thinking too much about servers, because Acorn didn't build servers, and data centers were filled with minicomputers (often DEC VAX) and mainframes (often IBM) in those days.
 