from Ars, A History of ARM, Part 1, Part 2, Part 3
ARM was originally designed in the early 80s as a desktop CPU
What year is this? Simply no. Just no. This is not how modern big data is done.
- they're doing compute on large locally-generated datasets, and it is expensive and slow to upload them to the cloud (I have personally seen this)
First, we're talking about "real" projects, right? Not hobby projects.
- they use lots of cloud compute and storage, get the bills, and over time realize that buying their own hardware would have saved them lots of money (I have personally seen this)
I'm sorry. This is not a valid argument. I don't even know how to respond to this.
- science runs on grant money, and when they get a lot of cash to spend on a research program, lots of scientists love to buy themselves a flashy computer to run their simulations on, and what's flashier than Apple hardware? (I have personally seen this)
I never said the cloud is guaranteed cheaper.
You're out of touch if you think cloud is guaranteed cheaper.
I completely forgot that ARM uses a load-store architecture and x86 a register-memory architecture.
One example would be something like
EOR X8, X9, X7, ASR #34    // AArch64: the arithmetic shift is folded into the EOR, so this is one instruction
To fully replicate that operation on x86, you would have to do something like
push rdi            ; x86-64 has no r7, so rdi stands in for ARM's X7; save it first
sar rdi, 34         ; x86 spells arithmetic shift right "sar", with no '#' on the immediate
mov r8, r9          ; r8 = r9
xor r8, rdi         ; x86 spells exclusive-or "xor"; r8 ^= (rdi >> 34)
pop rdi             ; restore the shifted register
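In higher-level terms, that single ARM instruction computes this (a rough Python model assuming 64-bit registers; the variable names are just stand-ins):
def asr64(value, shift):
    # Arithmetic shift right on a 64-bit two's-complement value held as an unsigned int.
    signed = value - (1 << 64) if value & (1 << 63) else value
    return (signed >> shift) & ((1 << 64) - 1)

# EOR X8, X9, X7, ASR #34  ->  x8 = x9 ^ (x7 arithmetically shifted right by 34)
x7, x9 = 0xFFFF_0000_1234_5678, 0x0F0F_0F0F_0F0F_0F0F
x8 = x9 ^ asr64(x7, 34)
print(hex(x8))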
I never said the cloud is guaranteed cheaper.
Cloud is more expensive for workstations if the usage is more than 6 months. Security and access in the cloud are still a big problem unless you have money to throw around for private networks with cloud providers. Google has access to most of the high-end GPU servers; I don't want someone snooping on my stuff.
This is what I've been arguing for the whole time.
It makes little economic sense to buy a 1TB RAM Mac Pro for local work that can be done faster and cheaper via the cloud.
I'm going to mark you as someone who agrees with me.
Way to generalize stuff. There is a lot more than ETL processing. When you look at CV, speech, and other AI work, training happens in the cloud on big clusters of A100s. A lot of the inference on trained models happens locally on workstations with an A5000 or 4090. Try renting an A5000 or 4090 GPU in the cloud vs. a workstation. There is no one-size-fits-all solution; each approach has pros and cons.
About me: Tech lead for a top Silicon Valley software company. I manage a team of software developers and a 7-figure cloud bill.
What year is this? Simply no. Just no. This is not how modern big data is done.
I've met, chatted with, and interviewed hundreds of data engineers and data scientists in Silicon Valley. None of them do things like you mentioned.
In modern big data, you don't generate a large dataset locally that is so large that it can't be reasonably uploaded to the cloud. No one does this. Hell. Forget modern. In my 20 years of software engineering experience, I've never seen someone dumb enough to get into this situation.
Instead, what modern teams do is generate, manipulate, and process data completely in the cloud. No data touches the local machine except maybe small sample data.
It's called ETL. It's all done via the cloud.
Local computers simply aren't used to store or generate data. That's insanity. Big data is done in a team. How are your teammates going to use your data if you can't even upload it to the cloud?
PS. Bandwidth is cheap. You're wrong that it's "expensive". It's the cheapest thing in this process.
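To make that concrete, a typical cloud-side ETL step looks something like this minimal sketch (PySpark reading from and writing to S3; the bucket paths and column names are made up for illustration):
from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.appName("etl-sketch").getOrCreate()
# Extract: read raw events straight from object storage (bucket path is made up).
raw = spark.read.json("s3://example-raw-events/2023/05/")
# Transform: filter and aggregate on the cluster; nothing lands on anyone's laptop.
daily_spend = (raw.filter(F.col("event_type") == "purchase")
                  .groupBy("customer_id")
                  .agg(F.sum("amount").alias("total_spend")))
# Load: write the result back to shared cloud storage for the rest of the team.
daily_spend.write.mode("overwrite").parquet("s3://example-curated/daily_spend/")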
First, we're talking about "real" projects, right? Not hobby projects.
This is super funny. It's funny because this is the main argument someone who doesn't know much about running a high-availability software service makes.
Are there niche cases where buying local hardware is cheaper than using the cloud? Yes. Maybe 1/1000 times.
Only extremely large companies or companies with special needs would ever build their own data centers.
I'm sorry. This is not a valid argument. I don't even know how to respond to this.
I never said the cloud is guaranteed cheaper.
@mr_roboto It seems like you're trying to appeal to an anecdotal evidence fallacy.
Are you worried about anything in particular? How do you think a CSP can know what you have on their computers?
Security and access in the cloud are still a big problem unless you have money to throw around for private networks with cloud providers. Google has access to most of the high-end GPU servers; I don't want someone snooping on my stuff.
Inference often involves unpredictable computing needs, and the cloud is much cheaper when the workload is unpredictable. Can you elaborate a bit?
A lot of the inference on trained models happens locally on workstations with an A5000 or 4090. Try renting an A5000 or 4090 GPU in the cloud vs. a workstation. There is no one-size-fits-all solution; each approach has pros and cons.
Cloud makes more sense for training but can be very expensive for anything GPU-based at inference time. Look at the Google Cloud pricing; a V100 in the cloud costs around $2K per month. A comparable A5000 GPU costs around $1.5K to $2K to purchase, and you can put multiple A5000s in a single workstation. Google terminates some GPU instances if you try to pause/stop them. Azure and AWS are priced much higher than Google for GPUs. The cheaper options are usually some guy's basement in Eastern Europe renting out the GPUs.
Are you worried about anything in particular? How do you think a CSP can know what you have on their computers?
Inference often involves unpredictable computing needs, and the cloud is much cheaper when the workload is unpredictable. Can you elaborate a bit?
What kind of inference workload is so computationally intensive and predictable?
Cloud makes more sense for training but can be very expensive for anything GPU-based at inference time. Look at the Google Cloud pricing; a V100 in the cloud costs around $2K per month. A comparable A5000 GPU costs around $1.5K to $2K to purchase, and you can put multiple A5000s in a single workstation. Google terminates some GPU instances if you try to pause/stop them. Azure and AWS are priced much higher than Google for GPUs. The cheaper options are usually some guy's basement in Eastern Europe renting out the GPUs.
A workstation with two A5000s will cost around $10K, and the cloud with similar GPUs will be about $4K monthly on GCP.
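Rough break-even math with those numbers (ballpark figures from this thread, not current price lists):
# Buying a two-A5000 workstation vs renting comparable GPUs on GCP.
# Dollar amounts are the rough numbers quoted in this thread, not vendor quotes.
workstation_cost = 10_000        # one-time purchase
cloud_cost_per_month = 4_000     # recurring rental
breakeven_months = workstation_cost / cloud_cost_per_month
print(f"Break-even after ~{breakeven_months:.1f} months of continuous use")  # ~2.5 months
# Past that point the owned box is cheaper, ignoring power, maintenance,
# and the cloud's ability to scale up and down.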
What we're talking about here is $40K Mac Pro workstations vs. renting in the cloud.
Way to generalize stuff. There is a lot more than ETL processing. When you look at CV, speech, and other AI work, training happens in the cloud on big clusters of A100s. A lot of the inference on trained models happens locally on workstations with an A5000 or 4090. Try renting an A5000 or 4090 GPU in the cloud vs. a workstation. There is no one-size-fits-all solution; each approach has pros and cons.
You're talking about apples vs. oranges.
Cloud makes more sense for training but can be very expensive for anything GPU-based at inference time. Look at the Google Cloud pricing; a V100 in the cloud costs around $2K per month. A comparable A5000 GPU costs around $1.5K to $2K to purchase, and you can put multiple A5000s in a single workstation. Google terminates some GPU instances if you try to pause/stop them. Azure and AWS are priced much higher than Google for GPUs. The cheaper options are usually some guy's basement in Eastern Europe renting out the GPUs.
A workstation with two A5000s will cost around $10K, and the cloud with similar GPUs will be about $4K monthly on GCP.
And it's going to cost around $6K per month in the cloud for something similar to a $40K Mac Pro with 1 TB of RAM. If the compute is predictable, a local workstation is usually cheaper than the cloud.
What we're talking about here is $40K Mac Pro workstations vs. renting in the cloud.
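Same rough math for the Mac Pro figures (again, this thread's ballpark assumptions):
# A $40K Mac Pro bought outright vs a ~$6K/month cloud instance with comparable RAM.
mac_pro_cost = 40_000
cloud_cost_per_month = 6_000
print(f"Break-even after ~{mac_pro_cost / cloud_cost_per_month:.1f} months")  # ~6.7 months
# Beyond that the workstation wins on raw cost, provided it actually stays busy
# and the workload doesn't outgrow it.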
I don't see how generalizing is wrong if the vast majority of big data is done this way.
Everything has pros and cons. Even the best solution has them.
Who is doing all that with a workstation? Last I checked, a Mac Pro doesn't do any of the stuff you mentioned. If a company is spending $6-8K per month on a workstation for a guy to work from the beach, there is something fundamentally wrong.
What about setup, maintenance, automated backups, electricity, cooling, auto-scaling, clustering, bandwidth, SLA, the ability to access this instance while you're working from the beach, and all the other benefits?
And what happens if you buy a $40k workstation and then one year later, it's completely obsolete because data is growing exponentially? What are you going to do with your shiny $40k workstation that is now completely useless for its intended purpose?
Speech, CV, sensors, and graphics. Spot pricing is 50-60% cheaper, but your instance will be stopped under load. On-demand pricing can get very expensive. Reserved is a long-term commitment, which defeats the purpose of the adaptability. I know folks who have committed to V100s for 3 years, which is worse than a workstation GPU at this point.
What kind of inference workload is so computationally intensive and predictable?
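Putting rough numbers on those pricing models (the on-demand and spot figures are the ones from this thread; the reserved discount is just an assumed placeholder):
# Ballpark costs for one cloud V100 under different pricing models.
# All rates are illustrative assumptions, not provider price sheets.
on_demand_month = 2_000                          # roughly what a V100 runs per month
spot_month = on_demand_month * 0.45              # ~55% off, but preemptible under load
reserved_3yr_total = on_demand_month * 0.6 * 36  # assumed ~40% off for a 3-year commit
workstation_gpu = 2_000                          # one A5000 bought outright
print(f"on-demand, 1 year : ${on_demand_month * 12:,.0f}")
print(f"spot, 1 year      : ${spot_month * 12:,.0f} (if preemption is tolerable)")
print(f"reserved, 3 years : ${reserved_3yr_total:,.0f} (locked in)")
print(f"A5000 purchase    : ${workstation_gpu:,.0f} one-time")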
By the way, what type of pricing do you use: reserved instance or on-demand instance? Sorry for using AWS nomenclature, I'm not sure if there is a standard nomenclature for this.
That is common with scientific data. The data comes from local instruments, and because the internet is slow and/or expensive, it must be processed locally.
In modern big data, you don't generate a large dataset locally that is so large that it can't be reasonably uploaded to the cloud. No one does this.
Bandwidth is expensive when someone is downloading your data from the cloud. That's a deliberate choice by cloud providers in an attempt to lock customers in. It doesn't work well with scientific work, which is often a collaboration between multiple organizations. Each organization has its own infrastructure, so even moderate-sized projects may have to deal with multiple cloud providers and local/national clusters/supercomputers.
PS. Bandwidth is cheap. You're wrong that it's "expensive". It's the cheapest thing in this process.
What local instrument produces so much data that it can't be reasonably uploaded to the cloud and must be processed locally? Examples?
That is common with scientific data. The data comes from local instruments, and because the internet is slow and/or expensive, it must be processed locally.
Wait, what?
Bandwidth is expensive when someone is downloading your data from the cloud. That's a deliberate choice by cloud providers in an attempt to lock customers in. It doesn't work well with scientific work, which is often a collaboration between multiple organizations. Each organization has its own infrastructure, so even moderate-sized projects may have to deal with multiple cloud providers and local/national clusters/supercomputers.
Apple also has humongous instruction caches (192 KB) because ARM instructions take up more space in the binary.
In my field, a sequencing machine may produce a burst of a few terabytes every couple of days. A single facility may operate tens of such machines. While it's possible to upload the data to the cloud (I think the Broad Institute does that), local processing is quite attractive with such data.
What local instrument produces so much data that it can't be reasonably uploaded to the cloud and must be processed locally? Examples?
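For a sense of scale, here is the upload bandwidth the sequencing example implies (reading "tens of machines" as 30 and "a few terabytes every couple of days" as 3 TB per 2 days; both are rough assumptions):
# Sustained upload bandwidth needed just to keep up with the instruments.
machines = 30
tb_per_machine_per_burst = 3       # "a few terabytes"
days_per_burst = 2                 # "every couple of days"
tb_per_day = machines * tb_per_machine_per_burst / days_per_burst
gbit_per_s = tb_per_day * 8 * 1000 / (24 * 3600)   # TB/day -> Gbit/s
print(f"~{tb_per_day:.0f} TB/day, roughly {gbit_per_s:.1f} Gbit/s sustained")  # ~45 TB/day, ~4.2 Gbit/s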
It's a $15k to $20k Mac Pro. And it may be appropriate for the people who develop methods and tools for processing the data.
And is a $40k Mac Pro the right machine to process this data?
The price can be several times higher for downloads outside Europe and North America or if the transfer volume is less than 5 TB/month. For popular data resources, transfer fees are generally higher than storage costs.
AWS CloudFront gives you 1 TB of transfer for free. After that, it costs as little as $0.02/GB. If you use S3 and you're transferring between different AWS services, it can cost as little as $0.01/GB. I assume the chances of different organizations using AWS are high since, well, practically everyone uses AWS.
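Rough egress math with those rates (the $0.02/GB is the figure quoted above; the higher rate is a hypothetical stand-in for "several times higher"):
# Monthly cost of collaborators pulling a shared dataset out of the cloud.
# Per-GB rates are illustrations from this thread, not current price sheets.
dataset_tb = 50
downloads_per_month = 3            # e.g. three partner organizations each pull a copy
cheap_rate = 0.02                  # $/GB, the CloudFront figure quoted above
pricey_rate = 0.08                 # $/GB, hypothetical "several times higher" case
gb_moved = dataset_tb * 1000 * downloads_per_month
print(f"at ${cheap_rate}/GB : ${gb_moved * cheap_rate:,.0f}/month")   # $3,000
print(f"at ${pricey_rate}/GB: ${gb_moved * pricey_rate:,.0f}/month")  # $12,000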
I completely forgot that ARM uses a load-store architecture and x86 a register-memory architecture.
And I read somewhere that Apple's license only requires they support the full ARM instruction set. It doesn't say that they can't add specific instructions that would just be for Apple's internal use.
The advantages of ARM are tiny, but in aggregate they add up. And the trend lines suggest that they will keep adding up.
Take, for example, the T2 chip: all of its features are now entirely inside an Apple Silicon chip. The T2 chip is now way out of date for a modern macOS experience. This means that if Apple were to use AMD + Nvidia chips, they'd have to engineer and build a brand-new T3 chip just to provide basic macOS features to the Mac Pro. No way.
If you think data sets can always just be poofed into existence on a cloud compute server, well, sure, I guess I see your point.
What year is this? Simply no. Just no. This is not how modern big data is done.
I've met, chatted with, and interviewed hundreds of data engineers and data scientists in Silicon Valley. None of them do things like you mentioned.
In modern big data, you don't generate a large dataset locally that is so large that it can't be reasonably uploaded to the cloud. No one does this. Hell. Forget modern. In my 20 years of software engineering experience, I've never seen someone dumb enough to get into this situation.
Instead, what modern teams do is generate, manipulate, and process data completely in the cloud. No data touches the local machine except maybe small sample data.
It's called ETL. It's all done via the cloud.
Your hyperfocus on "big data" is telling. That's a buzzword from a certain segment of the tech industry. Makes me think your experience base is narrow, and you aren't fully aware of it.
Local computers simply aren't used to store or generate data. That's insanity. Big data is done in a team. How are your teammates going to use your data if you can't even upload it to the cloud?
If you start needing bandwidth on the scale of 100 Gbps, you will run into some serious bills, and as @JouniS mentioned, sometimes you have to deploy to locations where there is no practical way to get that level of connectivity.
PS. Bandwidth is cheap. You're wrong that it's "expensive". It's the cheapest thing in this process.
I assure you that my employer is not paying me as a hobby. I'd rather not give specifics, so I'm going to leave it at that.
First, we're talking about "real" projects, right? Not hobby projects.
This is super funny. It's funny because this is the main argument someone who doesn't know much about running a high-availability software service makes.
Are there niche cases where buying local hardware is cheaper than using the cloud? Yes. Maybe 1/1000 times.
Only extremely large companies or companies with special needs would ever build their own data centers.
I'm sorry. This is not a valid argument. I don't even know how to respond to this.
I never said the cloud is guaranteed cheaper.
@mr_roboto It seems like you're trying to appeal to an anecdotal evidence fallacy.
Owning your own hardware is usually cheaper when computing needs are stable and predictable. In practice, very few companies have such computing needs. What company does not gain/lose customers? What company does not hire/fire personnel? The situation of a company can change very quickly, and it is much easier to adapt using the cloud.
Just get a PC.
For the Mac Pro, why not simply put a 96-core AMD EPYC CPU in it with an RTX 4090 (if Apple can solve their politics with NVIDIA) while retaining user expandability and repairability? Since the Mac Pro usually supports dual CPUs, Apple could even go to 192 AMD cores.
Does Apple really believe the M2 Extreme would beat 192 AMD cores and an RTX 4090? Heck, you could probably even put multiple RTX 4090s in the Mac Pro (if Apple solves their politics with NVIDIA).
For laptops, I get it: ARM offers nice battery life. But a Mac Pro has no battery.