
DWHH1
macrumors member, Original poster
May 13, 2010
I have a general question about Python and M1 processors; can someone please help? I write a lot of code in Python (3.8) and in particular use the concurrent.futures library for multiprocessing. So here is the question: on Intel-based iMacs, concurrent.futures can be told to use a number of 'workers', i.e. a number of cores. Since the Apple Silicon M1 processor has 8 cores, has anyone used Python and concurrent.futures on an M1-based Mac, and if so, were you able to spawn/fork the task across all 8 cores?

In simple terms - does multiprocessing Python work on M1 Macs?
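For concreteness, the kind of code in question looks roughly like this; a minimal sketch (the `square` task and worker count are illustrative, not from the original post). Nothing in it is architecture-specific:

```python
# Minimal sketch: the same concurrent.futures code runs unchanged on
# Intel and Apple Silicon Macs; the worker count is just a parameter.
import concurrent.futures

def square(n):
    # Stand-in for a CPU-bound task; must be defined at module top level
    # so worker processes can import it.
    return n * n

def run_pool(max_workers=None):
    # max_workers=None lets Python default to os.cpu_count() workers
    # (8 on an M1, counting performance and efficiency cores together).
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(square, range(10)))

if __name__ == "__main__":
    print(run_pool(max_workers=4))
```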
 
I've never been a big fan of Python, and its multiprocessing model can be a bit of a wicked beast at times, with idiosyncrasies of its own that I don't know much about. But I can tell you a bit about the way macOS does scheduling for heterogeneous designs with efficiency and performance cores.

Generally, with an asymmetric multiprocessing model the OS keeps two independent scheduler queues. When a process yields or the clock interrupt fires, the core checks its specific scheduler queue for work (or powers down, depending on the situation).
High-priority threads are more likely to land on the performance cores, while QoS-background threads are more likely to land on the efficiency cores, but either scheduler queue can in principle take any task, depending on the situation.

As for your question then: as long as real kernel threads/processes get created, and Python doesn't keep the work local to itself (for example, by detecting the number of cores and only creating that many kernel threads for its user threads), it should be able to use all the M1 CPU cores.

If you spawn a given number of "workers" on your Intel Mac, do they show up as independent processes in Activity Monitor? Does the overall process list that number of "threads" in Activity Monitor, even if you try to spawn more than you have hyper-threaded cores on your Intel machine? If so, it should work with all M1 cores too.
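One way to check this from Python itself, rather than eyeballing Activity Monitor, is to have each worker report its process ID; a sketch (the helper names are illustrative). Distinct PIDs confirm the workers really are separate OS processes:

```python
# Sketch: confirm workers run as separate OS processes by collecting
# their process IDs. You can even ask for more workers than cores;
# the OS scheduler simply time-slices them across the available cores.
import os
from concurrent.futures import ProcessPoolExecutor

def worker_pid(_):
    # Each task reports the PID of the process that ran it.
    return os.getpid()

def count_worker_pids(n_workers, n_tasks=32):
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return set(pool.map(worker_pid, range(n_tasks)))

if __name__ == "__main__":
    print("logical cores:", os.cpu_count())
    print("distinct worker processes:", len(count_worker_pids(4)))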
 
Sorry for my late response. I was beginning to think I wouldn't receive any reply at all, so I stopped checking the forum. Thank you for your time.

Responding to your final paragraph: Python 3.8's concurrent.futures has problems on an Intel iMac. It is necessary to call mp.set_start_method('fork', force=True) (part of the multiprocessing library) before multiprocessing, otherwise concurrent.futures crashes, or at least fails to start. With this step it is possible to launch workers and see the separate instances in Activity Monitor. I use ProcessPoolExecutor rather than ThreadPoolExecutor because of the nature of the tasks.
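The workaround described above looks roughly like this; a sketch with an illustrative `task` function, not the poster's actual code (a later reply in this thread argues for sticking with the default `spawn` method instead):

```python
# Rough sketch of the 'fork' workaround: force the fork start method
# before creating the process pool. Note that 'spawn' has been the
# macOS default since Python 3.8, and fork is unsafe if any Apple
# frameworks are loaded.
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor

def task(n):
    # Trivial stand-in for the real workload.
    return n + 1

def run_with_fork(n_workers=4):
    # force=True overrides any previously set start method.
    mp.set_start_method('fork', force=True)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(task, range(8)))

if __name__ == "__main__":
    print(run_with_fork())
```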

I have choices. Recent C++ (C++20) has a std::jthread threading option which probably works on the M1, but I don't need the outright speed of C++; I just need a broad front of workers.

I have almost no experience with Swift, so I don't know if that language offers something equivalent, but I am sure it does.

But the real focus of my question is whether buying a new iMac with an M1 processor to run Python in multiprocessing mode would prove to be a colossal waste of money (for me and my use case). If it is NOT possible for Python's concurrent.futures to 'switch on' multiprocessing or multithreading on the M1, then the new iMacs are a disappointing underperformer.

I hope you are right, and your advice certainly makes sense, but I am surprised that no one has, by now, actually tried firing up M1/Python/multiprocessing and confirmed there are no problems at all.
 
But the real focus of my question is whether buying a new iMac with an M1 processor to run Python in multiprocessing mode would prove to be a colossal waste of money (for me and my use case). If it is NOT possible for Python's concurrent.futures to 'switch on' multiprocessing or multithreading on the M1, then the new iMacs are a disappointing underperformer.
It may depend on where you live, but in some areas at least you can buy a Mac, try it out, and return it for a full refund if it doesn't do what you want or expect, no questions asked. There's a 14-day free return window here.

I would highly expect it to work fairly similarly to how it's worked on the Intel Mac for you, but can make no guarantees and unfortunately have no M1 Mac to test anything for you either.

You might alternatively try asking in the general M1 forums if someone would be willing to run test code for you. The developer forums receive a lot less traffic than other forums on here.
 
I have no idea about Python and MP usage on Apple M-based hardware; maybe check over at the Apple Developer site, there may be an answer there. I only use Apple's Accelerate framework and Apple's Metal (for their GPU): BLAS/LAPACK solvers for calculations. I don't think these are MP-directable. Accelerate is optimized, yes, but user-directable MP, no. I realize you are doing something entirely different with Python and MP.
 
Python works fine on the M1; nothing about `multiprocessing` is CPU-architecture specific. It depends more on the operating system than the chip.

That said, there are still some fiddly details with multi-architecture binaries, extension modules, and third-party libraries which you'll need to figure out. So while everything should eventually work, you should expect to spend a little more time than previously on e.g. the Python Discourse server or the libera.chat #python IRC channel to understand some odd error messages or required configuration during the Intel-to-M1 transition period. (Worst case, you can just run Python under Rosetta 2, fooling it into thinking you're still on an Intel Mac, while you figure things out.)
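A quick way to tell whether a given Python interpreter is running natively or under Rosetta 2 is to ask for the machine architecture; a small sketch using the standard library:

```python
# Check which architecture this interpreter is executing as.
# On an M1 Mac, a native build reports 'arm64', while a build running
# under Rosetta 2 (or on an Intel Mac) reports 'x86_64'.
import platform

def interpreter_arch():
    return platform.machine()

if __name__ == "__main__":
    print(interpreter_arch())
```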
 
I'm not a Python programmer so forgive me if I sound ignorant. You're programming on the Mac, so can't you use the Cocoa thread manager for multiprocessing? Wouldn't this run natively on the M1 and avoid a lot of complications?

Apple recommends that programmers migrate their multi-threaded applications from the thread manager to the newer concurrency APIs.

 
Hmm, I forgot about GCD. Interesting. I know I am not doing what was asked about Python and multiprocessing, specifically with a large existing code base (and possibly a "broad front of workers"). My application area is simply linear algebra. I've been using the routines in Apple's Accelerate on the CPU side and Metal on the GPU side. I am still testing with tiny matrices before I try something that needs a lot of 'workers'. On the Metal side with the M1 GPU, they mention as many as 25 thousand threads. I am hoping that with the Metal Performance Shaders they will use some of this capability in their built-in kernel routines. So far I have tested calculations with Float and Float16 types using tiny matrices (I'm not hopeful for large matrices with Float16). There is also the Neural Engine, but I haven't contemplated anything on the matrix 'recognition' side, if anything is even possible with that.
 
I'm not a Python programmer so forgive me if I sound ignorant. You're programming on the Mac, so can't you use the Cocoa thread manager for multiprocessing? Wouldn't this run natively on the M1 and avoid a lot of complications?

Apple recommends that programmers migrate their multi-threaded applications from the thread manager to the newer concurrency APIs:

While you technically *can* use macOS-specific APIs from Python, there's not much advantage to doing so, particularly for something generic like threading. You lose one of the major benefits of the language, portability to other platforms, which is particularly bad for data science (which is what it sounds like the OP is doing) because you're almost certainly going to want to run your models in production on a Linux system.

To create threads, you'd use Python's own threading APIs, which are mainly built on UNIX primitives. If anybody were to make use of a platform-specific threading API, it would probably be Python's core team, not application authors.
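For illustration, here's what the portable route looks like; a minimal sketch with illustrative names, using only the standard `threading` module, so the same code runs on macOS (Intel or M1), Linux, and Windows:

```python
# Sketch of Python's portable threading API: no platform-specific
# calls, just the standard library's threading module.
import threading

def run_threads(count=4):
    results = []
    lock = threading.Lock()

    def record(n):
        # Each thread doubles its index and records the result
        # under a lock to avoid concurrent list mutation.
        with lock:
            results.append(n * 2)

    threads = [threading.Thread(target=record, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)

if __name__ == "__main__":
    print(run_threads())
```

Note that because of the GIL, threads like these help with I/O-bound work; for CPU-bound work you'd reach for processes, as discussed above.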

Finally, once again, none of this really has anything to do with CPU architecture. The same considerations apply for the exact same reasons on Intel.
 
mp.set_start_method('fork', force=True)

Also, just for what it's worth, this is a *really* bad idea; the start method should always be the default, which is `spawn`. If you're having trouble with the spawn start method, that suggests something in your code is relying *way* too much on some startup implementation details in your main process which are not properly replicated down to the spawned workers.

It's worth debugging this, since using the fork start method means (for example) it's not possible to use *any* Apple libraries; CoreFoundation, Cocoa, CoreGraphics, etc will all intentionally crash themselves if they notice a fork()ed process which has not exec()ed. See for example https://stackoverflow.com/questions/7379419/fork-cocoa-process-and-re-init-cocoa-how

For a common cause of issues with the spawn start method, see https://stackoverflow.com/a/57192877/13564
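The usual spawn-safe layout looks something like the following sketch (function names are illustrative): keep the worker function importable at module top level, and keep everything that should only run in the parent under the `__main__` guard, since `spawn` workers re-import the module:

```python
# Sketch of a spawn-safe program layout. With the 'spawn' start method,
# each worker process re-imports this module, so anything outside the
# __main__ guard runs in every worker; the guard prevents the pool
# from being created recursively (the classic cause of spawn crashes).
from concurrent.futures import ProcessPoolExecutor

def work(n):
    # Must live at module top level so spawned workers can pickle
    # a reference to it.
    return n ** 2

def main():
    with ProcessPoolExecutor(max_workers=4) as pool:
        return sum(pool.map(work, range(10)))

if __name__ == "__main__":
    print(main())
```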
 