It is just as much a software thing as a hardware one. The Afterburner card sits behind an Apple library service that handles ProRes decompression. If the card is not present, the work is dispatched one way (software decode); if the card is present, it is routed through the Afterburner hardware. No major changes are needed in the application that calls it.
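A minimal sketch of that "library hides the accelerator" pattern. All names here are hypothetical stand-ins, not Apple's actual API; the point is only that the caller sees one entry point and the library picks the backend:

```python
# Hypothetical sketch: the application calls one function; the library
# decides whether to use the accelerator or the software fallback.

def afterburner_present() -> bool:
    """Stand-in for a real hardware probe; hardwired to False here."""
    return False

def decode_prores_software(frame: bytes) -> bytes:
    # Fallback path: decode on the CPU (placeholder transform).
    return frame[::-1]

def decode_prores_afterburner(frame: bytes) -> bytes:
    # Accelerated path: hand the frame to the card (placeholder transform).
    return frame[::-1]

def decode_prores(frame: bytes) -> bytes:
    """The only entry point the application ever sees."""
    if afterburner_present():
        return decode_prores_afterburner(frame)
    return decode_prores_software(frame)

print(decode_prores(b"abc"))  # same call whether or not the card is installed
```

The application code is identical either way; only the library's internal routing changes.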
A very similar thing is done with Apple Private Cloud Compute (PCC).
" ... We paired this hardware with a new operating system: a hardened subset of the foundations of iOS and macOS tailored to support Large Language Model (LLM) inference workloads while presenting an extremely narrow attack surface. This allows us to take advantage of iOS security technologies such as
Code Signing and
sandboxing. ..."
"... Secure and private AI processing in the cloud poses a formidable new challenge. To support advanced features of Apple Intelligence with larger foundation models, we created Private Cloud Compute (PCC), a groundbreaking cloud intelligence system designed specifically for private AI processing ..."
security.apple.com
If this module were on a PCI-e card, it wouldn't have to be limited to just the Mac Pro. (A Mac Studio with a Thunderbolt connection to an external PCI-e slot would just have a slower link.) Apple PCC goes out over the internet to an Apple data center and back; the round trip to a locally PCI-e-attached card should be far faster.
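Some rough back-of-envelope arithmetic on that round-trip claim. Every number below is a loose assumption for illustration, not a measurement:

```python
# Very rough latency comparison, all figures assumed ballparks (not measured).
internet_rtt_ms = 40.0      # assumed round trip to a distant data center
pcie_rtt_ms = 0.01          # assumed round trip over a local PCI-e link (~10 us)
tb_enclosure_rtt_ms = 0.05  # assumed round trip through a TB-attached PCI-e slot

print(f"internet vs local PCI-e: ~{internet_rtt_ms / pcie_rtt_ms:.0f}x slower")
print(f"internet vs TB enclosure: ~{internet_rtt_ms / tb_enclosure_rtt_ms:.0f}x slower")
```

Even if the assumed numbers are off by an order of magnitude, the local card wins the latency race by a wide margin; the bandwidth and cost tradeoffs are a separate question.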
The large catch here is that the 'functions' would largely be limited to what Apple libraries already cover (e.g., decoding ProRes with Apple code, as opposed to some highly proprietary clone of that functionality).
There is a substantive limitation here: how much work is really tolerant of "a few moments" of delay? The longer the latency to and from the accelerator, the more pronounced the juggling act. Moving ProRes decode onto the same die allows the M2-M4 to far exceed what one Afterburner card can do, and at a much lower cost, so the price/performance is radically different.
If Apple put an M4 Pro/Max on a PCI-e card with this 'thin' PCC OS, then all they would really need to add is a 'virtual Ethernet' over PCI-e, so the pieces of software can connect over the same kind of link they already use to communicate. (Some of PCC's security layers would be overkill when both ends are local to the same machine.)
Apple has a narrow API to distribute a workload over multiple Macs. This WWDC 2022 session used four Mac Studios inter-networked via Thunderbolt (using three ports on each, all of them can be connected with point-to-point links):
https://developer.apple.com/videos/play/wwdc2022/10063/
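In spirit, distributing work over such a point-to-point mesh looks like the sketch below. The node addresses and job names are made up for illustration, and Apple's actual API works differently; this only shows the basic fan-out idea:

```python
from itertools import cycle

# Hypothetical mesh of four nodes, each reachable at its own address.
nodes = ["node-a.local", "node-b.local", "node-c.local", "node-d.local"]

def assign_jobs(jobs, nodes):
    """Round-robin each job onto the next node in the mesh."""
    ring = cycle(nodes)
    return {job: next(ring) for job in jobs}

jobs = [f"render-tile-{i}" for i in range(8)]
plan = assign_jobs(jobs, nodes)
print(plan["render-tile-0"])  # -> node-a.local
```

Whether the four nodes are Mac Studios on Thunderbolt cables or Mac-on-a-card modules on PCI-e "virtual Ethernet," this dispatch layer wouldn't care; only the addresses change.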
Three Mn Max Mac-on-a-card modules in a Mac Pro could make all those point-to-point connections with no TB cables. (The cards may need some extra power, so perhaps some AUX power cables.) It also doesn't have to be a module that only fits into a Mac Pro: if it were a full-fledged Mac on a card, there are folks with PCs who might want to plop one of these nodes into their system (a much broader scope).