Heh, it looks like Apple has taken its own approach with this API. It is not as explicit as DX12 and Vulkan, but it is still low-level enough to deliver the performance and explicit control over hardware that future software (VR, games, entertainment, etc.) will need.

I am optimistic right now. It makes developers work a bit to get an application running on OS X; however, it is not as huge an amount of work as with other explicit APIs that require direct optimization for specific hardware. Apple offers only Intel and AMD graphics in its computers, so there is not that much to do.

Also, https://developer.apple.com/videos/play/wwdc2016/604/ - from this video I understand that tessellation driven by compute shaders runs asynchronously from the graphics pipeline (concurrently). So this is a direct implementation of the asynchronous compute feature of AMD GPUs.

Too bad the only GPUs that will see a performance benefit from this are those based on the Tonga architecture. Tahiti, Pitcairn, and Cape Verde will not benefit.
 
Somebody go corner Tim and find out why he won't just adopt Vulkan instead of this convoluted homegrown Metal that takes more twists and turns than needed in today's world. =\
 
Somebody go corner Tim and find out why he won't just adopt Vulkan instead of this convoluted homegrown Metal that takes more twists and turns than needed in today's world. =\

Vulkan is not a panacea. Many of the API/shader features that differentiate Metal from other APIs are also *optional* in Vulkan (incl. geometry & tessellation shaders) so there is simply no guarantee that any potential Apple Vulkan implementation would have been much different to Metal.

I honestly don't know the ins and outs of Apple's decisions, but fairly obviously Vulkan came too late for them to adopt - Metal shipped on iOS 1.5 years prior to finalisation of the Vulkan API. Once they'd done the work for iOS, why throw it out for someone else's API?

For good or ill, Apple are committed to Metal, which is why they keep adding new features each year, and that's what I want to see as a developer.
 
Mark, Apple emphasized the ease of porting code from DX12 to Metal in their "What's New in Metal" session. Do you agree with this?

Is it now possible to port more modern games to macOS than before, both on the feature side and on the "time to code" side?
 
Heh, it looks like Apple has taken its own approach with this API. It is not as explicit as DX12 and Vulkan, but it is still low-level enough to deliver the performance and explicit control over hardware that future software (VR, games, entertainment, etc.) will need.

That's pretty much the idea as I see it as a developer.

Also, https://developer.apple.com/videos/play/wwdc2016/604/ - from this video I understand that tessellation driven by compute shaders runs asynchronously from the graphics pipeline (concurrently). So this is a direct implementation of the asynchronous compute feature of AMD GPUs.

I don't think it's just AMD, I believe some of the mobile GPUs have similar capability. Either way you'd need to also handle GPUs that can't do that (i.e. Intel & Nvidia) since they're a big chunk of the Mac install base.
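For anyone curious what that compute-then-render split looks like in practice, here's a rough sketch in Swift of Metal's tessellation model as shown in that session: a compute kernel writes per-patch factors into a buffer, and the render pass consumes them via drawPatches. The shader names ("tess_factors", "post_tess_vertex", "frag_main") are hypothetical, and exact Swift signatures vary a little between SDK versions.

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let library = device.makeDefaultLibrary()!

// One set of quad tessellation factors per patch, written by the compute stage.
let patchCount = 256
let factors = device.makeBuffer(
    length: patchCount * MemoryLayout<MTLQuadTessellationFactorsHalf>.stride,
    options: .storageModePrivate)!

// Hypothetical kernel that derives per-patch factors (e.g. from screen-space LOD).
let factorKernel = try! device.makeComputePipelineState(
    function: library.makeFunction(name: "tess_factors")!)

// Render pipeline whose vertex function is a post-tessellation vertex shader.
let pipeDesc = MTLRenderPipelineDescriptor()
pipeDesc.vertexFunction = library.makeFunction(name: "post_tess_vertex") // hypothetical
pipeDesc.fragmentFunction = library.makeFunction(name: "frag_main")      // hypothetical
pipeDesc.colorAttachments[0].pixelFormat = .bgra8Unorm
pipeDesc.tessellationPartitionMode = .fractionalEven
pipeDesc.maxTessellationFactor = 16
let patchPipeline = try! device.makeRenderPipelineState(descriptor: pipeDesc)

let cmdBuf = queue.makeCommandBuffer()!

// Stage 1: compute pass fills the factor buffer.
let compute = cmdBuf.makeComputeCommandEncoder()!
compute.setComputePipelineState(factorKernel)
compute.setBuffer(factors, offset: 0, index: 0)
compute.dispatchThreadgroups(
    MTLSize(width: patchCount / 32, height: 1, depth: 1),
    threadsPerThreadgroup: MTLSize(width: 32, height: 1, depth: 1))
compute.endEncoding()

// Stage 2: render pass consumes the factors when drawing the patches.
// (`renderPassDesc` - a configured MTLRenderPassDescriptor - is assumed.)
let render = cmdBuf.makeRenderCommandEncoder(descriptor: renderPassDesc)!
render.setRenderPipelineState(patchPipeline)
render.setTessellationFactorBuffer(factors, offset: 0, instanceStride: 0)
render.drawPatches(numberOfPatchControlPoints: 4, patchStart: 0, patchCount: patchCount,
                   patchIndexBuffer: nil, patchIndexBufferOffset: 0,
                   instanceCount: 1, baseInstance: 0)
render.endEncoding()
cmdBuf.commit()
```

Because both passes live in one command buffer and the factor buffer is private, Metal tracks the write-then-read dependency for you; whether the compute stage actually overlaps other GPU work is down to the driver and hardware, which is the "asynchronous" part being discussed.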

Too bad the only GPUs that will see a performance benefit from this are those based on the Tonga architecture. Tahiti, Pitcairn, and Cape Verde will not benefit.

I'm not sure that's entirely true: http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading

Tonga and later GCNs are definitely better suited because of the additional scheduler, but if the AnandTech article is correct then at least some of the functionality is available on older GCN versions too.
 
Tonga and later GCNs are definitely better suited because of the additional scheduler, but if the AnandTech article is correct then at least some of the functionality is available on older GCN versions too.
[Graph: asynchronous shading frame-time benefit across AMD GCN GPUs, from the AnandTech article linked above]

Look at Tahiti (HD 7970). Almost zero benefit.
 
Look at Tahiti (HD 7970). Almost zero benefit.

The benefit is dependent on how much of the scene you can overlap and how much unused GPU power you can leverage. Since some resources will be shared when running concurrently, a GPU that has more spare/under-utilised capacity will see a much bigger uplift than one that is already near saturation. That graph does show that there is support for, and a benefit from, the feature even on older GCN parts. While 0.7ms might not seem like a lot, it is actually surprisingly hard to make GPU optimisations in an established engine, so 'freebies' like this are always welcome even if they are less impressive than on the newer/faster parts.
Mark, Apple emphasized the ease of porting code from DX12 to Metal in their "What's New in Metal" session. Do you agree with this?

I won't answer this directly but earlier you posted:

I am optimistic right now. It makes developers work a bit to get an application running on OS X; however, it is not as huge an amount of work as with other explicit APIs that require direct optimization for specific hardware. Apple offers only Intel and AMD graphics in its computers, so there is not that much to do.

So you've seen that there's definitely effort involved in porting from other APIs. As a developer you've got to work around the API bits that aren't the same and that takes time and effort. That's easier for some game engines than others but for original development you've generally got the flexibility to make the changes you need. Sometimes that means just turning a feature off because it can't be reasonably supported.

However, porting games after the fact means implementing the D3D11 or D3D12 APIs and that becomes substantially harder when the target API, whether that is OpenGL or Metal, differs. The bigger the difference the harder it is. The absence of read/write in fragment shaders was the big limitation for me in 10.11, but for a porting house they'd also have to deal with the absence of geometry shaders, atomic shader intrinsics on textures, plus now tessellation shaders being different etc. All of that adds up and it often isn't practical to turn off features that can't be made to work when porting a finished game. It's all possible to do, but it requires a big initial investment and won't always be 'free' in performance/memory terms.
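To make that "turn a feature off" idea concrete, here's a minimal sketch of the probe-and-fall-back pattern in Swift. The feature-set query is the real Metal API (the read/write texture tier corresponds to the 10.11/10.12 fragment-shader limitation mentioned above); the engine-side struct and toggle are hypothetical.

```swift
import Metal

guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("No Metal device available")
}

// Real API: feature-set query. .macOS_ReadWriteTextureTier2 is the 10.12
// capability for read/write texture access in shaders.
let hasRWTextures = device.supportsFeatureSet(.macOS_ReadWriteTextureTier2)

// Hypothetical engine-side toggle: if the capability isn't there, the
// feature is disabled or replaced with a slower multi-pass fallback.
struct RendererCaps {
    var singlePassPostFX: Bool
}

let caps = RendererCaps(singlePassPostFX: hasRWTextures)
if !caps.singlePassPostFX {
    print("RW textures unavailable - using multi-pass fallback")
}
```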

Is it now possible to port more modern games to macOS than before, both on the feature side and on the "time to code" side?

We have more features and I intend to use them in UE4. I'm sure other developers will too, so more games will be closer feature-wise to their Direct3D Windows version. In terms of "time to code", I don't really think much has changed - anything that isn't *identical* to D3D is going to incur a setup cost, and then each new game will expose new & exciting problems ;)
 
Somebody go corner Tim and find out why he won't just adopt Vulkan instead of this convoluted homegrown Metal that takes more twists and turns than needed in today's world. =\

While I agree that open standards really helped Apple become a viable dev target, Apple has always prioritized a complete hardware/software combination that fits its goals - the "whole widget" approach to its platforms. Metal allows them to keep that level of synergy and control over the hardware and software layers. As long as the Metal API works correctly and is well supported/optimized (which seems to be happening), it's a step up for developers over Apple's Mac OpenGL implementation, which was rife with bugs and poor drivers.

Sure, Metal is missing APIs found in other rendering APIs, and there will be some things that simply can't be ported to Apple's platforms. But once devs get used to Metal, many have said how clean and easy the API is to work with compared to DX12 or Vulkan. Metal is a modern high-performance API that is actually usable by mere mortal developers, and with some of the features and dev tools announced at WWDC, I could see devs loving Metal even more. There are also iOS features that take advantage of Apple's consistent use of tile-based rendering GPUs. Normally this would be handled by an OpenGL extension with poor driver support and little chance of being ratified into the spec proper; now it's something Apple has full control over.

It is still early, but I see Vulkan mainly as a Linux/Android API. Between Apple supporting only Metal, Microsoft using D3D12 and pushing hard for Windows 10 adoption, and Vulkan's late arrival to market, the future for Vulkan is pretty much sealed. While the dream of a completely cross-platform API would be nice, and I'm sure there will be hacks to try to keep it possible, the reality is that it's unlikely to be well supported. The macOS market is minuscule, but the iOS market is too big to ignore. Also, with the increased use of development abstraction tools like game engines, native 3D API programming has become more of a niche target for most game devs; they would rather let the engine handle the native platform abstraction so they can focus on making the game. These engines know that cross-platform support is a big selling point, which puts the onus of optimizing for and supporting each native platform on the engine devs. I'm sure the engine devs would rather have a single API, but the market reality doesn't support it. Unless market forces shift dramatically, I don't see the dream that was Vulkan becoming a reality.
 
Guys, you are the devs here; I am just someone interested in the technology, so I have a question.

Does Metal allow multi-GPU configurations for gaming, or is only a single GPU still possible under OS X?
 
Guys, you are the devs here; I am just someone interested in the technology, so I have a question.

Does Metal allow multi-GPU configurations for gaming, or is only a single GPU still possible under OS X?

Metal doesn't really support multi-GPU rendering - GPUs are accessed entirely independently and the only way to copy resources between them involves the CPU/system-memory. That's fine for running concurrent compute tasks in an application, but not ideal in a realtime graphics engine. While it is sort-of possible in Metal, it isn't really equivalent to D3D12's multi-adapter features.
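A minimal sketch of what that looks like from the API side - MTLCopyAllDevices() is the real macOS entry point, and everything a device creates stays tied to that device:

```swift
import Metal

// Real API: every GPU in the machine appears as an independent MTLDevice.
let devices = MTLCopyAllDevices()
for device in devices {
    print(device.name,
          device.isLowPower ? "(integrated)" : "(discrete)",
          device.isHeadless ? "- headless" : "- drives a display")
}

// Resources are tied to the device that created them; this buffer can only
// be used with command queues/encoders made from `devices[0]`. Getting its
// contents onto another device means going through system memory via the CPU.
if let first = devices.first {
    _ = first.makeBuffer(length: 4096, options: .storageModeManaged)
}
```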
 
Remember too that Metal came about (at least at first) primarily for phones and tablets, to provide a high-efficiency (read: longer battery life) graphics rendering engine for those devices. Laptops too, I suppose... battery life is always a concern with those particular types of devices.

What I keep thinking is that I'm not sure adopting this for desktop computers will be a great thing, since I've already heard grumblings from some developers along the lines of the following: "we're not going to bother to develop in this Metal thing, why don't they just finish their OpenGL implementation so we can at least use some of the code base we already have? we don't want to reinvent the wheel. they're just shooting themselves in the foot. woof woof, blah blah, oh by the way, Metal is still in its infancy so we're waiting and seeing or not bothering - plus we make precious few dollars from the Mac versions of the games we do sell" (paraphrasing a number of different dev gripes).

But with power efficiency being a bigger and bigger issue for many, it looks like Apple is playing a game whose endgame is way out past the horizon. Metal is still in its infancy compared to something like OpenGL, which is technically at the end of its development cycle. And if they'd standardized on Vulkan... they may not have achieved the energy savings they wanted.
 
Some nice posts by Leman not far from here: https://forums.macrumors.com/threads/looks-like-metal-got-an-update.1977661/#post-23018833
I don't think it's just AMD, I believe some of the mobile GPUs have similar capability. Either way you'd need to also handle GPUs that can't do that (i.e. Intel & Nvidia) since they're a big chunk of the Mac install base.
But would the "asynchronous" compute shader stage in the tessellation pipeline still take advantage of the asynchronous compute feature of AMD cards? Or is it completely unrelated?
 
Some nice posts by Leman not far from here: https://forums.macrumors.com/threads/looks-like-metal-got-an-update.1977661/#post-23018833
But would the "asynchronous" compute shader stage in the tessellation pipeline still take advantage of the asynchronous compute feature of AMD cards? Or is it completely unrelated?

Cool, leman seems pretty knowledgeable. Who is it?

I would imagine that AMD's Metal drivers for OS X would submit both compute and rendering work to the GPU at the same time to take advantage of async compute. The problem is that there needs to be a scheduler managing the GPU so that you don't oversaturate it with too much async work during rendering.

Although let's be real here. While the advances in the Metal API will help on both iOS and macOS, macOS hardware is a travesty of GPU performance right now. Granted, we're rumored to get some Polaris 11 GPUs soon, which should be a good step up, but the current macOS hardware is awful for 3D rendering. If Apple fully embraces Polaris 11 on ALL of their products (not just their $2500 laptops) and increases their 3D application market share, we might see more support from developers on the platform.
 
Cool, leman seems pretty knowledgeable. Who is it?

I would imagine that AMD's Metal drivers for OS X would submit both compute and rendering work to the GPU at the same time to take advantage of async compute. The problem is that there needs to be a scheduler managing the GPU so that you don't oversaturate it with too much async work during rendering.

Although let's be real here. While the advances in the Metal API will help on both iOS and macOS, macOS hardware is a travesty of GPU performance right now. Granted, we're rumored to get some Polaris 11 GPUs soon, which should be a good step up, but the current macOS hardware is awful for 3D rendering. If Apple fully embraces Polaris 11 on ALL of their products (not just their $2500 laptops) and increases their 3D application market share, we might see more support from developers on the platform.
The hardware scheduler is part of GCN architectures from GCN 1.2 (Tonga) onward. This feature is not available in previous versions of the GCN architecture. The M295X, M395, and M395X already have it. An upcoming, potential refresh of computers with Polaris GPUs will presumably also bring this bit of hardware to a wider range of machines.
 
The hardware scheduler is part of GCN architectures from GCN 1.2 (Tonga) onward. This feature is not available in previous versions of the GCN architecture. The M295X, M395, and M395X already have it. An upcoming, potential refresh of computers with Polaris GPUs will presumably also bring this bit of hardware to a wider range of machines.

Yeah, from what I understand, that hardware scheduler is what allows AMD to manage the async compute in the first place. It will execute a program concurrently on the unused cores. However, what I'm referring to is that you can't push the entire game loop through GPU compute alongside the rendering and expect better frame rates. You need to balance the wins of compute against the available overhead on the GPU during the frame. If you oversaturate the scheduler it will start eating into rendering time.
 
Yeah, from what I understand, that hardware scheduler is what allows AMD to manage the async compute in the first place. It will execute a program concurrently on the unused cores. However, what I'm referring to is that you can't push the entire game loop through GPU compute alongside the rendering and expect better frame rates. You need to balance the wins of compute against the available overhead on the GPU during the frame. If you oversaturate the scheduler it will start eating into rendering time.

If you look at the graph posted previously you'll see that async. compute works (to a greater or lesser extent) on all AMD GCN GPUs. Tonga improves on this because each compute unit has its own scheduler, so it can interleave work from one task with work from another if it finds itself idle - at least, that is my understanding from the various papers, as it certainly isn't crystal clear.

You are however right that you must think about what passes/operations you can overlap very carefully as any resource dependency or hazard will prevent concurrency and if one of the tasks can saturate the compute-units then you won't see a win.
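As a rough illustration of that last point, here's a Swift sketch of two independent pieces of GPU work in Metal: the compute command buffer touches nothing the render pass uses, which leaves the driver free to overlap them on hardware that can. The "simulate_particles" kernel is made up.

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!
let queue = device.makeCommandQueue()!
let library = device.makeDefaultLibrary()!

// Hypothetical simulation kernel that writes only to its own buffer.
let kernel = try! device.makeComputePipelineState(
    function: library.makeFunction(name: "simulate_particles")!)
let particleBuf = device.makeBuffer(length: 1 << 20, options: .storageModePrivate)!

// Command buffer 1: pure compute, touches nothing the render pass uses.
let computeCB = queue.makeCommandBuffer()!
let enc = computeCB.makeComputeCommandEncoder()!
enc.setComputePipelineState(kernel)
enc.setBuffer(particleBuf, offset: 0, index: 0)
enc.dispatchThreadgroups(MTLSize(width: 256, height: 1, depth: 1),
                         threadsPerThreadgroup: MTLSize(width: 64, height: 1, depth: 1))
enc.endEncoding()
computeCB.commit()

// Command buffer 2 would hold the render pass. With no resources in common,
// nothing forces the GPU to finish one before starting the other, so capable
// hardware may run them concurrently. The moment the render pass samples
// particleBuf you have a dependency and must order the two explicitly.
```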
 
What about the new "resource heaps" that have been introduced? They're not in macOS (yet), but if I get this right, they bring Metal closer to DX12 and Vulkan in terms of control and the ability to extract the most from the hardware. Are they homologous to descriptor tables in those APIs?
It generally appears that Metal is a good tradeoff between power and ease of use. From what devs say, Vulkan is much harder to learn.
 
Resource heaps were brought in from the Direct3D 12 feature set. A few other features from D3D12 are also apparent in Metal, but they are still not available in Metal on Sierra, which is bizarre to say the least.
 
The Heap API is broadly analogous to the Direct3D Heap API, though like with most other features the Metal version is simpler and potentially less flexible. The Heap API is not the same as D3D12's descriptor tables and root signatures, which exist to make rendering/compute 'bindless' - the current Metal doesn't take that approach.
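For the curious, a rough Swift sketch of the heap API in question (iOS 10-era Metal; per the post above it hasn't reached macOS yet) - one up-front allocation, cheap sub-allocations, and explicit aliasing:

```swift
import Metal

let device = MTLCreateSystemDefaultDevice()!

// Real API: ask how much heap space a 4 MB private buffer needs (size + alignment).
let sizeAlign = device.heapBufferSizeAndAlign(length: 4 << 20,
                                              options: .storageModePrivate)

// One up-front allocation sized for four such buffers.
let heapDesc = MTLHeapDescriptor()
heapDesc.size = sizeAlign.size * 4
heapDesc.storageMode = .private
let heap = device.makeHeap(descriptor: heapDesc)!

// Sub-allocating out of the heap is cheap - no new driver-level allocation.
let scratchA = heap.makeBuffer(length: 4 << 20, options: .storageModePrivate)!
let scratchB = heap.makeBuffer(length: 4 << 20, options: .storageModePrivate)!

// Aliasing: declare scratchA's memory reusable, then allocate over it.
scratchA.makeAliasable()
let scratchC = heap.makeBuffer(length: 4 << 20, options: .storageModePrivate)!
_ = (scratchB, scratchC)
```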

I'm not aware of explicit support for GCN's Out-of-Order execution in Metal - but the driver can make optimisations that Vulkan/D3D12 couldn't so don't take that as gospel.
 
Guys, you are the devs here; I am just someone interested in the technology, so I have a question.

Does Metal allow multi-GPU configurations for gaming, or is only a single GPU still possible under OS X?

Metal doesn't really support multi-GPU rendering - GPUs are accessed entirely independently and the only way to copy resources between them involves the CPU/system-memory. That's fine for running concurrent compute tasks in an application, but not ideal in a realtime graphics engine. While it is sort-of possible in Metal, it isn't really equivalent to D3D12's multi-adapter features.

This is the only benchmark app I know of (https://render.otoy.com/octanebench/) that can access and use multiple GPUs at the same time, so it is possible to use multiple GPUs in macOS. The only problem is Apple not writing the software to use multiple GPUs in macOS, something like SLI. The only system that can realistically use multiple GPUs is a Mac Pro. I know about Macs with dual GPUs, but I don't know if both can be used at the same time; I don't have one, but from what I know, one of the cards is called the discrete GPU, and I think you can switch between the main card and the discrete card to save battery, or something like that - I don't remember exactly right now. If Apple won't even upgrade OpenGL, I doubt very much that they are going to write anything to support multi-GPU or SLI for the Mac. Our only hope is Nvidia, and I doubt that very much too, but it is possible.

Edit:
I just updated to the new CUDA 7.5.30 and the app is working in EC. I will post the screenshots later.
 
If you can access multiple GPUs for video rendering, why can't they do the same for games? Either they just don't want to, or they just don't care. Either way, I still love Apple and I like it more than Windows... well, let me say that again, something didn't sound right: I hate Windows and I don't like Windows at all. There, that's better. Here are the screenshots: first using only one card, then using both cards at the same time. As you can see in the results, it doubles the speed. I'd rather play emulators in macOS than have to use Windows for gaming.
[Screenshots: OctaneBench results, first with a single GPU, then with both GPUs enabled - the score roughly doubles.]
 
From what I know of Metal, there is nothing to stop you from using multiple GPUs. The problem is that there isn't driver/hardware support to put the GPUs into XFire/SLI mode. From what I understand (there are various ways multi-GPU is implemented), to render efficiently the resources are usually shared between the cards over a fast hardware bus. That link is much faster than the PCI bus used for CPU-to-GPU communication. If you have to send that data twice through the PCI bus, much of the benefit of multiple cards is negated.

However, there absolutely could be uses for having one Metal device do compute and another GPU do the rendering, as long as the compute isn't too bandwidth-heavy. Then again, most Macs have only one GPU, so your target audience for this behavior is relatively small and probably not worth it. I would expect only the Xeons in the Mac Pro to have enough PCI lanes to handle the necessary communication to/from the GPUs effectively.

This is assuming that we're talking about real-time rendering where you only have 16ms per frame. For offline rendering, multi-GPU PCI communication is not as much of a bottleneck.
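A sketch of the split Ferazel describes, assuming a two-GPU Mac Pro-style machine. The "bake_lighting" kernel is invented; the point is just that each device gets its own queue and pipeline, with no driver-level coupling between them.

```swift
import Metal

let devices = MTLCopyAllDevices()
guard devices.count >= 2 else { fatalError("needs a two-GPU machine") }

// Render on the GPU that drives the display; compute on the spare one.
let renderDevice = devices.first { !$0.isHeadless } ?? devices[0]
let computeDevice = devices.first { $0 !== renderDevice } ?? devices[0]

// Hypothetical long-running job ("bake_lighting") on the compute device.
let library = computeDevice.makeDefaultLibrary()!
let kernel = try! computeDevice.makeComputePipelineState(
    function: library.makeFunction(name: "bake_lighting")!)

let cb = computeDevice.makeCommandQueue()!.makeCommandBuffer()!
let enc = cb.makeComputeCommandEncoder()!
enc.setComputePipelineState(kernel)
enc.dispatchThreadgroups(MTLSize(width: 128, height: 1, depth: 1),
                         threadsPerThreadgroup: MTLSize(width: 64, height: 1, depth: 1))
enc.endEncoding()
cb.commit()

// Results still have to cross the PCI bus via a shared/managed buffer, so
// this only pays off when the output is small relative to the work done.
```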
 
Ferazel is mostly correct:

1. Video/photo rendering tools, including the OctaneRender benchmark linked above, use compute operations (OctaneRender is CUDA-only), not raster operations, for their GPU acceleration. These are workflows designed for parallel processing, whereas the realtime rendering of a game engine is very linear with lots of resource dependencies. That's why async. compute isn't always a huge win.

2. There has never been any support on OS X for Nvidia's SLI or AMD's CrossFire features AFAIK. On PC these are driver-level features that make multiple physical GPUs appear as a single logical device, and the driver manages marshalling data between the GPUs. CrossFire uses direct DMA between GPUs over the PCI-E bus & SLI has a separate physical interconnect, but either way the transfer does not involve a round-trip to system memory or require synchronisation with the CPU, so it is *reasonably* fast.

3. Apple's OpenGL did support automatic marshalling of data across shared contexts, which could be bound to different GPUs but in that case it did so using a round-trip to system memory controlled by the CPU. That's too slow for a real-time game engine because it would happen too often. Metal is worse still as it won't do this for you - you'd have to use separate id<MTLDevice> objects and thus entirely separate id<MTLResource>'s for each id<MTLDevice>. To copy across you stall one GPU to read the data, invoke CPU memcpy to copy to a temporary resource, then upload to the other GPU - exactly what Apple's OpenGL did and just as slow.

4. Only the 2013 Mac Pro shipped with 2 GPUs intended for concurrent processing, and even then only one of the GPUs can drive displays, so it isn't a feature that could be widely used on current hardware.
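To put point 3 in code, here's a Swift sketch of that round-trip between two devices - blit to CPU-visible memory, stall, then re-upload on the other GPU. Everything here is real Metal API; the buffer sizes are arbitrary.

```swift
import Metal

let devices = MTLCopyAllDevices()
guard devices.count >= 2 else { fatalError("needs two GPUs") }
let (gpuA, gpuB) = (devices[0], devices[1])

let length = 1 << 20
let sourceOnA = gpuA.makeBuffer(length: length, options: .storageModePrivate)!
let staging = gpuA.makeBuffer(length: length, options: .storageModeShared)!

// 1. GPU A blits into CPU-visible memory and the CPU stalls until it's done.
let cb = gpuA.makeCommandQueue()!.makeCommandBuffer()!
let blit = cb.makeBlitCommandEncoder()!
blit.copy(from: sourceOnA, sourceOffset: 0,
          to: staging, destinationOffset: 0, size: length)
blit.endEncoding()
cb.commit()
cb.waitUntilCompleted() // <- the stall described above

// 2. The CPU copies the bytes into a brand-new buffer owned by GPU B.
let copyOnB = gpuB.makeBuffer(bytes: staging.contents(), length: length,
                              options: .storageModeShared)!
_ = copyOnB // usable only with gpuB's queues/encoders from here on
```

That synchronous wait in the middle is exactly why this pattern is fine for offline work but a non-starter inside a 16ms frame.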
 