FCP X and CPU Cores.....

Discussion in 'Digital Video' started by Ifti, Aug 26, 2016.

  1. Ifti macrumors 68000

    Joined:
    Dec 14, 2010
    Location:
    UK
    #1
    Considering I only ever use my late-2013 rMBP 15" (spec in sig) for editing in FCPX, I'm thinking of making the move to a MacPro some time in 2017 - considering I am currently saving Amazon gift vouchers (which would be used to make the purchase from Amazon rather than Apple directly - not that this is relevant! lol)
    Anyhow, how many CPU cores can FCPX actually make use of?? Will it use all resources you can provide it, or is the software still limited to a certain number of cores?
     
  2. BenClement1978 macrumors member

    Joined:
    Sep 10, 2011
    Location:
    Antwerp Belgium
    #2
    "The old FCP could only use a core or two of a multi-core computer and only up to 4 GBs of ram. The new FCP can utilize 12 cores at once (that’s the peak of what Apple currently sells), up to 64 GBs of ram (that’s also the peak of what Apple currently sells) and sports GPU acceleration for even faster rendering of video and effects (Apple will have computers that have more than 12 cores and can handle more than 64 GBs of ram in the near future)."
     
  3. Ifti thread starter macrumors 68000

    Joined:
    Dec 14, 2010
    Location:
    UK
    #3
    Brill thanks!
    Looking to max spec a MacPro so I'm good for the next few years!!
     
  4. ColdCase macrumors 68030

    Joined:
    Feb 10, 2008
    Location:
    NH
    #4
    I've heard it's not as simple as maxing out on cores and although FCP could use all the cores, it rarely does. You probably know that more cores doesn't necessarily mean shorter processing time, because higher core counts are made up of lower speed cores.

    For me, the only time FCPX seems to use more than 75% of my 8 core macpro is when doing something like stabilization, and FCP also uses the GPU in that case. So there may be a compromise, and I've heard that 8 cores with dual D700 GPUs may be the sweet spot of the current line up for most mixed FCPX type applications. If there is something new released next year, it may be another ball game.

    On a side note, I notice the OS will often throttle FCPX down from 75% to less than 50%. The only app I run that consistently uses all available cores to the max is handbrake. I'm sure there are others.

    Just saying that it may not be that simple unless you know what's going to be your most demanding job, and optimize around that.
     
  5. BenClement1978 macrumors member

    Joined:
    Sep 10, 2011
    Location:
    Antwerp Belgium
    #5
    I personally would just go with a 6 core machine. Seems to be the sweet spot, all depending on what you do with it and how much cash you have to burn ;)
     
  6. sevoneone macrumors 6502

    Joined:
    May 16, 2010
    #6
    This, Final Cut utilizes GPU power more than it does CPU. Or, consider a Hackintosh. The 2013 MacPro is now 3 years old. Head over to the MacPro thread and you will see there are several people that have already gone this way.
     
  7. joema2 macrumors 65816

    Joined:
    Sep 3, 2013
    #7
    That is not my experience. Many common tasks such as scrubbing through a 4k H264 timeline or applying stabilization or noise reduction will use every available core. Also time-consuming tasks like encoding (exporting) to H264 are inherently CPU-bound and cannot be greatly accelerated by GPU parallelism. Anyone who is a serious video editor wants all the CPU horsepower they can get. This doesn't mean GPU and I/O are unimportant, only that CPU is vital, as can be seen by anyone doing these tasks in FCPX or Premiere while watching per-core CPU activity.

    OS X does not generally throttle how much CPU power FCPX can use. The only exception is on a hyperthreaded CPU the OS X thread dispatcher can apparently sometimes detect a potential "cache thrashing" situation and it will schedule threads on every other core. This is not throttling the available CPU capacity, as running on every virtual core would be even slower -- it is making the most efficient use of that capacity.

    On most operating systems the scheduling algorithm is quite simple -- the highest priority threads in a runnable state get dispatched to a CPU core for a time slice, then after several milliseconds the scheduler (driven by a real-time interrupt) wakes up, re-evaluates things and reschedules. The thread priority only determines IF the thread gets assigned to a core, not how fast it runs. There is generally no concept of quota-based CPU consumption in most operating systems, and even where that exists it is normally not used.

    Where a CPU core is not 100% utilized this is generally because the available pool of threads are not in a runnable state 100% of the time -- they are waiting on I/O, GPU, network, memory, or a synchronization event from another thread.
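    A toy illustration of the scheduling point above, not a model of the real OS X scheduler. It assumes a made-up `ToyThread` that alternates between a burst of CPU work and a blocked period, and a scheduler that simply hands the available cores to the highest-priority runnable threads on each time slice. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToyThread:
    priority: int        # higher number gets a core first
    run_ticks: int       # time slices of CPU work before the thread blocks (>= 1)
    wait_ticks: int      # time slices spent blocked (I/O, GPU, lock) before it is runnable again
    run_left: int = 0
    wait_left: int = 0


def simulate(threads, cores, ticks):
    """Return average core utilization (0.0 to 1.0) over `ticks` time slices."""
    for t in threads:
        t.run_left, t.wait_left = t.run_ticks, 0
    busy = 0
    for _ in range(ticks):
        runnable = [t for t in threads if t.wait_left == 0]
        blocked = [t for t in threads if t.wait_left > 0]
        # Priority only decides WHO gets a core, not how fast the thread runs.
        chosen = sorted(runnable, key=lambda t: t.priority, reverse=True)[:cores]
        for t in blocked:
            t.wait_left -= 1
            if t.wait_left == 0:
                t.run_left = t.run_ticks      # wakes up with a fresh burst of work
        for t in chosen:
            busy += 1
            t.run_left -= 1
            if t.run_left == 0:
                t.wait_left = t.wait_ticks    # burst finished, thread blocks
                if t.wait_ticks == 0:
                    t.run_left = t.run_ticks  # never blocks, keeps running
    return busy / (cores * ticks)


# Eight threads that each block for 1 slice after every 3 slices of work.
workers = [ToyThread(priority=1, run_ticks=3, wait_ticks=1) for _ in range(8)]
print(simulate(workers, cores=8, ticks=40))
```

    In this sketch nothing throttles the threads; the cores sit idle only while every candidate thread is blocked, which is the situation described above.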
     
  8. handsome pete macrumors 68000

    Joined:
    Aug 15, 2008
    #8
    Yeah, the MacPro is a tough buy these days with the complete uncertainty of its upgrade schedule.

    I went a step further and just did a straight up Windows build. Would have loved to keep FCPX, but didn't want to deal with any hiccups going the hackintosh route despite the claims that it's pretty rock solid. Adobe and Autodesk fill my needs software-wise, and at risk of being labeled a heretic around here, Windows 10 is pretty nice.

    So I haven't paid much attention to the MacPro board since I abandoned ship, but I hope those guys over there that are still waiting it out get placated soon.
     
  9. ColdCase macrumors 68030

    Joined:
    Feb 10, 2008
    Location:
    NH
    #9
    Know just enough to be dangerous.

    I based my throttle comment on seeing several system log entries to the effect that "FCPX is misbehaving by using more than 75% CPU for some time, throttling back to less than 50%"

    On my stock 8 core, dual D700 I notice that often CPU utilization varies from ~1500% to ~ 700% to ~300% and back up when FCPX is the only thing running. Doesn't seem to be any correlation to what FCPX is doing. Stabilization does use more CPU% than a share/export of a Master clip, which rarely reaches 500% as reported by activity monitor.

    I've read elsewhere here that most think the 8 core is in a sweeter FCPX spot than the 6 core.
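    For reading those Activity Monitor figures, a small sketch. It assumes the per-process %CPU is scaled so that 100% equals one logical core, and that the 8-core machine exposes 16 logical cores through hyperthreading. The function name is made up for illustration.

```python
def share_of_machine(process_percent, physical_cores, threads_per_core=2):
    """Convert a per-process %CPU reading into a fraction of total capacity.

    Assumes 100% == one fully busy logical core (an assumption about how
    the reading is scaled, not something taken from Apple documentation).
    """
    logical_cores = physical_cores * threads_per_core
    return process_percent / (100.0 * logical_cores)


for reading in (1500, 700, 300):
    print(f"{reading}% -> {share_of_machine(reading, 8):.1%} of the machine")
```

    Under those assumptions ~1500% is close to the whole machine, while ~300% is under a fifth of it.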
     
  10. joema2, Aug 29, 2016
    Last edited: Aug 29, 2016

    joema2 macrumors 65816

    Joined:
    Sep 3, 2013
    #10
    I have never seen that in my console log and I ran Google searches on many variations, and I don't see anyone ever reporting that. If you still have the exact syntax, I'd be curious what that is.

    There is a fundamental difficulty with OS schedulers detecting misbehaving or "CPU hog" processes. How would it know the difference between one that just needs a lot of CPU vs one that is misbehaved and stuck in a tight loop?

    Some operating systems like Windows since 2008 have had optional CPU rate limits or quota limits. However this is a fixed hard limit on % of CPU and it doesn't fluctuate or change with process behavior: https://technet.microsoft.com/en-us/library/ff384148(v=ws.10).aspx

    Historically, class-based or quota-based schedulers have a lot of overhead and are prone to make mistakes or induce undesired behavior. With threads (lightweight processes) the scheduling complexity and cost increases. E.g, if you only provide quotas for the process (which may contain 100 threads) and one of those threads gets out of control, limiting the process will penalize all the other good threads in that process. If finer-grained quotas are implemented at the thread level this is a lot of overhead.

    On OS X (as on many OSs), I believe processes have a priority within the OS X scheduler. A high-CPU process may have its priority lowered to give other processes a chance to run. However this by itself does not throttle the process according to a quota. Here is some background on different CPU scheduling methods: https://www.cs.rutgers.edu/~pxk/416/notes/07-scheduling.html

    CPU will vary a lot based on what you're doing at the moment, and how the app is configured. E.g, with background rendering on, FCPX will unpredictably take CPU time depending on what the timeout is and how much rendering is remaining. Even with background rendering off, just clicking on a clip with a CPU-heavy effect applied can cause CPU consumption to increase.

    I believe on machines with hyperthreaded CPUs, Activity Monitor combines the CPU consumption of all virtual cores to a single composite number which can be artificially skewed low if some hyperthreaded cores are not scheduled to avoid cache thrashing. That is why you have to look at all virtual cores to see what's really happening, not just the combined number.

    In general FCPX is efficiently multithreaded and will use all available CPU cores unless it is waiting on something else such as GPU, memory or disk I/O. For any serious video editor who can afford it, an 8-core or higher machine is a good idea, especially on 4k.
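    A rough sketch of how a combined number could read low on a hyperthreaded CPU. It assumes sibling virtual cores sit next to each other in the list and treats a physical core as busy when either sibling is busy, which is a simplification. The function and the sample loads are hypothetical.

```python
def summarize(per_virtual_core_load):
    """Return (composite %, physical-core busy %) from per-virtual-core loads.

    Assumes virtual cores 0/1, 2/3, ... are hyperthread siblings on one
    physical core. That pairing is an assumption made for this sketch.
    """
    composite = sum(per_virtual_core_load) / len(per_virtual_core_load)
    siblings = zip(per_virtual_core_load[0::2], per_virtual_core_load[1::2])
    physical = [max(a, b) for a, b in siblings]
    return composite, sum(physical) / len(physical)


# Scheduler places work on every other virtual core of an 8-core machine.
loads = [100, 0] * 8
print(summarize(loads))  # composite reads 50% although every physical core is busy
```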
     
  11. ColdCase, Sep 1, 2016
    Last edited: Sep 1, 2016

    ColdCase macrumors 68030

    Joined:
    Feb 10, 2008
    Location:
    NH
    #11
    I've checked the console log the past couple of projects and haven't noticed the kernel catching FCPX using too much CPU for an extended period. I have seen a few of these, however. The syntax of the CPU use entry was similar:

    "8/31/16 5:30:35.000 PM kernel[0]: process Final Cut Pro[34305] caught causing excessive wakeups. Observed wakeups rate (per sec): 2497; Maximum permitted wakeups rate (per sec): 150; Observation period: 300 seconds; Task lifetime number of wakeups: 15282188"

    So the CPU entry would have been something like this:

    "8/31/16 5:30:35.000 PM kernel[0]: process Final Cut Pro[34305] caught using excessive CPU. Observed CPU rate: 75%; Maximum permitted CPU rate: 50%; Observation period: 300 seconds; restricting CPU to 50%"

    (I don't recall exactly the italics wording)

    I see thousands of these:

    "8/31/16 5:28:34.745 PM Final Cut Pro[34305]: CGContextConvertRectToDeviceSpace: invalid context 0x0. If you want to see the backtrace, please set CG_CONTEXT_SHOW_BACKTRACE environmental variable.
    8/31/16 5:28:34.760 PM Final Cut Pro[34305]: CGContextConvertRectToDeviceSpace: invalid context 0x0. If you want to see the backtrace, please set CG_CONTEXT_SHOW_BACKTRACE environmental variable.
    8/31/16 5:28:34.760 PM Final Cut Pro[34305]: CGContextConvertRectToDeviceSpace: invalid context 0x0. If you want to see the backtrace, please set CG_CONTEXT_SHOW_BACKTRACE environmental variable.
    8/31/16 5:28:34.871 PM Final Cut Pro[34305]: CGContextConvertRectToDeviceSpace: invalid context 0x0. If you want to see the backtrace, please set CG_CONTEXT_SHOW_BACKTRACE environmental variable.
    8/31/16 5:28:34.874 PM Final Cut Pro[34305]: CGContextConvertRectToDeviceSpace: invalid context 0x0. If you want to see the backtrace, please set CG_CONTEXT_SHOW_BACKTRACE environmental variable.
    8/31/16 5:28:35.041 PM Final Cut Pro[34305]: CGContextConvertRectToDeviceSpace: invalid context 0x0. If you want to see the backtrace, please set CG_CONTEXT_SHOW_BACKTRACE environmental variable.
    8/31/16 5:28:35.244 PM Final Cut Pro[34305]: CGContextConvertRectToDeviceSpace: invalid context 0x0. If you want to see the backtrace, please set CG_CONTEXT_SHOW_BACKTRACE environmental variable.
    8/31/16 5:28:35.247 PM Final Cut Pro[34305]: CGContextConvertRectToDeviceSpace: invalid context 0x0. If you want to see the backtrace, please set CG_CONTEXT_SHOW_BACKTRACE environmental variable."

    And a few of these:

    "8/31/16 3:23:58.478 PM com.apple.xpc.launchd[1]: (com.apple.ReportCrash[34608]) Endpoint has been activated through legacy launch(3) APIs. Please switch to XPC or bootstrap_check_in(): com.apple.ReportCrash
    8/31/16 3:23:58.479 PM ReportCrash[34608]: Invoking spindump for pid=34305 wakeups_rate=2497 duration=19 because of excessive wakeups
    8/31/16 3:24:00.609 PM spindump[834]: Saved wakeups_resource.diag report for Final Cut Pro version 10.2.3 (276640) to /Library/Logs/DiagnosticReports/Final Cut Pro_2016-08-31-152400_Sams-Mac-Pro.wakeups_resource.diag"


    (FCP didn't crash or stop processing)
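    If anyone wants to pull the numbers out of wakeups entries like the first one quoted above, here is a minimal sketch using a regular expression. The pattern is written against that single quoted line only, so treat the exact wording as an assumption; other OS X versions may format the entry differently.

```python
import re

# Message text copied from the kernel entry quoted in this post.
LINE = (
    "process Final Cut Pro[34305] caught causing excessive wakeups. "
    "Observed wakeups rate (per sec): 2497; "
    "Maximum permitted wakeups rate (per sec): 150; "
    "Observation period: 300 seconds; "
    "Task lifetime number of wakeups: 15282188"
)

WAKEUPS = re.compile(
    r"process (?P<name>.+?)\[(?P<pid>\d+)\] caught causing excessive wakeups\. "
    r"Observed wakeups rate \(per sec\): (?P<observed>\d+); "
    r"Maximum permitted wakeups rate \(per sec\): (?P<limit>\d+); "
    r"Observation period: (?P<period>\d+) seconds"
)


def parse_wakeups(line):
    """Return a dict of the fields in a wakeups entry, or None if it does not match."""
    m = WAKEUPS.search(line)
    if m is None:
        return None
    info = {"name": m["name"], "pid": int(m["pid"])}
    for key in ("observed", "limit", "period"):
        info[key] = int(m[key])
    info["times_over_limit"] = info["observed"] / info["limit"]
    return info


print(parse_wakeups(LINE))
```

    Note this entry is about wakeups per second, not a CPU percentage, and nothing in it mentions restricting the process.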
     
  12. Keebler macrumors 68030

    Joined:
    Jun 20, 2005
    Location:
    Canada
    #12

    Hi Joe,

    I have a 2013 nMP 12 core with D500s, but only 16GB Ram.

    Based on your knowledge, are you familiar with how an increase in ram could benefit FCX and Compressor?

    I ask b/c I had 32 GB ram on an older Mac Pro with Final Cut 7 and didn't really see any huge benefits to ram.

    But it's been recommended I add more ram to this machine although I'm skeptical.

    Then again, time is money, so it would be worth it if it speeds this up or makes scrubbing on a timeline more efficient (I notice FCX chugs when I'm scrubbing, which I do a lot and at fast speeds).

    Cheers,
    Brian
     
  13. joema2 macrumors 65816

    Joined:
    Sep 3, 2013
    #13
    My iMac 27s have 32GB but I'm not sure how much performance gain you'd see. However by "chugging" if you mean a lot of disk I/O it might help that. In general I think 32GB (or more) is a good idea for a 12-core nMP used for video editing.

    OTOH if by watching your CPU and disk using either Activity Monitor or iStat Menus it is already CPU bound on those tasks, then additional memory would not help. My Macs also have Parallels Desktop so I wanted 32GB for the times I'm running Windows apps.
     
  14. dorsal macrumors member

    Joined:
    Aug 20, 2002
    #14
    As a point of reference, I have a 4 core Darth Mac with 32GB of RAM. With FCP X, PS CC, and iPhoto running, it never seems to use more than 20 GB of RAM total.
     
  15. ColdCase, Sep 3, 2016
    Last edited: Sep 3, 2016

    ColdCase macrumors 68030

    Joined:
    Feb 10, 2008
    Location:
    NH
    #15
    Here's one console "caught burning CPU" entry, although this one doesn't mention a throttle.

    "9/3/16 2:54:39.000 PM kernel[0]: process Final Cut Pro[36539] thread 22620170 caught burning CPU! It used more than 50% CPU (Actual recent usage: 58%) over 180 seconds. thread lifetime cpu usage 208.124810 seconds, (205.108022 user, 3.016788 system) ledger info: balance: 90001460815 credit: 207825320992 debit: 117823860177 limit: 90000000000 (50%) period: 180000000000 time since last refill (ns): 154692948450

    9/3/16 2:54:40.835 PM spindump[834]: Saved cpu_resource.diag report for Final Cut Pro version 10.2.3 (276640) to /Library/Logs/DiagnosticReports/Final Cut Pro_2016-09-03-145440_Sams-Mac-Pro.cpu_resource.diag"

    The resource diag report is lengthy.

    This machine has 32GB of RAM
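    A quick arithmetic check on the ledger numbers in that entry. The interpretation is a guess from the numbers alone (values treated as nanoseconds of CPU time, with `limit` as the allowance per `period`); it is not taken from kernel documentation, and the dictionary keys are just labels chosen for this sketch.

```python
# Values copied from the "caught burning CPU" entry quoted above.
LEDGER = {
    "balance": 90001460815,
    "credit": 207825320992,
    "debit": 117823860177,
    "limit": 90000000000,
    "period": 180000000000,
    "since_refill": 154692948450,
}


def check_ledger(ledger):
    """Relate the ledger fields to the percentages printed in the same entry."""
    return {
        # balance appears to be credit minus debit
        "balance_matches": ledger["credit"] - ledger["debit"] == ledger["balance"],
        # limit / period gives the "(50%)" printed next to the limit
        "limit_fraction": ledger["limit"] / ledger["period"],
        # balance / time since last refill lands near "Actual recent usage: 58%"
        "recent_usage": ledger["balance"] / ledger["since_refill"],
        "over_limit": ledger["balance"] > ledger["limit"],
    }


print(check_ledger(LEDGER))
```

    The figures are at least self-consistent: one thread used a bit more than 90 seconds of CPU inside a 180 second window, which is what triggered the report.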
     
  16. kohlson macrumors 6502a

    Joined:
    Apr 23, 2010
    #16
    Wasn't FCP7 32-bit, limiting the amount of memory that could be used to 4GB (per process/app)? 16GB seems light for a 12-core set up. As joema2 explained, cores can only do the work if they have something to do, which means queued from disk through memory through support chips. For demanding workloads, it seems like more memory could help. The question I think you may be asking is: for the work your Mac Pro does, would more memory help?
     
  17. RubberShoes macrumors regular

    Joined:
    Jun 30, 2007
    #17
    We go by the rule 4GB ram/core and yes, FCP7 was 32bit and that was its flaw. It couldn't see or utilize 80% of your system's power. Premiere Pro is basically FCP7 these days but 64bit so it is able to utilize >2cores and >4GB ram. You could* trick FCP7/Compressor into using more resources by creating a local "cluster" with Qmaster but it wasn't all that efficient. Also CPU/GPU utilization is heavily dependent on what specific codec/process you're using in the editor. There are x264 encoders that can utilize the GPU and stuff like stabilization is an entirely separate process. ProRes is all CPU (you can hand off the instruction set to OpenCL but speed gains are minimal) and most of the time high-bandwidth codecs are limited not by processing power but I/O bandwidth in and out of the system.

    If you spent $$$$ for the 12 core just spend the few extra hundred and load that puppy with ram. That machine will sing for years to come.
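    The two memory figures in this post worked out as a small sketch. The 4GB/core figure is the rule of thumb quoted above, not an Apple recommendation, and the helper names are made up.

```python
GIB = 2 ** 30


def max_addressable_bytes(pointer_bits):
    """Upper bound on memory a process can address with pointers of this width."""
    return 2 ** pointer_bits


def ram_by_rule_of_thumb(cores, gb_per_core=4):
    """RAM suggested by the 4GB-per-core rule of thumb mentioned above."""
    return cores * gb_per_core


print(max_addressable_bytes(32) // GIB, "GiB addressable by a 32-bit app")
print(ram_by_rule_of_thumb(12), "GB suggested for a 12-core machine")
```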
     
  18. linuxcooldude macrumors 68020

    Joined:
    Mar 1, 2010
    #18
    While I'm sure it does not always use the GPU, I think FCP X uses the GPU a lot more than most editors. At least with Apple's Compressor I know it uses primarily the GPUs for encoding. My last render/export used both GPUs with only 13% CPU load.
     
  19. joema2 macrumors 65816

    Joined:
    Sep 3, 2013
    #19
    Technically speaking neither FCPX nor any other editor can meaningfully use a GPU for encoding and decoding of long-GOP content like H264. This is because the algorithm is inherently serialized and cannot be broken up into parcels for parallel execution on the many lightweight threads of a GPU.

    On appropriate hardware FCPX can use Intel's Quick Sync, which is essentially an on-chip ASIC (Application Specific Integrated Circuit) which accelerates the core encode/decode by executing elements of it directly in silicon. To achieve this it requires support resources from the on-chip GPU but it is not the GPU performing this.

    This is why the nMP despite having much more powerful GPUs than an iMac cannot do hardware accelerated encode/decode -- GPUs by themselves simply cannot do this. The Xeon CPUs in Mac Pros do not have Quick Sync circuitry, since Intel deemed this unnecessary for workstation and server class machines.

    Some GPU cards have add-on logic which are entirely separate from the GPU that can accelerate H264 encode/decode. This is accessed with a separate API that nVidia calls NVENC: https://en.wikipedia.org/wiki/Nvidia_NVENC and AMD calls VCE: https://en.wikipedia.org/wiki/Video_Coding_Engine However these are unique and proprietary to each vendor, so software must write specifically to those APIs. This along with the varying versions of each API has splintered the programming task and limited their use.

    Re rendering and export, technically rendering is fully computing the effects and editing changes in the timeline and writing those to a temp file. Depending on the effects this can be GPU-intensive, but sometimes it is CPU-intensive and the GPU isn't doing much. By contrast exporting is equated with encoding, and that is inherently a CPU-dominated activity, at least for H264 -- except where off-loaded by Quick Sync.

    Quick Sync currently only works for single-pass H264 (in current iMacs), however the quality is very good. FCPX does not yet support the newer, higher-compression codecs HEVC/H265 or VP9, but partial hardware support is already present in Skylake CPUs, and Kaby Lake will have full support for 10-bit HEVC and VP9. Since these codecs are far more compute-intensive than H264, this will be important going forward.
     
  20. linuxcooldude, Sep 20, 2016
    Last edited: Sep 20, 2016

    linuxcooldude macrumors 68020

    Joined:
    Mar 1, 2010
    #20


    I'm getting around realtime rendering, meaning it encodes in about the same time as the length of the video. Sure it still uses the CPU, but far less than if it were all software.

    But yeah, it depends on what effects, transitions and so forth you have used in the timeline. But if you use background rendering it won't be as much of an issue.
     
  21. joema2 macrumors 65816

    Joined:
    Sep 3, 2013
    #21
    The information in that video is incorrect. It is algorithmically impossible to use GPU acceleration to a meaningful degree for export and encoding of H264 or similar codecs. This has been studied by the smartest scientists on earth and there are many research papers written documenting these attempts which have all essentially failed. If anyone figures out how to do this they will probably win a Nobel prize.

    This was summarized by the lead developer of X264, Jason Garrett-Glaser, saying: "...countless people have failed over the years. nVidia engineers, Intel engineers...have all failed".

    The video is apparently confusing rendering vs export/encoding.

    Of course if you have an unrendered timeline, when you export that it will (if unrendered) go through a render phase before the export/encode phase. The render phase can be GPU accelerated but that has nothing to do with export/encoding.

    The good news is that encoding can be greatly hardware accelerated by custom silicon -- provided the software supports it. The logic may be bundled on the same GPU card (nVidia NVENC and AMD VCE) but it is separate logic with a separate API. Intel's Quick Sync also uses separate logic but they harness some on-chip GPU resources, whether video memory, bus or whatever. That is why Quick Sync is not available on Xeon. Intel did not want to waste millions of transistors on Xeon integrating a mediocre on-chip GPU just to enable Quick Sync. They really should have designed it differently but they are stuck with it now. This will get worse in the future since HEVC/H265 is much more computationally-intensive than H264, and if Intel doesn't devise a solution, future Xeon-powered workstations will be even worse handling these new video codecs relative to Quick Sync-enabled platforms than they are today.
     
  22. linuxcooldude macrumors 68020

    Joined:
    Mar 1, 2010
    #22
    But yet we have evidence contrary to what the scientists are saying.


    What we do know:

    CPU load is substantially reduced, both GPUs are being used, export time is substantially reduced.


    Couple of reasons why it does:

    Apple's products really are magical...lol.

    The use of VCE, Video Coding Engine, on AMD cards that allows for hardware encoding of video during export.

    Some other factors we don't know about or a combination of factors.

    Whether or not you want to argue VCE is considered separate from the graphics card, it is still physically located on the card.
     
  23. joema2 macrumors 65816

    Joined:
    Sep 3, 2013
    #23
    I have seen no evidence presented that H264 export time of a fully rendered timeline is substantially reduced by using GPU acceleration. Reduced from what? Export without GPU acceleration? How would that even be tested on a Mac? You cannot turn GPU acceleration on/off for timed export testing. Maybe on a Hackintosh or old Mac Pro you could remove the GPU card and test it that way. Is that what you're talking about?

    If the dual nMP GPUs are so fast at accelerating fully rendered H264 export, point me to a file and I'll export it from my iMac and we'll see which is faster. The nMP GPUs are vastly faster than the iMac GPU so if GPU acceleration is truly being used for export and encoding, I'd expect the nMP to be competitive.

    If anyone is interested in the technical details of why GPU acceleration of H264 encode/decode is mostly impractical, this tech talk by the lead developer of X264 discusses it. They made an extreme effort to obtain this acceleration when all others before them failed, yet the results they obtained were limited.

    They could only usefully leverage GPU acceleration on one component of X264 called Lookahead, which comprises only about 10-25% of the total H264 runtime. So they got a limited benefit on a limited code path. The results were measurable and marginally useful in some cases, but it was not a huge improvement on the scale typically associated with hardware acceleration, such as the 4x or 5x improvement in encode/decode performance from NVENC, VCE or Quick Sync.
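    To put a number on "limited benefit on a limited code path", here is Amdahl's law applied to the 10-25% figure above. This is the standard textbook formula and not anything measured on X264; the function name is just for illustration.

```python
def overall_speedup(accelerated_fraction, factor):
    """Amdahl's law: total speedup when only part of the runtime gets faster.

    accelerated_fraction: share of the original runtime that is accelerated (0..1)
    factor: how many times faster that share becomes
    """
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)


for fraction in (0.10, 0.25):
    best_case = overall_speedup(fraction, float("inf"))
    print(f"lookahead = {fraction:.0%} of runtime -> at most {best_case:.2f}x overall")
```

    Even if the GPU made Lookahead take no time at all, the whole encode would speed up by roughly 1.1x to 1.3x, well short of the 4x or 5x quoted for dedicated encode hardware.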

     
  24. linuxcooldude macrumors 68020

    Joined:
    Mar 1, 2010
    #24
    I still expect the iMac to be faster on short exports, but on long exports I expect the Mac Pro to be faster due to thermal throttling on the iMac.

    Yes, in general. Another good indication of hardware acceleration is export times (other than CPU/GPU loads). If encoding is substantially longer than the length of the video, CPU/software encoding is likely being used. If encoding times are close to the length of the video, you're getting realtime encoding which is a good indication of hardware acceleration.

    Depending on other factors as well such as codec. But in general h.264.

    This is also coming from my experience on using the Matrox CompressHD card.
     
  25. joema2 macrumors 65816

    Joined:
    Sep 3, 2013
    #25
    If you have a Mac Pro, pick out a video file, we'll both download it, import it and do a timed export as single-pass H264. I personally doubt any Mac Pro can do this faster than an iMac -- even on long exports, and in fact I expect the iMac to be significantly faster. IF the Mac Pro was using GPU acceleration for export, it would be competitive.

    About the only clear cut indication is comparing export times with GPU enabled vs disabled on the same machine and material. Unfortunately there is no easy way to do that on a Mac, short of pulling the GPU card on a Mac Pro tower or Hackintosh. That would be an interesting test; I wonder if anyone has ever done it?

    I'm not sure how revealing comparing export time vs playing time is. E.g, it generally takes much longer to export 60 sec of 4k vs 60 sec of 720p, yet they could both be using hardware-accelerated encoding.

    That said, my 2015 iMac takes 14 sec to export 60 sec. of H264 1080p using "Faster Encode", about 30 sec exported as "Better Quality" which supposedly doesn't use Quick Sync (but I'm no longer sure of that), about export 60 sec of H264 4k as 1080p, and about 53 sec to export 60 sec of H264 4k as 4k.
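    Expressed as multiples of realtime, using only the export times given above that have a figure attached. The helper name is made up, and a factor above 1.0 simply means the export finished faster than the clip plays.

```python
def realtime_factor(clip_seconds, export_seconds):
    """How many seconds of video get exported per second of wall-clock time."""
    return clip_seconds / export_seconds


# (label, clip length in seconds, export time in seconds) from the post above
timings = [
    ("1080p, Faster Encode", 60, 14),
    ("1080p, Better Quality", 60, 30),
    ("4k exported as 4k", 60, 53),
]
for label, clip, export in timings:
    print(f"{label}: {realtime_factor(clip, export):.2f}x realtime")
```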

    Another uncertainty about comparing export time to playing time is CPU-only export actually *can* use a type of parallelization. Each GOP (Group of Pictures) is independent of all other GOPs in the file and can be processed in parallel by a separate core. The number of frames in each GOP is dependent on the encoder settings but is highly adjustable. Thus given the proper settings and file length, a 12-core Mac Pro could encode an H264 file fairly quickly -- using only CPU encoding.
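    A minimal sketch of that GOP-level parallelism, assuming every GOP really can be encoded without looking at its neighbors. `encode_gop` is a hypothetical stand-in that only labels its input; a real encoder would do the heavy work there. The sketch uses a thread pool to stay short, whereas a real CPU-bound encoder in Python would more likely use processes or native code.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_gops(frames, gop_size):
    """Cut the frame list into consecutive groups of at most gop_size frames."""
    return [frames[i:i + gop_size] for i in range(0, len(frames), gop_size)]


def encode_gop(gop):
    """Hypothetical stand-in for encoding one GOP; it only uses its own frames."""
    return f"[{gop[0]}..{gop[-1]}:{len(gop)}]".encode()


def encode_parallel(frames, gop_size, workers):
    """Encode every GOP on its own worker, then join the pieces in order."""
    gops = split_into_gops(frames, gop_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(encode_gop, gops))  # map() keeps input order
    return b"".join(chunks)


frames = list(range(10))
print(encode_parallel(frames, gop_size=4, workers=12))
```

    With shorter GOPs there are more independent pieces to hand out, which is why the GOP length setting matters for how well a 12-core machine can be kept busy.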

    I believe the Matrox card used fixed-function customized circuitry to accelerate encoding, similar to Quick Sync, NVENC and VCE. I'm not sure I see the relevance to whether GPU-accelerated encoding is possible. If you just mean the Matrox card made a big improvement, that is understandable if that was an older machine with a slower CPU. On a newer machine with a late-generation multicore CPU, it's possible using CPU-only encoding the difference would be less marked. If you have any export times recorded from back then using the Matrox card, it would be interesting to compare that to the same material on a modern machine using CPU-only encoding.
     
