You have set the CQ to 21.
So this will output different quality images for different hardware.
HandBrake does not explain this nicely, but you're comparing apples to oranges.

Having the same setting does not mean that you'll get the same output when hardware acceleration is involved.

Try setting different CQ values on the M4 and compare the output images. CQ 21 may mean a different thing on the M4 than on the Radeon GPU.
When you get the same quality image (for example CQ 21 in one, CQ 10 in other) then compare output fps times.
 
You have set the CQ to 21.
In the first posts, it is not active. There is no tick in front of "Constant Quality".
OP is using "Average Bitrate" there. (Their "Bit9K" profile.)
But you're right, to finally judge the different hardware encoders, one should genuinely compare the resulting images (not just file size).
I use IINA to select a frame and export it as a png.
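
For anyone who wants a number to go with the eyeball comparison of exported frames, here is a minimal sketch of a PSNR calculation. It assumes the frames have already been decoded into flat lists of 8-bit samples; the sample values below are made up purely for illustration and are not from any encode in this thread.

```python
import math


def psnr(frame_a, frame_b, max_value=255):
    """Peak signal-to-noise ratio (dB) between two equally sized sample lists."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of samples")
    if not frame_a:
        raise ValueError("frames must not be empty")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b)) / len(frame_a)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


# Hypothetical samples: one source frame and two encodes of it.
source = [16, 64, 128, 200, 235, 90]
encode_a = [17, 63, 128, 201, 234, 90]
encode_b = [20, 60, 125, 205, 230, 95]

print(psnr(source, encode_a))  # higher value: closer to the source
print(psnr(source, encode_b))  # lower value: further from the source
```

A higher PSNR means the encode is numerically closer to the source, but it is only a rough proxy for perceived quality, so it complements a visual check rather than replacing it.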

Besides that, I never knew VideoToolbox taps into the T2. Always something new to learn.
The same goes for QuickSync performance increasing (slightly) with the number of CPU cores.
 
First of all, sorry that your new computer isn't giving the performance you hoped in this particular task and tool!

I work in video, and I noticed a few things that seem fairly exotic about your task. Don't take this as "you're doing it wrong" - I'm sure you have good reasons - but just things to consider about why this particular task might be an atypical use case. You might find better performance with a task that conforms to more standard parameters.

I don't regularly use Handbrake, I'm most experienced with encoding video in Adobe Media Encoder and Apple Compressor. You might consider trying out Compressor as a relatively low cost one time purchase, it would be the encoding tool best optimized for Apple hardware. Another option might be DaVinci Resolve since it's a free professional-level tool that proactively optimizes for new Apple hardware as well.

1. Scaling from a larger input file to a smaller output file. In general, scaling a video is something that must be done by the GPU or CPU, before the hardware encoders can create the final H.264 file. Note in your preset settings it says "VideoScaler" : "swscale". I don't know how Handbrake is doing this "swscale" but this part of the task may not be optimized for Apple Silicon or M4 in particular.

You will likely find it faster in this tool to do an encode where the output file is the same dimensions as the input file.

Also, the dimensions of the input file are not a "standard" video size such as DCI 4K or UHD. Hardware encoders and/or software pipelines may be most optimized for handling videos of industry standard dimensions, they might not work quite as well with a file of unusual dimensions, I'm not sure.

2. Variable frame rate - this is something I would avoid, for the same reasons. A variable frame rate is not a "standard" frame rate like 29.97, 23.976, etc. Apple does create VFR videos on iPhones or Quicktime Player recordings, it's true. But I don't know how Handbrake handles writing a VFR file. Something variable by definition is unpredictable. It must mean extra work for the system to decide when to write a frame or not.

I bet you will find the best performance choosing a constant frame rate, and more specifically the best choice is the exact same frame rate as the source file.

Ever since hand-cranked movie cameras were replaced by motor drives in the olden days, frame rates traditionally have been constant. Video software and hardware should have decades of optimizations for working with constant frame rates, that the new phenomenon of variable frame rate video wouldn't have. Not saying you're wrong for choosing VFR, just that it might be a factor in the encoding performance.

Lastly, in general, it may be that the architecture of M4 has some differences from past generations that would benefit from new optimizations that free cross-platform open source software may not have gotten around to yet. In an extreme example, Asahi Linux still only runs on Macs from the M1 and M2 generations, they don't support any M3 yet, let alone M4. Looks like the latest version of Handbrake supports MacOS 10.13 High Sierra still, which is great that they support so many old computers, but also indicates they probably aren't focused on working with the newest Apple technology.

See what performance you get in Handbrake with no video scaling, and a constant frame rate matching the source file, hope you notice an improvement. And thanks for sharing your experience, it's always good to know about video performance on new hardware.
 
@atonaldenim I am one of the HandBrake developers, so I think I know a thing or two ;)

That preset key does nothing at the moment. HandBrake will use zscale, or swscale when zscale doesn't support the selected pixel format; both are quite well optimized for ARM. Alternatively, it uses an Apple VideoToolbox pixel transfer session when the hardware decoders are enabled (in the advanced preferences) and everything can be run on the GPU. But the VideoToolbox pixel transfer session is buggy, so until Apple fixes it the quality won't be too good.

VFR won't give any issue in HandBrake, or any speed penalty. Actually it's faster when using VFR because it does not need to analyze frames to decide which one to drop if the input FPS is different than the destination FPS.
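
To illustrate why a fixed output rate involves a per-frame decision that VFR passthrough avoids, here is a rough sketch of timestamp-based resampling. This is not HandBrake's actual algorithm (which, per the post above, analyzes frames); the function name and the millisecond timestamps are assumptions made for the example.

```python
from fractions import Fraction


def resample_to_cfr(timestamps_ms, target_fps):
    """Return, for each fixed-rate output slot, the index of the input frame shown.

    For every output slot the latest input frame at or before the slot time is
    chosen. Input frames never chosen are dropped; frames chosen more than once
    are duplicated.
    """
    if not timestamps_ms:
        return []
    interval = Fraction(1000, target_fps)  # exact slot spacing in milliseconds
    end = timestamps_ms[-1]
    chosen = []
    src = 0
    slot = 0
    while slot * interval <= end:
        slot_time = slot * interval
        # Advance to the latest input frame that is not later than this slot.
        while src + 1 < len(timestamps_ms) and timestamps_ms[src + 1] <= slot_time:
            src += 1
        chosen.append(src)
        slot += 1
    return chosen


# 50 fps input resampled to 25 fps: every other frame is dropped.
print(resample_to_cfr([0, 20, 40, 60, 80, 100], 25))  # [0, 2, 4]
# 10 fps input resampled to 20 fps: frames are duplicated.
print(resample_to_cfr([0, 100, 200], 20))  # [0, 0, 1, 1, 2]
```

With VFR (or "same as source") the input timestamps are simply carried through, so none of this bookkeeping is needed.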

The weird size is not an issue either. The codec works on small blocks, so the actual width and height ratio doesn't matter.
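
As a small worked example of the block argument: H.264 codes the picture in 16x16 macroblocks, so a frame is covered by a whole number of blocks in each direction. The sketch below only does that arithmetic; how a particular hardware encoder handles the padding internally is an assumption here, and HEVC uses larger coding tree units (up to 64x64), which the `block` parameter can stand in for.

```python
import math


def padded_dimensions(width, height, block=16):
    """Return (columns, rows, padded_width, padded_height) for a block grid."""
    cols = math.ceil(width / block)
    rows = math.ceil(height / block)
    return cols, rows, cols * block, rows * block


# The 4096x2300 input from this thread: 256 x 144 blocks, 4 rows of padding.
print(padded_dimensions(4096, 2300))  # (256, 144, 4096, 2304)
# UHD for comparison: 240 x 135 blocks, no padding needed.
print(padded_dimensions(3840, 2160))  # (240, 135, 3840, 2160)
```

So the unusual 4096x2300 frame costs about the same block count as a 4096x2304 one, which fits the point that the odd size itself should not slow anything down.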

The fact that HandBrake can run on 10.13 doesn't matter, most of the optimizations are hand written assembly, the deployment target won't matter.

If AAPLGeek could provide an activity log, I could check whether there is a setting that can be improved, but with VideoToolbox on Apple Silicon and the quality preset, ~50-60 fps is normal at 4K.
 
Just wondering... Did you install it from scratch on the M4 Mac? Or did you move a disk (or its contents) from the Intel Mac, and end up with an Intel binary using emulation?
 
Pretty sure handbrake is not well optimized for Mac? And M4 in just about anything will run faster, if not laps, around that Intel chip
Absolute nonsense. Yes, Handbrake is well optimized for Mac, both Intel and Apple Silicon.

And yes, the M4 is usually faster than an Intel iMac, but how is that relevant here? That's not the topic of discussion. The topic in this thread is an apparent exception to that rule.
 
Try to set the encoder preset to "performance".
VideoToolbox on Apple Silicon has two quality settings, while on Intel choosing "performance" or "quality" makes zero difference. Another difference is that VideoToolbox on Apple Silicon supports the constant quality mode, and on Intel it doesn't.
Plus, those are still two totally different hardware encoders, so the resulting quality might vary a lot.

Anyway, the bottleneck might be in the decoder, some things are still not optimized for ARM yet.

Thank you for this suggestion. I see either a "speed" or a "quality" preset option; there is no "performance" option in version 1.9.0.

This is what I see with M4 mini:

4096x2300 input - no scaling and no filters, same Bit9k preset as above.
  • VTB - H265 "speed" - 90fps - Noticeable artifacts compared to the “quality” encoder at the same bitrate.
  • VTB - H265 "quality" - 54fps - Some artifacts and overall still lower quality than the H264 encode at the same bitrate.
  • VTB - H264 "speed" - 55fps
  • VTB - H264 "quality" - 55fps
No visible difference in output quality when using H264 encoder at “speed” or “quality” preset.

I'll test this on my Intel machine later, but it's interesting that with the H264 encoder it makes no difference which preset I choose.
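
For anyone wanting to repeat the four runs above outside the GUI, here is a hedged sketch that builds equivalent HandBrakeCLI command lines. The file names are hypothetical, and the 9000 kbps average bitrate is an assumption about what the Bit9k preset does; the commands are only printed, not executed.

```python
import shlex


def handbrake_command(source, output, encoder, preset, bitrate_kbps=9000):
    """Build a HandBrakeCLI argument list for an average-bitrate encode."""
    return [
        "HandBrakeCLI",
        "--input", source,
        "--output", output,
        "--encoder", encoder,          # vt_h264 / vt_h265 are the VideoToolbox encoders
        "--encoder-preset", preset,    # "speed" or "quality"
        "--vb", str(bitrate_kbps),     # average video bitrate in kbps
    ]


# One command per encoder/preset combination from the list above.
commands = [
    handbrake_command("input.mov", f"out_{encoder}_{preset}.mp4", encoder, preset)
    for encoder in ("vt_h264", "vt_h265")
    for preset in ("speed", "quality")
]

for command in commands:
    print(shlex.join(command))
```

Running the same four commands on both machines with the same source file would keep the settings identical apart from the hardware.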


That's why I'm so happy I went for a M4 Pro Mac. M4 base is too limiting.

M4 Pro is obviously 2x as fast sometimes, but it has the same hardware encoder as the bog standard M4. So in reality you’re not encoding any faster with your M4 Pro.


Using HandBrake 1.9.0, which reports that it is up to date.

Encoding an FCP X master file (.mov output) with the H.265 Apple VideoToolBox 1080P default preset, with ALL filters off.
Encoding at 340fps on M1 Studio Base System
HandbrakeXPCService is using 868% CPU
Handbrake itself is 0.4%
VTEncoderXPCService is using 8.4%
96.11Gb Source File 1080 resolution
Activity Monitor shows around 90%+ for CPU Load

Then take the original .mpg TV recording and use the same preset:

Encodes at 467fps
HandbrakeXPCService is 336%
Handbrake itself is 0.5%
VTEncoderService is 13.6%
4.75Gb Source File 1080 resolution
System floating around 30% CPU in Activity Monitor for CPU Load

It is definitely using my CPU cores when encoding with VideoToolbox.

I tried the Production Max preset, as no filters are set there and it uses H.264 (x264), so no hardware encoding.

The same FCP X output file encodes at 40fps, HandbrakeXPCService is at 950%+, and CPU load in Activity Monitor is topping out.
Handbrake itself is 0.2%
VTEncoderXPCService is not showing up as being used, or its usage is so low that it doesn't appear on screen.

And yes, it shows Apple as the Kind for the process, so I don't have an Intel version installed by mistake.

The larger source file definitely takes a lot more CPU to feed into VTEncoderXPCService.

But my CPU cores definitely seem to be in use even with the VideoToolbox encoder set and no filters in HandBrake.

I get 200fps on my M4 mini with that preset. You haven’t specified your input resolution but you’re likely getting 340fps because M1 Max has 2 encode engines.

HandbrakeXPCService is using 868% CPU because you’re scaling down your video with that preset. I too get around 500% CPU usage on that preset because any scaling engages the CPU cores.


Useful article comparing transcoding times and what's happening using Quicksync alone, Quicksync and T2 together and neither Quicksync nor T2 at: https://appleinsider.com/articles/1...t-difference-in-video-encoding-for-most-users

This suggests "VideoToolbox taps into the T2 if available, and any Intel QuickSync support that any given machine has". This may go some way towards explaining how both processes used together in the older system may be so effective against just the T2-like hardware encoder built in to newer M series media engine(s).

Interesting article, but I’m not convinced their findings apply to T2 machines with a dGPU.

The Intel QuickSync encoder is a part of the iGPU and since Apple disables the iGPU entirely on Intel iMacs with dGPU, I’m not sure VTB can tap into any part of QuickSync.
 
As @arw correctly pointed out, I’m using the average bitrate option instead of the CQ setting and it’s identical across both machines.

Having the same setting does not mean that you'll get the same output when hardware acceleration is involved.

I’m aware of that. In my testing with the preset I’m using, which essentially does nothing but bring the bitrate down to 9K, I get nearly identical quality output on both machines.

I did some digging and, as another poster discovered, the most likely reason for that is that the iMac 2020 uses the Apple T2 hardware encoder.

So it’s really an Apples to Apples comparison in this case.


Thank you for the post and suggestions, but I think you missed a few key things in my posts as handbrake screenshots can be confusing to a non-user.
  1. In the first post, I’m not scaling the output at all. As the handbrake developer explained above, that flag does nothing at the moment. A few posts later, I’ve posted a screenshot with scaling to UHD, but the speed difference still remains the same across both machines.
  2. The framerate option too is set to “same as source”. Selecting variable or constant frame rate has no bearing because Handbrake doesn’t touch the input framerate when the “same as source” option is selected.
I’ll try Apple compressor and see if I notice anything interesting.
 
Thank you for your input. I can share the logs via private message later if it's ok.

Anyway, the bottleneck might be in the decoder, some things are still not optimized for ARM yet.

As the test is essentially now Apple A10 vs Apple M4 encoding performance, it's likely that some ARM optimizations might speed things up for the M4.
 
What do you mean I didn’t specify the resolution of the input?
For each of the two input files I gave the file size and said it was 1080 resolution, as in full HD 1920 x 1080.
With a 1080 input and a 1080 output, what scaling am I doing?
 
And M4 in just about anything will run faster, if not laps, around that Intel chip

Respectfully, parroting glib marketing-isms is not helpful and something MacRumors could do with a lot less of.

There's plenty about Apple Silicon that is more performant than equivalent alternate architectures (read: Intel). But often victories in one direction beget deficiencies in another. A more rapid pace of architecture change makes it difficult for software developers to find ways to leverage new capabilities to accelerate output (assuming those heavily-marketed and trumpeted capabilities even apply).

You might be surprised just how performant a modern x86 chip can be, and that performance more readily accessible thanks to a far wider set of knowledge for how to leverage it.

This is not to say there isn't a solution for the OP or that this is expected behavior. It's just that there is no magic "moar faster!" when intersecting architecture and software.

It's been fascinating to read this thread as people effort a solution—thanks to everyone offering considerable knowledge and experience!
 
What do you mean I didn’t specify the resolution of the input?
For each of the two input files I gave the file size and said it was 1080 resolution, as in full HD 1920 x 1080.
With a 1080 input and a 1080 output, what scaling am I doing?

My bad. I missed that you specified the input resolution a few lines below.

Still, the high CPU usage could be explained by any number of things except the actual video encoding.

Here's a quote directly from HandBrake's documentation:

Only video encoding is performed by the hardware encoder. Every stage prior to and after video encoding including decoding, filters, audio/video sync, audio encoding, muxing, etc., is performed by the CPU. As a result, it is normal to have high (even 100%) CPU utilisation during encodes.


Edit: Just for a comparison, I ran a longer encode on the same 4096x2304 input using the H.265 Apple VideoToolBox 1080P preset and I get over 288fps. If I do a 1080p->1080p using the same preset, I get over 352fps.

1080p is useless to me though.
 
Going to share some of my personal testing first, then share the results of using one of your presets on a random file I have lying around.

I currently am running an M4 Pro mini (12c/16c/24GB/512GB) and an M1 mini (16GB/512GB).

I briefly tried an M4 non-pro mini (10c/10c/24GB/512GB) before the M4 pro, and had an M1 Max 14" MBP (24-core GPU/32GB/1TB) that got traded in for the M4 Pro. I did some like-for-like comparison between the M1, M4 non-pro, and the M1 Max, using handbrake's then-current version (this would have been right at release of the M4s), the same input file and preset.

In that round of testing, I was doing h.264 to x265 with videotoolbox. I didn't save my exact FPS but I saw approximately a 25% performance gain from M1 to M4, but the M4 was about 25% slower than the M1 Max. Unfortunately I no longer have the M1 Max or regular M4 to compare with so I cannot say if the newer versions of handbrake have closed that gap.

I had to wait for the M4 pro for a bit, so I did not do the exact same comparison there.

This thread had me do a fresh comparison, using some files I was working with over the weekend. This was an x265 4K to x265 1080p downscale. Ran the same on my M1 and M4 Pro, got 142 vs 258 fps using handbrake 1.9.0 release (snapshot build performed slightly worse, 243fps).

Using your preset to transcode the same 4K x265 file in my previous test, the M4 Pro averaged 63fps while the M1 averaged 42fps.

Anyways, long story short you getting the same results from T2 on an M4 doesn't smell right. If you want to point me at some reference source file, I'm happy to run a comparison on both machines with the preset you provided and if you wanted to try a few others, too.
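
To put the fps figures reported in this post side by side, here is a tiny sketch that turns them into speedup ratios. It only restates the numbers above; the `speedup` helper is just for illustration.

```python
def speedup(new_fps, old_fps):
    """Ratio of the newer machine's fps to the older machine's fps."""
    return new_fps / old_fps


# 4K to 1080p x265 downscale: M4 Pro (258 fps) vs M1 (142 fps).
print(round(speedup(258, 142), 2))  # 1.82
# OP's preset on the same 4K file: M4 Pro (63 fps) vs M1 (42 fps).
print(round(speedup(63, 42), 2))  # 1.5
```

Both ratios show a clear generational gain from M1 to M4 Pro, which is why matching T2 results on an M4 look suspicious.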
 
So you're essentially seeing the same results in your M4 Pro tests.

In my post right above yours, I posted 4K to H265 1080 using "H.265 Apple VideoToolBox 1080P" preset and I get over 288fps on base M4 Mini.

My bit9k preset maxes out at around 55fps when doing 4K to 4K H264, compared to your 63fps on M4 Pro.

The T2 is actually around 20-30% faster and it makes no sense why.

The only way to get higher fps doing 4K to 4K is selecting the H265 encoder with the "speed" preset. I get over 103fps with that, but the quality is a lot worse at the same bitrate.
 
Well, that explains why HandBrake is still a monster on the CPU even when using the hardware encoder.

Which is why I was puzzled when you said it was only a monster when not using the hardware encoder, based on what I see on mine.

I wonder if your issue is that the i5 can prep the work quicker than the M4 does, so the T2 is fed quicker than the media engine on the M4.

It's a bit disappointing that HandBrake doesn't use the decode hardware. Running Compressor for comparison, it has 2 instances of VTDecoderXPCService and 2 of VTEncoderXPCService, whereas HandBrake has just 1 VTEncoderXPCService while encoding, indicating HandBrake is only using 1 of my encoders and neither decoder. With Compressor the CPU is about 80% idle as well, so it makes much more efficient use of the hardware.

As my file is exported in ProRes 422, HandBrake has to decode it on the CPU before encoding, whereas Compressor uses both ProRes decoders. With ProRes being decoded on the CPU, it's no surprise that CPU usage is high.
 
HandBrake can use the hardware decoders, as written in https://handbrake.fr/docs/en/latest/technical/video-videotoolbox.html

The Apple Silicon hardware decoder is not the same as the T2, and the software is different too. The Apple Silicon one supports more features, such as weighted prediction (weight-p), so the quality is supposed to be better.
Thanks for that. Running that now, with a profile created from the VideoToolbox preset, I get 470fps, and VTEncoderXPCService and VTDecoderXPCService are running one instance each. I had to make sure I enabled the "Also use in conjunction with software decoders" option. HandbrakeXPCService is now at 73.8%.
That is some drop in CPU usage.
Hopefully HandBrake will be able to get 2 of those running to use the Max's media engines fully.
I bought the Studio as there was no M1 Pro mini and I wanted the ProRes engine, but I will most likely get an Mx Pro, probably around the M9/M10, depending on how long Apple supports the M1 Max Studios.
Going to see if I can live up to the hype of "set for 10 years".
 
Because the T2 is just an older A10 chip, and the M* can do everything the A10 can do, but the hardware encoder is either a little different, or configured differently to get a better picture quality.
 