rMBP UI performance (Safari) benchmarks

Discussion in 'MacBook Pro' started by leman, Nov 6, 2012.

  1. leman, Nov 6, 2012
    Last edited: Nov 6, 2012

    leman macrumors G3

    Oct 14, 2008
    In order to bring some light into the issue I have now done what I should have for a long time - some systematic tests.

    Test machine: retina MacBook Pro 2.3GHz, 16GB RAM, 10.8.2 with Safari 6.0.2

    Method: I switch between resolutions manually, covering both HiDPI and 'normal mode' resolutions. I use Safari (window always maximised) to display www.theverge.com, as this is a very difficult website to render, and scroll quickly up and down along its whole height. At the same time, I use QuartzDebug to monitor the current UI refresh FPS. For each setting, I determine the most 'stable' FPS value (by eyeballing the QuartzDebug output). I discard the lowest FPS value because the FPS will inevitably dip at the point where the scrolling direction is changed - the UI will only redraw if there is a need to (e.g. the picture has changed). For instance, as I am currently typing this the meter is at 10 FPS - because only a very small portion of the screen (the input form) must be updated, and only infrequently (reflecting the speed with which I type), so 10 FPS is enough to provide smooth UI updates.

    I didn't record the whole thing because a) I don't have any equipment to do this and b) it would be too much work. I threw this thing together during breakfast ;)

    Note: at high-res normal DPI modes, the rendered width of the website is much smaller than in the HiDPI modes. This is 'unfair' because it obviously reduces the workload compared to HiDPI modes. Thus, for high-res modes I did the test twice - once with the 'default' zoom settings (theverge occupies around 1/4 of the screen width in 2880x1800 mode and even less in 3840x2400 mode) and then zoomed in until theverge occupies around 1/2 of the screen width, to make it comparable to what happens at HiDPI resolutions. For such resolutions, I give two results in the form a (b), where a is the FPS when the width is normalised (zoomed in to 1/2 width) and b is the non-normalised FPS (left at default zoom).


    Resolution            HD 4000     GT 650M
    2880x1800             20 (40)     20 (40)
    3840x2400             10' (40)    10' (40)
    1920x1200             50          45

    1680x1050 (HiDPI)     25          30
    1920x1200 (HiDPI)     25          30

    All modes showed some micro-stutters, especially near the top part of the website, which uses complex tile layout. The stutter was very light at 1920x1200 (non HiDPI) and extremely noticeable at higher-res non HiDPI modes. I couldn't feel any difference between the micro-stutters in the tested HiDPI modes.


    The benchmarks suggest that the HiDPI modes actually offer higher performance than the non-HiDPI modes with the same amount of pixels. Subjectively, the scrolling was very smooth at 1920x1200 (non-HiDPI) and 'ok' on the HiDPI modes (save for some micro-stutter). It was really bad on the higher-res non-HiDPI modes.
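
    To put numbers on 'the same amount of pixels', here is a minimal sketch of the arithmetic. It assumes that a HiDPI mode is rendered into a buffer with twice the logical width and height before being scaled to the panel (this is how the thread describes HiDPI further down); the function name is just for illustration.

```python
# Pixels rendered per frame for each tested mode.
# Assumption: a HiDPI mode uses a backing buffer with 2x the logical width
# and 2x the logical height; non-HiDPI modes render at the listed size.

MODES = [
    # (label, logical width, logical height, is_hidpi)
    ("2880x1800", 2880, 1800, False),
    ("3840x2400", 3840, 2400, False),
    ("1920x1200", 1920, 1200, False),
    ("1680x1050 (HiDPI)", 1680, 1050, True),
    ("1920x1200 (HiDPI)", 1920, 1200, True),
]


def backing_pixels(width, height, hidpi):
    """Return the number of pixels rendered per frame for a display mode."""
    scale = 2 if hidpi else 1
    return (width * scale) * (height * scale)


for label, w, h, hidpi in MODES:
    megapixels = backing_pixels(w, h, hidpi) / 1e6
    print(f"{label:<20} {megapixels:5.2f} MP per frame")
```

    Under that assumption, 1920x1200 (HiDPI) and non-HiDPI 3840x2400 both come to about 9.2 MP per frame, so that is the like-for-like pair in the table.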

    On HiDPI modes, the results were better for the 650M, but subjectively I couldn't tell any difference. Funny result: the HD 4000 actually appears to be faster in 1920x1200 (non-HiDPI) than the 650M. Could this have something to do with RAM access? I.e. the HD 4000 can directly use system RAM as texture data while the 650M first has to copy it to VRAM.

    Also, take these results with a grain of salt because the performance is not really consistent. I am also not sure how reliably gfxCardStatus works for switching. Safari performance seems to degrade over time, so I had to relaunch it several times. A more reliable benchmark would be to repeat the experiments multiple times and use proper statistics to analyse the FPS.
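
    As a rough sketch of what 'proper statistics' could look like for repeated runs: the snippet below reports the mean, the sample standard deviation and the standard error for each GPU. The readings in it are made-up placeholders, not measurements from this test, and the function name is hypothetical.

```python
import statistics


def summarise_fps(samples):
    """Return (mean, sample standard deviation, standard error) of FPS readings."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two repeated measurements")
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)  # sample (n - 1) standard deviation
    sem = stdev / n ** 0.5             # standard error of the mean
    return mean, stdev, sem


# Hypothetical readings from five repeated runs per GPU (placeholders only).
runs = {
    "HD 4000": [24, 26, 25, 23, 27],
    "GT 650M": [29, 31, 30, 28, 32],
}

for gpu, samples in runs.items():
    mean, stdev, sem = summarise_fps(samples)
    print(f"{gpu}: mean {mean:.1f} FPS, stdev {stdev:.2f}, standard error {sem:.2f}")
```

    With the spread per configuration written down, it becomes possible to say whether a 25 vs 30 FPS difference is larger than the run-to-run noise.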
  2. tau101 macrumors member

    Jul 28, 2012
    I have to say I am not sure that the methodology you employ here really quantifies the issues people experience when it comes to rendering graphic heavy websites in browsers.

    I have read a number of posts by people attempting to do this, so I felt I should share my own methodology which I used to quantify the problem in my own mind.

    Here is a very simple illustration of the start of what I think is a better process to this end, using Chrome as the browser. I will leave it to the interested reader to determine their results, rather than providing my own.

    My rMBP spec: 2.7GHz, 16GB RAM, 768GB SSD.
    My browser: Chrome 23.0.1271.64
    My test page of choice: http://memebase.cheezburger.com/senorgif

    Go to about://flags in Chrome and make the following changes:
    1) GPU compositing on all pages = enabled
    2) Threaded compositing = enabled
    3) FPS counter = enabled

    Now compare scrolling on the test page in Chrome in HiDPI 1920x1200 (which is actually 3840x2400 scaled down) to scrolling at 1920x1200. The current FPS is displayed, and the change in FPS as you scroll is plotted on a graph. You will see significantly more downward spikes to FPS in the 10s running in HiDPI. This is because the hardware does not have the power to meet the demands of the browser's method of rendering this page. It is exactly these spikes to a low FPS that are the issue, so averaging the FPS during scrolling to me does not make sense.
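
    The point about spikes versus averages can be illustrated with a small sketch. It assumes you have somehow logged an FPS value per sample while scrolling (Chrome's counter only shows a graph, so the traces here are hypothetical numbers); the 20 FPS threshold is likewise an arbitrary choice.

```python
def dip_stats(fps_trace, threshold=20):
    """Summarise an FPS trace: mean, minimum, and share of samples below threshold."""
    if not fps_trace:
        raise ValueError("empty trace")
    below = sum(1 for fps in fps_trace if fps < threshold)
    return {
        "mean": sum(fps_trace) / len(fps_trace),
        "min": min(fps_trace),
        "dip_fraction": below / len(fps_trace),
    }


# Two hypothetical traces with the same average FPS.
smooth = [45] * 10
spiky = [60, 60, 60, 15, 60, 60, 10, 60, 15, 50]

for name, trace in (("smooth", smooth), ("spiky", spiky)):
    print(name, dip_stats(trace))
```

    Both traces average 45 FPS, but only the second one has samples in the 10s, which is the part that is felt as stutter.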

    Some have suggested this is a "software issue" which I read to mean it may be mitigated by future improvements in how the browser renders for high resolution displays. I will be delighted if this turns out to be the case.

    For my purposes this is not a problem: on the very rare occasions when I need to, I use QuickRes (http://www.quickresapp.com/) to switch to the low DPI equivalent of my desktop resolution.
  3. leman thread starter macrumors G3

    Oct 14, 2008
    True, but what exactly does it mean? Is it the hardware which is 'too slow' or crappy software? For instance, in Chrome there is a known bug which results in cache misses in HiDPI mode, which is obviously a big performance killer. However, things like these are hardly a hardware limitation.

    Anyway, my results, as imprecise as they are, seem to suggest that you don't get much improvement from switching between the IGP and the dGPU. So the GPU performance is hardly an issue. The real problem will be in the general rendering setup overhead. And here, there will be many opportunities for optimisation, of this I am sure.

    This is a good point. However, it is crucial to understand what the FPS spikes actually represent. Is it a bottleneck? Is it while the browser is waiting for some external data? Or is it because there is simply no need to refresh the page at this point? In this regard, I am not sure that your website is a very good choice - as it loads the data dynamically as you scroll + is rather video heavy.
  4. nontroppo, Nov 7, 2012
    Last edited: Nov 7, 2012

    nontroppo macrumors 6502


    Mar 11, 2009
    Not at all, it only shows that Chrome's page layout algorithms cannot sustain a given FPS; the cause of the bottleneck is not measured. As there is an open bug in the Chrome bug tracker that describes exactly this, and it has to do with caching of image decoding, the evidence points to this being a software problem...

    leman: FPS is a very relative measure as you clearly state, we don't know if low FPS is because of a constraint or "optimal" redrawing. The other parameter of interest would be CPU / GPU (GPU hard to measure, ATMonitor used to but 10.8 support has removed that) use. Also, can scrolling be instrumented with Automator or, more robustly, something like Selenium:

  5. tau101, Nov 7, 2012
    Last edited: Nov 7, 2012

    tau101 macrumors member

    Jul 28, 2012
    A cache miss will result in the image being re-downloaded from the server. This should not impact the scrolling performance at all.

    The fact that your results indicate little improvement in switching between the integrated GPU and discrete GPU should suggest to you that your methodology is flawed.

    This is a superb point, and I thought that an astute individual would point this out after I made my first post. Some of the spikes you will see in the low DPI mode will be due to javascript being triggered when you reach the end of the page to load the next page to create infinite scroll. You can eliminate these from your test by simply scrolling very far, say 10 pages, and then just scrolling around these images when loaded. The dynamic download of images shouldn't impact FPS but, for the sake of argument, if it did you can easily eliminate this factor from your tests by simply allowing all the images you want to test to load.

    I personally feel this is the best page to demonstrate the limitations of the current hardware unambiguously. There are no videos, by the way, just animated gif images.

    In your first sentence you have restated what I said, but placed more emphasis on blaming "Chrome's page layout algorithms". I chose my words carefully when I stated "that the hardware does not have the power to meet the demands of the browser's method of rendering this page". The reason I chose to say it that way is that your statement presumes that there are optimisations that can be made that would eliminate the difference. This is not a trivial assumption if you truly understand the task that the browser is trying to accomplish here. I personally sincerely doubt this issue will be any more than mitigated in the future by optimisations.

    The test I describe demonstrates unambiguously that given the current hardware, the chosen browser, and its implementation that the rMBP struggles rendering the test webpage in HiDPI resolutions vs their low DPI-equivalent resolution. To state that the evidence provided by this test points to a "software problem" is absolutely false.
  6. pgiguere1 macrumors 68020


    May 28, 2009
    Montreal, Canada
    Thanks for the test, it gives us additional information on the problem although the origin of the bottleneck remains unclear.

    Anand implied the scrolling issues have to do mainly with the decoding of compressed images, which is all handled by a single CPU thread (at least in Safari 5).

    It does seem that websites causing the most scroll lag (The Verge, Facebook) make heavy use of compressed images.

    Another thing they have in common is the use of pretty intensive graphical effects, such as semitransparent CSS3 gradients overlaid on images. I'm not sure how browsers technically handle such effects, but it may be pretty expensive to render a dynamically created gradient with an alpha channel over a compressed image (JPEG), especially if this has poor threading and no GPU acceleration and drives a single CPU core at maximum load, as shown in Anand's tests.

    A useful test to learn more about those issues might be to write various test web pages with heavy use of CSS3 effects to see if this is part of the problem.
  7. leman thread starter macrumors G3

    Oct 14, 2008
    There are different kind of caches. In this particular case I am talking about the Chrome resize cache (where resized images are stored). To quote a Chrome developer:

    There will be probably more issues like this.

    No! It might also indicate that the GPU performance is much less of a factor than you assume. In truth, all these tasks are fairly trivial for a modern GPU, even an integrated one. The HD 4000 is able to play Skyrim fluently, which involves hundreds or even thousands of applications of 512x512 textures per frame plus some heavy per-fragment computations. The few orthogonal texture applications used in UI rendering are really much less demanding. The bottleneck probably lies elsewhere - rendering setup cost, font rendering, layout computation - all these things are done on the CPU. The question is simply whether they can be done more efficiently. I believe yes.
  8. tau101 macrumors member

    Jul 28, 2012
    Given the evidence it should not surprise you to learn that playing a video game which has many low resolution textures and is heavily GPU optimised is less taxing to the rMBP hardware than image resizing at very high resolutions. I think you are being much too optimistic in your expectations of improvements that can be made by the browser.
  9. leman thread starter macrumors G3

    Oct 14, 2008
    Texture application is the same thing as image resizing. And HD 4000 provides multiple gigatexels of texture pixel performance per second. HiDPI 1920x1200 is around 9 megatexels.
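
    The '9 megatexels' figure is easy to check. The sketch below also converts it to a per-second rate, assuming a 60 FPS target and a single full-screen pass per frame (both are assumptions made only for this illustration, as are the function names).

```python
def texels_per_frame(width, height):
    """Texels touched by one full-screen pass over a width x height buffer."""
    return width * height


def texels_per_second(width, height, fps):
    """Texel rate needed to sustain `fps` full-screen passes per second."""
    return texels_per_frame(width, height) * fps


# HiDPI 1920x1200 is composed in a 3840x2400 buffer.
frame = texels_per_frame(3840, 2400)
rate = texels_per_second(3840, 2400, 60)  # assumed 60 FPS target

print(f"{frame / 1e6:.2f} megatexels per frame")
print(f"{rate / 1e9:.2f} gigatexels per second at 60 FPS")
```

    One full-screen pass at 60 FPS comes to roughly 0.55 gigatexels per second; real composition touches some pixels more than once, so treat this as a lower bound for the comparison with a GPU's fill rate.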

    Of course, a proper OpenGL benchmark is in order - I will try to quickly write one up if I find the time. No promises though. The algorithm would be fairly simple - upload random 3840x2400 pixel data to a texture and apply it on a quad to a 2880x1800 FBO using linear filtering, then see how many FPS you get using the HD 4000 and the GT 650M.
  10. stevelam macrumors 65816

    Nov 4, 2010
    Sorry, but what the hell is a 'complex tile layout'? How is a bunch of divs next to each other complex at all?
  11. leman thread starter macrumors G3

    Oct 14, 2008
    Dunno, ask the people who wrote the browser layout computation engine.
    I have no idea why, but there seems to be some micro-stuttering exactly on that part of the website. Maybe it has something to do with image decompression, as pgiguere1 suggests.

    I agree with you that I should have probably avoided that particular expression :)
  12. tau101 macrumors member

    Jul 28, 2012
    Texture application involves resizing the texture, that certainly doesn't make it the same thing as image resizing! This is particularly ridiculous if you consider how different the process of rendering a frame in a video game is to a browser rendering a webpage sized to fit a 3840x2400 display as if it were a 1920x1200 display!

    I am not sure how this will address the issue of slow UI animations (scrolling included) with which we are all concerned. The rMBP is certainly a powerhouse, if that is what you're trying to prove.

    Incidentally, I feel I should point out that just because an algorithm can be simply described in a sentence it doesn't mean that it cannot be so computationally expensive that it could cause the hardware running it to struggle, as is the case here.
  13. leman, Nov 7, 2012
    Last edited: Nov 7, 2012

    leman thread starter macrumors G3

    Oct 14, 2008
    I don't understand what you are trying to say. The GPU's role in that whole desktop composition thing basically boils down to rendering a bunch of textures (representing the UI elements) onto the big offscreen 3840x2400 buffer and then resizing this buffer to the native resolution. It is also entirely possible that the GPU has a part in rendering at least some of the vector-based graphics. The resizing part (e.g. translating an image of size A to size B) is exactly what texture application is - you map a texture of size A to a quad of size B. The popular theory is that the IGP is too slow to handle such textures in the context of the HiDPI rendering - this is the theory I believe is wrong. My theory is that the GPU is actually not utilised well enough and that more tasks (like image decoding or potentially even font rendering) could be offloaded to it.

    The principal difference between rendering a game and a website is that the website will have more of its pixel-based content (textures) created on the CPU side of the software stack; there is basically no geometry transformation and only a few per-fragment computations (also known as pixel shaders). Most of the performance requirements for the GPU here are thus texture upload speed (which is irrelevant for the iGPU as it has direct access to the system RAM) and texture filtering performance. A game in any case is MUCH more computationally intensive.

    It won't :) I am simply curious whether the performance problems are really due to the HD 4000 being slow (which is the popular theory) or whether it is something else (e.g. the CPU side of the workflow) which is struggling. If it is possible to show that the HD 4000 in fact does not have any performance problems with huge textures, then it means that the GPU is not the culprit, but something else is.

    I wasn't implying that it would not be computationally expensive, I was simply saying that a benchmark of that kind shouldn't be difficult to write.
  14. tau101 macrumors member

    Jul 28, 2012
    Taking your last point first, with respect to the complete process of decoding an image and rendering it for the user a game is, in this case, not as computationally intensive. I believe this is what has been demonstrated.

    Given the description in your initial paragraph I think I understand where the confusion is coming from, and imagine you're unable to reconcile why games run so well and UI animations less so. The logical conclusion if you equate the two and slowdown symptoms manifest in one is a software issue, so I can see where you're coming from.

    However, the fact is that you are trivialising the difficulty of the task presented to the browser from the perspective of the machine running it. The textures in a modern video game engine are pre-loaded precisely so that this does not have to be done on the fly, which would badly impact performance. Much more importantly though, no video game has to render as many pixels as 3840x2400 per frame in any case. People have set various games to 2880x1800 and noted a significant performance impact. This is more of a like-for-like comparison if you wanted to equate texture resizing in video games to rendering a webpage at high resolutions.
  15. nontroppo macrumors 6502


    Mar 11, 2009
    Perhaps we are at semantic ends of the same pole. For me, a hardware issue implies that a bottleneck is hardware in origin. But you can block a software loop easily and slow down a particular operation without ever getting close to a hardware ceiling. If your framerate depends on another software operation completing, and it doesn't complete (because it is measuring the wrong thing, it doesn't take into account an API change, nothing that locks it to a hardware ceiling etc etc), then your limit is software not hardware based.

    I think you cannot make the categorical statement it is hardware-based; I didn't make a categorical statement otherwise, merely an empirical bias. You cannot discount a software-only origin, because weak algorithms can be limited in their efficiency with ample hardware resources to spare.
  16. leman thread starter macrumors G3

    Oct 14, 2008
    And this is precisely the reason why I would like to evaluate exactly this kind of task with a test benchmark ;)
  17. tau101 macrumors member

    Jul 28, 2012
    I agree with all this.

  18. nontroppo macrumors 6502


    Mar 11, 2009
    When I get time, I can make some synthetic benchmarks as I can enable full scene anti-aliasing (FSAA) via OpenGL using my graphics pipeline (we use an OpenGL wrapper in Matlab to generate stimuli for vision perception research). That will at least answer the argument of how resampling affects the HD4000 and 650M — resampling algorithms are different between vendors. Our OpenGL wrapper is not currently retina aware, but simply runs at the resolution of the screen so there will be two levels of resampling depending on the screen mode...
  19. leman thread starter macrumors G3

    Oct 14, 2008
    Can you explain a bit more about what you want to do? I don't see how FSAA (if we are talking about multi-sampling) has anything to do with the issue, as it only samples geometry. For desktop composition, geometry AA is not an issue, as it's about flat quads with no projection transformation anyway. OS X in HiDPI mode uses explicit super-sampling (rendering to a 2x2 buffer and then downscaling).

    A note of caution about screen resolution - the screen resolution of a 1920x1200 HiDPI screen is still 1920x1200 logical pixels; if you create a full-screen OpenGL view, it will be 1920x1200 pixels as well. OS X will then pixel-double the resulting image to match the true underlying resolution (see also http://developer.apple.com/library/...ngScreenContents/CapturingScreenContents.html ). So you have to be extra careful to make sure what you are actually benchmarking :)
  20. nontroppo, Nov 8, 2012
    Last edited: Nov 8, 2012

    nontroppo macrumors 6502


    Mar 11, 2009
    Our OpenGL wrapper suggests that it hints towards FSAA (i.e. supersampling to a higher resolution then lowpass filter downsampling for display), rather than MSAA. The problem is that, at least with OpenGL, you can only hint what method you want, but the GPU driver is what finally decides what method to use:


    Now, this also assumes Apple uses a pipeline for resampling (supersampling) similar to OpenGL, but I don't know if it does. Your results certainly seem to suggest the GPU is involved and more than capable of handling the resampling.
  21. leman thread starter macrumors G3

    Oct 14, 2008
    They don't rely on the OpenGL FSAA API to do the sampling (that would be very awkward and hard to control), they do it 'manually'. That is, they compose the image in a big offscreen buffer (3840x2400 for HiDPI 1920x1200) and then blit this buffer to the default display framebuffer (which is obviously 2880x1800). You can do it either as a texture application or simply using glBlitFramebuffer()
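
    To make the 'compose large, then scale down with linear filtering' step concrete, here is a CPU-side sketch in plain Python. It is only an illustration of bilinear resampling on a tiny greyscale grid, not what OS X or the GPU driver actually executes; the function name and the 8x8 to 6x6 example (the same 4:3 ratio as 3840x2400 to 2880x1800) are assumptions made for the example.

```python
def bilinear_resize(src, dst_w, dst_h):
    """Resample a 2D grid of floats to dst_w x dst_h using bilinear filtering."""
    src_h, src_w = len(src), len(src[0])
    out = []
    for y in range(dst_h):
        # Map the destination pixel centre back into source coordinates.
        sy = (y + 0.5) * src_h / dst_h - 0.5
        sy = min(max(sy, 0.0), src_h - 1.0)  # clamp to the source edge
        y0 = int(sy)
        y1 = min(y0 + 1, src_h - 1)
        fy = sy - y0
        row = []
        for x in range(dst_w):
            sx = (x + 0.5) * src_w / dst_w - 0.5
            sx = min(max(sx, 0.0), src_w - 1.0)
            x0 = int(sx)
            x1 = min(x0 + 1, src_w - 1)
            fx = sx - x0
            # Blend the four neighbouring source texels.
            top = src[y0][x0] * (1 - fx) + src[y0][x1] * fx
            bottom = src[y1][x0] * (1 - fx) + src[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# An 8x8 checkerboard stands in for the large offscreen buffer.
big = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
small = bilinear_resize(big, 6, 6)
for row in small:
    print(" ".join(f"{v:.2f}" for v in row))
```

    On the GPU the same blend is done by the texture units when a quad is drawn with linear filtering, which is why the resize itself is expected to be cheap.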
  22. tau101, Nov 14, 2012
    Last edited: Nov 14, 2012

    tau101 macrumors member

    Jul 28, 2012
    I just want to share that any choppy UI animations have been almost completely eliminated by renaming /System/Library/Extensions/AppleGraphicsPowerManagement.kext to AppleGraphicsPowerManagement.kext.disabled and rebooting.

    Apparently this disables multiple power states for the discrete GPU and tells it to consume full power all the time. This has taken the idle temperature for Heatsink C from the high 40s °C to 62°C and the idle fan speed from 2000rpm to 3200rpm. With use the heatsink remains at around the same temperature and fan speeds are in the low 4000s. The temperatures and fan speeds under very high loads don't seem to have been affected, except that they tend to be close to the previous high load temperatures during general use. The battery is predicted to last 2:30 mins with the backlight set to 5 notches from max, and it took around 80 mins to run it down to 50%. Nothing one would not expect given what has been done.

    I have made a little script to toggle the renaming of the file and reboot just for convenience so I can change back for when I need to use the laptop for extended periods without access to the mains. If, like me, you're mostly using your laptop connected to the mains you may wish to consider this avenue.
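
    For anyone wanting to do something similar, here is a minimal sketch of such a toggle. It is not the script mentioned above, just one hypothetical way to write it: it only does the rename, needs to be run with root privileges on the real path, and leaves the reboot to you.

```python
from pathlib import Path

# Path quoted in the post above.
KEXT = Path("/System/Library/Extensions/AppleGraphicsPowerManagement.kext")


def toggle(kext=KEXT):
    """Rename <kext> to <kext>.disabled or back again; return the new path."""
    disabled = kext.with_name(kext.name + ".disabled")
    if kext.exists() and disabled.exists():
        raise FileExistsError(f"both {kext} and {disabled} exist, not touching either")
    if kext.exists():
        kext.rename(disabled)   # disable: move the bundle out of the way
        return disabled
    if disabled.exists():
        disabled.rename(kext)   # enable: restore the original name
        return kext
    raise FileNotFoundError(f"neither {kext} nor {disabled} found")


# Usage (as root), followed by a manual reboot:
#     print("now at:", toggle())
```

    Keeping the rename reversible like this means the original file is never deleted, so switching back before going on battery is a single call and a reboot.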
  23. 0x000000 macrumors 6502

    Aug 26, 2011
    Wouldn't setting the discrete GPU to be always on with gfxCardStatus do the same?
  24. tau101 macrumors member

    Jul 28, 2012
    Based on what I gleaned from the post that first led me to this, no. There is still a spectrum of power states the discrete GPU is configured to operate within depending on various factors. When the file is removed (or renamed, as above), all these state definitions are gone and the GPU runs at its default power consumption all the time.
  25. tau101 macrumors member

    Jul 28, 2012
    In addition to this I would like to point out that the latest build of Firefox Aurora performs spectacularly well on the test page.
