AnandTech said ECC video RAM not turned on, why not?

Discussion in 'Mac Pro' started by Luba, Jan 2, 2014.

  1. Luba macrumors 6502a

    Luba

    Joined:
    Apr 22, 2009
    #1
    I read AnandTech's review of the nMP, and it said (paraphrasing) that when Apple designed the nMP it decided not to turn on ECC for the VRAM. Why not? The nMP has ECC for RAM, so wouldn't it be even better to have ECC VRAM, especially since the GPU will be used for compute purposes?
     
  2. deconstruct60 macrumors 604

    Joined:
    Mar 10, 2009
    #2
    Cynically, probably because it puts more money in Apple's pocket. AMD is likely playing along because it does the same thing, to a lesser extent, for them. Apple can both buy cheaper parts and do less work. AMD keeps an even bigger differentiator for real FirePro cards.


    I suspect that somehow Apple committed to making this an FCPX-optimized system. In that specific context it doesn't matter. Bit errors in video won't even be seen, even if present. The individual bits by themselves have microscopic value, so nobody cares if just a very small number are wrong. That value only goes down further as the resolution goes up (the errors are even harder to see).

    Not sure whether they narrowed the focus to get the design out the door faster, given how major a change it is, or whether they have more permanently decided to just live in the video ghetto over the long term. In that case FirePro is more of a gimmick to charge higher prices on the GPU cards: it normalizes the price comparison against cards that don't throw a lot of features under the bus, so they can do the "we are cheaper than those" rather than the "we are more expensive than the ones we are more alike with".

    Given unfinished firmware (product launches with immediate firmware fixes), likely other software glitches/defects, and no driver support for making effective use of higher resolutions for graphics other than just pictures/movies right now, I'm leaning toward the view that they were in over their heads given the resources provided and simplified this initial version.
     
  3. thekev macrumors 604

    thekev

    Joined:
    Aug 5, 2010
    #3
    I'm curious how OpenGL performance will look against the closest possible Windows FirePro equivalents, but this is basically what I expected. The extra $500 probably relates to $200 for an SSD and $300 for an extra GPU or something along those lines. As has been mentioned before, it's possible that the D300 may not support ECC RAM, depending on which chip was actually used.


    Depending on the format being used, a flipped leftmost mantissa bit during any kind of rendering might present an issue, but it's not like you couldn't fix a pixel if the brightness offset was such an issue. Of course given that I'm thinking of floating point image formats in a worst case scenario on something that should almost never happen, it shouldn't present an issue. If you're constantly seeing video artifacts on screen refresh, that machine simply requires hardware service.
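
To put a number on the "flipped leftmost mantissa bit" case, here is a minimal sketch that assumes the pixel channel is stored as an IEEE 754 32-bit float. The helper name `flip_bit_float32` and the sample value 0.5 are illustrative assumptions, not anything taken from a real image pipeline.

```python
import struct

def flip_bit_float32(value: float, bit: int) -> float:
    """Return value with one bit of its IEEE 754 binary32 encoding flipped."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped

# binary32 layout: bit 31 = sign, bits 30-23 = exponent, bits 22-0 = mantissa.
TOP_MANTISSA_BIT = 22
LOWEST_MANTISSA_BIT = 0

pixel = 0.5  # hypothetical mid-grey channel value
worst = flip_bit_float32(pixel, TOP_MANTISSA_BIT)
mild = flip_bit_float32(pixel, LOWEST_MANTISSA_BIT)

print(worst)         # 0.75: a visible brightness jump for that one pixel
print(mild - pixel)  # 2**-24: far below anything a display can show
```

So which bit gets hit matters a lot: the top mantissa bit changes the value by up to 50%, while the low bits are invisible. A flip in the exponent or sign bits would be worse still.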
     
  4. VoR macrumors 6502a

    Joined:
    Sep 8, 2008
    Location:
    UK
    #4
    Also...it's a bit pointless.

    We should be asking why Apple didn't go for AMD's latest version of GPUs. Tahiti uses ECC on SRAM anyway. I'm pretty (incredibly) cynical and suspect they're more concerned about losing the FirePro moniker (just look at the misinformed comments on any nMP thread for reasoning) than creating an off-the-shelf GPGPU monster that destroys everything at a great budget. I guess it wouldn't be such great advertising for the rest of AMD's workstation lineup either, but it is inevitable.

    Anyway... for the sort of workloads people would use a nmp for, their choice of gpu hardware is perfectly fine in my opinion.
     
  5. quagmire macrumors 603

    quagmire

    Joined:
    Apr 19, 2004
    #5
    The D500 and D700 are based on the Tahiti core. The D300 is Pitcairn.
     
  6. VoR macrumors 6502a

    Joined:
    Sep 8, 2008
    Location:
    UK
    #6
    brainfart, you can see what I mean (hopefully) though :)
     
  7. haravikk macrumors 65816

    Joined:
    May 1, 2005
    #7
    I think only the D700 GPUs would be able to support ECC VRAM, but since the cards are custom constructed I expect it's just a lot easier to assemble them in pretty much the same way, and just vary how many GDDR5 RAM chips are installed?

    It does suck a bit though, as if Apple are really serious about GPU computation as the future then not having ECC VRAM kind of defeats the point of wanting to run your complex, long running tasks on a pair of GPUs if there's a chance that errors will creep into it.

    I wouldn't expect it until the next version at the earliest now, shame though.
     
  8. prfrma macrumors regular

    Joined:
    May 29, 2010
    #8
    I wouldn't have minded either, but +$1000 for 2x underclocked 7970s trashes the value of the system. You're really paying double for custom PCBs and additional VRAM.
     
  9. Luba thread starter macrumors 6502a

    Luba

    Joined:
    Apr 22, 2009
    #9
    Could Apple "turn on" ECC VRAM later on through software if the D700 already supports ECC VRAM hardware-wise?

    So when using the GPU for compute purposes in video, 3D, graphics, etc., ECC is pointless because the bit error won't show up in the video or 3D, correct?

    But what if you're a scientist or mathematician or engineer, and using the nMP to perhaps solve a large system of equations? Then not having ECC VRAM would matter, because the nMP would be using one of the GPUs to help out the CPU in calculations, correct? Or would the bit error first originating/noticed at the GPU be caught later on by the ECC RAM at the CPU?

    As you can tell, I don't know much about bit errors and how serious they are. None of the MBPs have ECC RAM; are they at a disadvantage when it comes to "serious" work?

     
  10. AidenShaw macrumors P6

    AidenShaw

    Joined:
    Feb 8, 2003
    Location:
    The Peninsula
    #10
    The CPU ECC wouldn't be able to tell that the GPU had a RAM error.

    The real problem with bit errors in computational code is that usually a sequence of many calculations is needed to get a result. If an error occurs early in the sequence, later operations will be off because of the bad data, and the final result can be quite wrong.
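
A toy illustration of that propagation, as a sketch only: the recurrence `acc = acc * 1.01 + 1.0`, the step count, and the choice of bit 52 (the lowest exponent bit of a 64-bit float) are all made-up assumptions chosen to keep the example small, not a model of any real workload.

```python
import struct

def flip_bit_float64(value: float, bit: int) -> float:
    """Return value with one bit of its IEEE 754 binary64 encoding flipped."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped

def run_chain(steps: int, flip_at=None, bit: int = 52) -> float:
    """Run a chain of dependent operations, optionally corrupting one step."""
    acc = 0.0
    for i in range(steps):
        acc = acc * 1.01 + 1.0  # each step depends on the previous result
        if i == flip_at:
            acc = flip_bit_float64(acc, bit)  # simulated memory bit error
    return acc

# Size of the error immediately after it is injected at the fifth step.
error_at_injection = abs(run_chain(5, flip_at=4) - run_chain(5))

# Size of the error in the final result, 95 dependent steps later.
final_error = abs(run_chain(100, flip_at=4) - run_chain(100))

print(error_at_injection, final_error)
```

In this particular chain every later step multiplies the bad value by 1.01, so the error in the final answer ends up more than twice as large as the error at the moment of the flip. Other computations may damp an error instead, but nothing in the chain ever detects or removes it.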

    The PowerMac cluster at Virginia Tech was mostly unusable due to memory errors - it didn't really go into effective production until all of the systems were replaced with XServes with ECC memory. http://en.wikipedia.org/wiki/System_X_(computing)
     
  11. haravikk, Jan 4, 2014
    Last edited: Jan 4, 2014

    haravikk macrumors 65816

    Joined:
    May 1, 2005
    #11
    Only if the memory on the graphics cards is ECC; I believe the D700 chips support both, but since the others don't then I wouldn't be surprised if Apple just uses non-ECC memory for all of them, so it's not something that could just be turned on.

    Depends how long running the job is, but I think that the chances of an error throwing off the results will be negligible for these kinds of uses (the main uses Apple seems to intend for the new Mac Pro after all).

    For scientific use it would definitely be more important; it's not to say that you can't still use them, but you'd probably have to implement your own error checking in software. Since the majority of GPUs right now simply don't support ECC VRAM then it's probably good practice to do this anyway, plus depending on your task you may be able to use the CPU to do this, since it probably won't be doing as much work in a fully OpenCL accelerated pipeline.
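
One simple form of "your own error checking in software" is a checksum taken before the data is parked in memory and checked again before it is used. The sketch below is only an assumption about how that might look: a plain `bytearray` stands in for a buffer read back from VRAM, and the bit flip is simulated by hand. It uses `zlib.crc32` from the Python standard library.

```python
import zlib

def checksum(buf) -> int:
    """CRC-32 of a bytes-like buffer."""
    return zlib.crc32(buf)

def is_intact(buf, expected: int) -> bool:
    """True if the buffer still matches the checksum recorded earlier."""
    return zlib.crc32(buf) == expected

# Hypothetical input data; in a real program this would be the buffer
# uploaded to the GPU and later read back.
data = bytearray(range(256)) * 4
recorded = checksum(data)

print(is_intact(data, recorded))  # True: nothing has changed yet

data[100] ^= 0x08                 # simulate a single flipped bit

print(is_intact(data, recorded))  # False: the corruption is detected
```

Note the limits: this only detects an error, it cannot correct it, and the checking costs CPU time and a read-back of the data, which is the slowdown mentioned above.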

    Basically it's like most other features; if it can be done in hardware then it saves you from having to implement it in software, or from having to run the software ECC (which slows down your program).

    Also, it may not affect properly designed workloads; after all, ECC is essentially just for verifying that data written into VRAM is the same data being read out again. However, I think all recent FirePros have EDC, which eliminates bus errors (a major concern for GDDR5), and as long as you're not heavily reading and writing to the VRAM the chances of producing an error are slim to none. In other words, if you're just using the VRAM as a massive buffer for your input data, then you may not have to worry much about whether you have ECC or not, but for those with huge accuracy requirements even the slimmest chance of an error is unacceptable.

    Lastly, while ECC may not be enabled for the GDDR5 memory, it's possible it may still be enabled for cache memory on the D500s and D700s, since that shouldn't require any additional hardware. It's still a case where you'll need to weigh up whether that's good enough, but since the memory bus and cache memory are likely to be the busiest parts of the card it should give a big boost to stability, especially for GPU computation.


    For the majority of users it's one of those features where you can be glad you have it, but you probably don't really need it.
     
  12. deconstruct60, Jan 5, 2014
    Last edited: Jan 6, 2014

    deconstruct60 macrumors 604

    Joined:
    Mar 10, 2009
    #12
    No. There is no AMD graphics card on the market now with ECC DRAM memory on it. None, zip, nada.

    The ECC function is actually done inside the GPU's memory controller.

    "... So starting with the FirePro W series AMD will have full ECC support in selected models. This will include both ECC for internal SRAM caches (which is actually a free operation), and ECC for the external VRAM (accomplished through the use of a virtual ECC scheme). This functionality is going to be limited to products based on the Tahiti GPU, ..."
    http://www.anandtech.com/show/6137/the-amd-firepro-w9000-w8000-review-part-1/6
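
For anyone unsure what an error-correcting code actually does, here is a toy Hamming(7,4) sketch: 4 data bits are stored with 3 parity bits, and any single flipped bit can be located and repaired. This is a textbook code used purely for illustration; it is an assumption-free classic, but it is NOT a description of AMD's "virtual ECC scheme", whose details the quote does not give, and real memory controllers protect much wider words.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # 1-based positions holding data bits
PARITY_POSITIONS = (1, 2, 4)    # 1-based positions holding parity bits

def syndrome(word) -> int:
    """XOR of the 1-based positions of all set bits; 0 for a valid codeword."""
    s = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            s ^= pos
    return s

def encode(nibble):
    """Encode 4 data bits into a 7-bit Hamming codeword."""
    word = [0] * 7
    for pos, bit in zip(DATA_POSITIONS, nibble):
        word[pos - 1] = bit
    s = syndrome(word)
    for p in PARITY_POSITIONS:
        word[p - 1] = 1 if s & p else 0  # parity bits cancel the syndrome
    return word

def decode(word):
    """Correct up to one flipped bit and return the 4 data bits."""
    fixed = list(word)
    s = syndrome(fixed)
    if s:                        # non-zero syndrome is the bad bit's position
        fixed[s - 1] ^= 1
    return [fixed[pos - 1] for pos in DATA_POSITIONS]

stored = encode([1, 0, 1, 1])
stored[4] ^= 1                   # simulate one bit error in memory
print(decode(stored))            # [1, 0, 1, 1]
```

The cost is visible even in the toy: 7 bits stored for every 4 bits of data, plus the encode/decode logic on every write and read, which is why it matters whether that logic sits in the memory controller or has to be done by the application.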


    The D500 is "Tahiti" based also; Tahiti LE. There aren't two types of memory, and you should absolutely not be surprised at the presence of non-ECC memory. Nobody else uses it on AMD cards either, including AMD.


    Chuckle. Even though there is hardware right there on the GPU's die? No thank you. Making people pay for something, turning it off, and then making them slave to put it back in with a kludge. Sure.

    Even if you could pull the ECC computations back into the application's computation stream, you have now just gutted the computation flow. It is like saying dump the double-precision floating point instructions and then emulate them with a software floating point unit. If desperate, perhaps. For giggles as a hackery/educational exercise, maybe. For work? Frack NO.

    Chances are not none. The "reading and writing" is flawed causality for the errors involved. The error source is not necessarily in the transport. The idea that data written once and read 1000 times is immutable is also off base (it has to be written at least once to be read, as it doesn't spring out of nowhere).

    The risk goes up with the amount of RAM. When you get close to double-digit GB of RAM you are in a different zone than with a 128MB VRAM buffer.
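
A back-of-the-envelope sketch of that scaling, under loudly stated assumptions: bit flips are treated as independent events with one constant rate per bit per hour, and the rate used below is a made-up placeholder, not a measured figure for any memory. The 12 GiB size is likewise just an example of "close to double-digit GB". Only the ratio between the two sizes is meaningful.

```python
import math

def expected_flips(capacity_bytes: int, hours: float, rate_per_bit_hour: float) -> float:
    """Expected number of bit flips, assuming independent flips at a constant rate."""
    return capacity_bytes * 8 * rate_per_bit_hour * hours

def prob_at_least_one(capacity_bytes: int, hours: float, rate_per_bit_hour: float) -> float:
    """Probability of one or more flips under the same (Poisson) assumption."""
    return -math.expm1(-expected_flips(capacity_bytes, hours, rate_per_bit_hour))

MIB = 2 ** 20
GIB = 2 ** 30
PLACEHOLDER_RATE = 1e-15  # hypothetical flips per bit per hour, NOT real data

small = expected_flips(128 * MIB, 24, PLACEHOLDER_RATE)
large = expected_flips(12 * GIB, 24, PLACEHOLDER_RATE)

# The ratio depends only on the capacities, not on the placeholder rate.
print(large / small)
```

Whatever the true rate is, under this model the expected number of errors grows in direct proportion to capacity, so the larger pool sees roughly 96 times as many as the 128MB buffer over the same run.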
     
  13. TRG24, Jan 5, 2014
    Last edited: Jan 5, 2014

    TRG24 macrumors newbie

    Joined:
    Jan 5, 2014
  14. Luba thread starter macrumors 6502a

    Luba

    Joined:
    Apr 22, 2009
    #14
    Then this is terrible that Apple didn't put in ECC for the GPU since using the GPU to help the CPU is Apple's new strategy for performance. What's the point of having ECC RAM for the CPU if the GPU doesn't have ECC RAM?

    Universities, engineers, mathematicians, or anybody doing modeling with many calculations relying on previous calculations can't use ordinary computers such as a MBP or iMac, since those computers never had ECC.

    In a way, my old 4,1 MP, which has ECC RAM, may be better, since very little of the CPU compute work is offloaded to the GPU.



     
  15. Luba thread starter macrumors 6502a

    Luba

    Joined:
    Apr 22, 2009
    #15
    Just wondering if the following are true or not.

    1. i5 and i7 CPUs don't have ECC RAM, but bit errors are corrected in OS X. The benefit of Xeon CPUs with ECC RAM is that bit errors are corrected by hardware which is better for speed performance. True or False?

    2. All data and program/app instructions go to the CPU first, then some of the compute work is offloaded to the GPU if the app is written for the GPU to help out. Since data and app program code are never sent directly to the GPU, a GPU with ECC VRAM is not that important. The CPU with ECC RAM would have already caught any bit errors before offloading the work to the GPU. True or False?
     
  16. ZnU macrumors regular

    Joined:
    May 24, 2006
    #16
    Bit errors just go uncorrected without ECC. Fortunately they're not all that common.

    The first part is sort of true; the CPU does generally act as 'command and control,' divvying up tasks and dispatching them to the GPUs. But the GPUs have their own working memory (VRAM). Bit errors can occur there as well, and absent ECC VRAM those won't be corrected.
     
