New York Times Writes in Praise of R, the Open Source Statistics Package

Discussion in 'Apple, Inc and Tech Industry' started by mkrishnan, Jan 7, 2009.

  1. mkrishnan Moderator emeritus

    Joined:
    Jan 9, 2004
    Location:
    Grand Rapids, MI, USA
    #1
    http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html

    Excellent publicity for one of the best, and yet, most underappreciated, applications in the OSS world. :)

    The article glosses over one thing that makes R really well suited to the statistics world. A SAS representative makes this gibe:

    But one of the hidden yet major problems with programs like SAS, SPSS, LISREL, Mplus, etc., is this: for complicated statistics, where there is no well-established numerical methodology, these programs, being closed source, do not publish their source code. They are therefore not open to analysis by the community to determine whether the assumptions they make in conducting these analyses are actually warranted. Even in relatively simple analyses like basic SEM (structural equation modeling), it's fairly common for results from different packages to differ slightly, with little clarity as to why.

    R has a huge advantage, in principle: even for its most complex statistics, how it arrives at its results is completely open to analysis.
     
  2. Blue Velvet Moderator emeritus

    Joined:
    Jul 4, 2004
    #2
    Is it mere coincidence that it's right next to Q in the alphabet? Enquiring minds need to know.
     
  3. mkrishnan thread starter Moderator emeritus

    Joined:
    Jan 9, 2004
    Location:
    Grand Rapids, MI, USA
    #3
    Let me run some numbers and get back to you. :D

    Incidentally, the interface for R in OS X is particularly nice.
     
  4. gotzero macrumors 68040

    Joined:
    Jan 6, 2007
    Location:
    Mid-Atlantic, US
    #4
    It is much less of a coincidence that it is right next to "S": R is an open-source implementation of the S language.

    I love R, and am always surprised that it does not get more attention from businesses and people learning statistics.


    Agreed! For once the OS X version of a truly multi-platform program is not way behind...
     
  5. Sesshi macrumors G3

    Joined:
    Jun 3, 2006
    Location:
    One Nation Under Gordon
    #5
    In principle. It remains to be seen whether flaws in the results get picked up and fixed, even when they are readily visible. Personally, I think touting that element of FOSS as an advantage is, in practice, debatable at best.
     
  6. twoodcc macrumors P6

    Joined:
    Feb 3, 2005
    Location:
    Right side of wrong
    #6
    Interesting. Thanks for posting! I have never heard of R, and I plan to check it out!
     
  7. gotzero macrumors 68040

    Joined:
    Jan 6, 2007
    Location:
    Mid-Atlantic, US
    #7
    Taking that bait: I have found plenty of errors in the closed platforms too, and in my opinion, at least with R the errors are well discussed, and you can often see why there is a problem. R is a little rough around the edges, and most people now are terrified of anything with a terminal, but if money is an object, it sure fills a gap.

    One of the advantages of FOSS here is that you can move between companies and computers and know that you can get another copy of the software without breaking the bank.

    R is not going to overtake SAS, but I think it is nice to have a platform where everyone can share data/output.
     
  8. Sesshi macrumors G3

    Joined:
    Jun 3, 2006
    Location:
    One Nation Under Gordon
    #8
    That depends on what your focus is in terms of how you acquire and use software.

    To me, unless we're talking about custom development, acquiring the software is actually the cheapest part of the software lifecycle and, for most packaged applications, a negligible part of the application's overall operating cost.

    If you're a very large organisation and custom-develop many of your applications, then OSS makes a lot of sense as a way of using or adapting existing code to compress development time. We're also currently doing that. Similarly, if your rollout is measured in millions of units, the cost of packages becomes a sizeable sum that has to be considered.

    However, in many situations I'm not sure whether nerds who can't see the wood for the trees have risen higher up the management chain or whether there's just been a general dumbing-down among decision-making sysadmins, but I'm increasingly hearing the same arguments for FOSS, not all of which actually make sense over the lifecycle of the product in a professional setting for the applications being considered.
     
  9. gotzero macrumors 68040

    Joined:
    Jan 6, 2007
    Location:
    Mid-Atlantic, US
    #9
    From a top-of-the-enterprise perspective, I am sure you are right, and it reminds me that you play in much larger pools than I do right now.

    My biggest problem came when I was managing a department of fewer than 20 people in a large organization and we needed better software FAST. Even though money was not necessarily an object, time was, and it can take weeks or months to go from the "I need this" stage to having it in hand. R saved me a couple of times simply because I could sweet-talk IT into putting proven FOSS platforms on my team's systems, often the same day.

    I guess to me part of the benefit of FOSS is the speed. You can be up and running in a few minutes if need be. It also lets you prove the benefit of the software before you have to justify purchasing it. A specific benefit for me was that after leaving my bank, I was able to take R with me, and I now use it for school and for consulting. Mastering one program helped me across several jobs, an advantage over non-monopolistic closed-source programs or custom applications.
     
  10. Sesshi macrumors G3

    Joined:
    Jun 3, 2006
    Location:
    One Nation Under Gordon
    #10
    I see the budgetary reason. However, that's quite a band-aid approach (though I don't see it as uncommon), and it doesn't necessarily guarantee the best solution, since in that case you often haven't actually used the competing monetised applications 'in anger'. It's 'what you can get' as opposed to 'what you might actually need'.
     
  11. mkrishnan thread starter Moderator emeritus

    Joined:
    Jan 9, 2004
    Location:
    Grand Rapids, MI, USA
    #11
    I think in statistics and other "technical" computing, there are some additional issues. First, by tradition, support from the companies that make technical software is not always great. As I understand it, SAS is less of an issue, but the people who make SPSS have no customer management skills whatsoever. They routinely find their software broken by operating system updates and then take months to fix it. They mysteriously post warning messages six or eight months after an operating system version comes out, indicating that statistics might be miscomputed on that version even though the software appears to run. This is all because they programmed their software (on both Windows and OS X) in some godforsaken development environment that flouts every convention of both operating systems (although they have recently "possibly" fixed this with their transition to Java).

    As for the value of open code: in basic statistics it's not a big deal, but in advanced statistics, which are primarily academic, it can be crucial. But I do agree with Sesshi that, although R is very advanced, its features haven't always advanced at lightning speed, despite its algorithms being freely available and many leading statisticians working on it. I'd say that, compared with the Mozilla Foundation, for instance, the management of R is a bit haphazard.
     
  12. gotzero macrumors 68040

    Joined:
    Jan 6, 2007
    Location:
    Mid-Atlantic, US
    #12
    Agreed, but the market is also a lot smaller.

    A lot of times for me it is not the budget but the speed. There is a big difference between "what I need" and "what I need right now", especially if I am billing, or getting billed, by the hour.

    You are both right that it is not the most mature, but I have high hopes for it. Teaching statistics, it is quite refreshing to be able to point people to something they can download and use at home, and through my rose-colored glasses, it will help the next generation of the statistically inclined get some exposure at a younger age, not only to stats, but also to a command line ;).
     
  13. haiggy macrumors 65816

    Joined:
    Aug 20, 2003
    Location:
    Ontario, Canada
    #13
    We use this in my statistics class; I never thought I'd see it mentioned anywhere. I don't find it particularly easy to use unless you've gone through quite a few tutorials, after which I guess you'd get the hang of it.
     
  14. mkrishnan thread starter Moderator emeritus

    mkrishnan

    Joined:
    Jan 9, 2004
    Location:
    Grand Rapids, MI, USA
    #14
    I think R is much harder to use than SPSS and even SAS, but the big difference is that once you understand R's way of doing things, it's much more consistent, especially compared with SPSS. I actually found the Times article's statement that R found a following among scientists who didn't know how to program kind of ironic. R is so much easier to use if you understand the basics of object-oriented programming.
     
  15. pooky macrumors 6502

    Joined:
    Jun 2, 2003
    #15
    R is wonderful, and what you've written above is a perfect example of what pisses me off about the commercial scientific software community, particularly SAS. Not only do users have no access to their algorithms, they are FORCED to do things in a particular way. Want to do a regression? In SAS, you have virtually no control over how it is computed, unless it is an option that you have been granted permission to use. In R, if you don't like the way it is computed, you can write your own damn code.

    I've noticed that in my field in particular (ecology/evolutionary biology), the past few decades have been SAS-dominated, with older workers settling into a comfort zone with the software that isn't always entirely logical. I've had reviewers ask for computations (e.g. least squares means, Type III sums of squares) that were completely invented by the SAS Institute without much support for their actual utility. It's one thing when bad, overpriced software is all that is available, but when that software can completely change the scientific culture, you're in trouble.

    In other words, R kicks ass, and while it may not be the end-all-be-all of statistics, it will certainly force the commercial developers to innovate to remain competitive. This is a Good Thing™.
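To make the "write your own regression code" point concrete, here is a minimal sketch that is not from the thread and uses Python with NumPy rather than R; the helper name `ols_qr` is hypothetical. It fits ordinary least squares through a QR decomposition of the design matrix instead of the textbook normal equations, one example of a computational choice that open code lets you inspect or swap out.

```python
import numpy as np


def ols_qr(X, y, add_intercept=True):
    """Ordinary least-squares coefficients via QR (hypothetical helper, for illustration).

    Minimizes ||X b - y||^2 by factoring X = Q R and solving R b = Q^T y,
    which avoids forming X^T X as the normal equations do.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:                      # accept a single predictor given as a flat list
        X = X[:, np.newaxis]
    if add_intercept:                    # prepend a column of ones for the intercept term
        X = np.column_stack([np.ones(X.shape[0]), X])
    Q, R = np.linalg.qr(X)               # reduced QR: Q is n x p, R is p x p upper triangular
    return np.linalg.solve(R, Q.T @ y)   # solve R b = Q^T y (general solver used for simplicity)


# Usage: points lying exactly on y = 2 + 3x should give roughly [2, 3].
coef = ols_qr([1, 2, 3, 4, 5], [5, 8, 11, 14, 17])
print(coef)
```

Choosing QR over the normal equations, or a dedicated triangular solver over `np.linalg.solve`, changes the floating-point path, so on ill-conditioned data the results can differ in the last few digits. That is the kind of small cross-package discrepancy raised at the start of the thread.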
     
  16. gotzero macrumors 68040

    Joined:
    Jan 6, 2007
    Location:
    Mid-Atlantic, US
    #16
    For what it is worth, Ashlee provided an update on an NYT blog (link) after what must have been a deluge of geekmail.

    Seeing the ~800% difference in the estimates of the R user base (it goes up with commercial dependence, how random), does anyone know where to find a reasonable estimate of the number of SAS users? It would be interesting to compare, even roughly, the number of eyeballs staring at each.
     
  17. mkrishnan thread starter Moderator emeritus

    Joined:
    Jan 9, 2004
    Location:
    Grand Rapids, MI, USA
    #17
    Try psychology... :p SPSS is already considered "advanced" software and most people will not go anywhere near SAS because of its perceived difficulty. :eek:

    I would be curious to know the size of the SAS user base, too (and SPSS's, while we're at it). I think it's quite large, though.
     
