How to work with columns in R Statistical Package

Discussion in 'Mac Apps and Mac App Store' started by Erniecranks, Apr 12, 2011.

  1. Erniecranks macrumors newbie

    Joined: Apr 10, 2011
    #1
    Ok, I imported my CSV file into R. One of the columns is numberspeciesoftrees; another is bedrooms (also a numeric value). I have been trying to figure out how to get the mean of each of these columns, to no avail.
    How do I get the mean, sd, etc., or otherwise work with values by column? I also have some columns that are factors with levels.
     
  2. MisterMe macrumors G4

    Joined: Jul 17, 2002
    Location: USA
    #2
    There are applications that will handle your task and are much easier to use than R. If you prefer R, then you should be prepared to learn R. It has a rather good Help file that is just a mouse-click away.
     
  3. Erniecranks thread starter macrumors newbie

    Joined: Apr 10, 2011
    #3
    fine

    Thank you for the lesson in attitude. I was hoping to learn about R.

    I have been reading a lot on R, but like most manuals, the ones I've found are written for those who already speak the language. Perhaps if I can find a few hundred dollars of spare change in the sofa cushions, I can afford a license for SAS, which I did learn once upon a time.
     
  4. kuwisdelu macrumors 65816

    Joined: Jan 13, 2008
    #4
    Tabular data like yours is stored in R in a data.frame, which is what read.csv() returns, and you access the columns (the variables) with the $ operator.

    If you have a data.frame called data1 and a variable called x1, you can access that variable by doing data1$x1.
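
    Here is a minimal sketch of that, assuming a tiny CSV with hypothetical columns x1 and x2. read.csv() also accepts the data as a string through its text argument, which keeps the example self-contained; with a real file you would pass the file name instead.

```r
# Hypothetical CSV contents; with a real file you would call read.csv("yourfile.csv")
csv_text <- "x1,x2
3,2
5,4
7,3"

# read.csv() returns a data.frame; each CSV column becomes a variable
data1 <- read.csv(text = csv_text)

class(data1)   # "data.frame"
names(data1)   # "x1" "x2"
data1$x1       # the x1 column as a vector: 3 5 7
```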

    You can also index data.frames like matrices and get at the rows and columns directly: data1[,1] gives you the first column, and data1[1,1] gives you the first element in the first row. In data1[,1] we left out the first number (the row index) because we wanted the whole column. Similarly, data1[1,] will give you the whole first row. You can use the : operator to quickly ask for a range, so data1[1:10,1] will give you the first 10 observations in the first column. If you're dealing with named variables, though, it's often easier to just use the $ operator like I mentioned above.
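
    To see each of those indexing forms side by side, here is a short sketch on a made-up data.frame (the columns x1 and x2 and their values are hypothetical, chosen so there are at least 10 rows for the range example):

```r
# Hypothetical data.frame standing in for the one read from the CSV
data1 <- data.frame(x1 = 1:12, x2 = seq(10, 120, by = 10))

data1[, 1]      # whole first column (returned as a vector)
data1[1, 1]     # first element of the first row: 1
data1[1, ]      # whole first row (a one-row data.frame)
data1[1:10, 1]  # first 10 observations in the first column
data1[, "x1"]   # columns can also be selected by name
```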

    To get things like mean, standard deviation, etc., just use the functions on the variable (or the column vector) like so:

    mean(data1$x1)
    sd(data1$x1)

    If x1 is the first variable (the first column) in data1, it would be equivalent to do the following:

    mean(data1[,1])
    sd(data1[,1])

    But that's a little harder to read and understand immediately.
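
    The question also mentioned summarizing several columns and working with factor columns. A rough sketch of that, using made-up data where trees, bedrooms and region are hypothetical stand-ins for your columns: mean() returns NA if a column has any missing values unless you pass na.rm = TRUE, sapply() applies a function to every selected column, and table() counts observations per factor level.

```r
# Hypothetical data: two numeric columns (one with a missing value) and a factor
data1 <- data.frame(
  trees    = c(4, 7, NA, 5),
  bedrooms = c(3, 2, 4, 3),
  region   = factor(c("north", "south", "north", "east"))
)

mean(data1$trees)                 # NA, because one value is missing
mean(data1$trees, na.rm = TRUE)   # drops the NA first
sd(data1$bedrooms)

# Apply a function to every numeric column at once;
# data1[num_cols] always stays a data.frame, even if only one column matches
num_cols <- sapply(data1, is.numeric)
col_means <- sapply(data1[num_cols], mean, na.rm = TRUE)
col_means

# Factor columns: list the levels and count observations per level
levels(data1$region)   # "east" "north" "south"
table(data1$region)

# summary() gives a per-column overview of numeric and factor columns alike
summary(data1)
```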

    R is essentially a programming language (you tell the program exactly what to do), while SAS is built largely around ready-made procedures (you ask a procedure for a result), and the latter can be a little easier for non-programmers. This makes R more flexible for general statistical programming, but it can sometimes require a little more work than SAS for certain tasks. That's partly why R tends to be the choice for research, while SAS has traditionally been the choice for industry.

    SAS is nice and would probably be better for a non-programmer, non-statistician, but like you said, it's expensive as hell for an individual. For most things, there are ways to do what you want in both, though.
     
