
Erniecranks

Apr 10, 2011
Is there an easy way in R to change the values in a column, say 'blue', 'red', and 'green', to 1, 2, and 3?

Thanks,

ernie
 
Assuming your categorical variables are factors, you can use them directly as integers (e.g. to index a vector, as in r[clrs] below) or call as.integer(). When creating the factor, you can set the order of the codes with the levels argument:

Code:
> clrs <- factor(c("red", "green", "blue"))
> clrs
[1] red   green blue 
Levels: blue green red
> as.integer(clrs)
[1] 3 2 1
> clrs <- factor(c("red", "green", "blue"), levels=c("red", "green", "blue"))
> clrs
[1] red   green blue 
Levels: red green blue
> as.integer(clrs)
[1] 1 2 3
> r <- rnorm(10)
> r
 [1]  1.8639513  0.4930384  1.8170273  1.7606512  0.1418951  1.0160500
 [7] -2.1571495 -1.0363607 -0.4395486 -0.6859069
> r[clrs]
[1] 1.8639513 0.4930384 1.8170273
 
Alternative that works for both factors and character vectors:

Code:
> a <- data.frame(cbind(sample(c("red","green","blue"), 10, replace=TRUE), matrix(rnorm(30), 10, 3, byrow=TRUE)))
> a
      X1                 X2                 X3                  X4
1   blue  -2.65160520677154   1.13671997203813    1.59844807462027
2    red   1.63603301299993  -1.44809803772613  -0.372299702576141
3    red   -1.0520070389011   1.17005686224478   0.747703941203762
4    red -0.577843522326433  0.157226421406988  0.0672999761529491
5  green   1.04109600264608  0.103028340501787    2.76900952476021
6   blue -0.811299237328568   1.42245069258426    2.09960604012682
7   blue   1.92844116562255  0.371461524110289   -0.69285790935153
8    red  -1.38648123984735 -0.377298779883762  0.0943156014716379
9  green  0.579690002553588  0.172524006604432  -0.568180791202796
10 green  0.198367546634958 -0.848545701166513 -0.0666525679750112

> a[,1] <- sapply(as.character(a[,1]), switch, "blue"=1, "red"=2, "green"=3, USE.NAMES=FALSE)
> a
   X1                 X2                 X3                  X4
1   1  -2.65160520677154   1.13671997203813    1.59844807462027
2   2   1.63603301299993  -1.44809803772613  -0.372299702576141
3   2   -1.0520070389011   1.17005686224478   0.747703941203762
4   2 -0.577843522326433  0.157226421406988  0.0672999761529491
5   3   1.04109600264608  0.103028340501787    2.76900952476021
6   1 -0.811299237328568   1.42245069258426    2.09960604012682
7   1   1.92844116562255  0.371461524110289   -0.69285790935153
8   2  -1.38648123984735 -0.377298779883762  0.0943156014716379
9   3  0.579690002553588  0.172524006604432  -0.568180791202796
10  3  0.198367546634958 -0.848545701166513 -0.0666525679750112
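A shorter way to get the same recoding, sketched here with base R's match() swapped in for sapply()/switch(). The vector x and the name colour_order are made up for illustration. match() returns the position of each value in the lookup vector, so the order of that vector decides the codes; factors are converted to character internally, so it behaves the same on both.

```r
# Made-up example data: colour names as a character vector
x <- c("blue", "red", "red", "green", "blue")

# The order of this lookup vector defines the codes: blue = 1, red = 2, green = 3
colour_order <- c("blue", "red", "green")

# Position of each element of x within colour_order
codes <- match(x, colour_order)
print(codes)

# Same result for a factor, since match() compares the factor's labels
codes_from_factor <- match(factor(x), colour_order)
print(codes_from_factor)

# Values missing from the lookup vector become NA instead of a wrong code
print(match(c("red", "purple"), colour_order))
```

One nice side effect compared with switch(): an unexpected value shows up as NA rather than being silently dropped from the result.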
 
Above are two good options. My question is why you want to do it. Depending on that, it may be unnecessary or there may be a better way of doing what you want.
 
Why do I want to do it? Because...

Isn't it easier to work with single numerals than character strings? I don't really know the answer for R. I've used Minitab, and 1 is a lot easier than typing 'left lateral recumbency', or even LLR.
 
I would go the other way: "left lateral recumbency" or LLR carries meaning in the problem domain, but 1 doesn't. Then again, whatever works for you. Provided you are happy with '1' when you come to review your work, go for it.

Mapping nominal/categorical or ordinal values to integer codes is basically what factors in R are for. I edit scripts in a text editor, so copying and pasting long names (or using autocompletion) is less of an issue than it is when working interactively.
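To show that a factor already holds both the number and the label, here is a small sketch with made-up posture data. LLR comes from the thread; RLR and SR are hypothetical abbreviations added only to have three levels.

```r
# Made-up posture observations stored as a factor (RLR and SR are hypothetical)
posture <- factor(c("LLR", "RLR", "LLR", "SR"),
                  levels = c("LLR", "RLR", "SR"))

# Printing shows the labels...
print(posture)

# ...but underneath R stores integer codes plus a "levels" attribute
print(unclass(posture))

# as.integer() extracts the codes; levels() gives the code-to-label mapping
print(as.integer(posture))   # 1 2 1 3
print(levels(posture))       # "LLR" "RLR" "SR"
```

So you can type and read the labels while R works with the integers whenever it needs to.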

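If your data is in a data.frame, character columns will probably be converted to factors by default (the stringsAsFactors argument of data.frame() and read.table(); note that since R 4.0.0 the default is FALSE). If you want to specify the levels, see above (or for more detail, http://www.statmethods.net/input/valuelabels.html).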
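Going the other way, if your data arrives already coded as 1, 2, 3 (say, exported from Minitab), factor() can attach value labels to the codes. This is a sketch with made-up codes; the labels other than the lateral recumbency idea from the thread are hypothetical.

```r
# Made-up integer codes, as they might come from another package
codes <- c(1, 2, 1, 3, 2)

# 'levels' lists the codes that occur, 'labels' gives the meaning of each one
posture <- factor(codes,
                  levels = c(1, 2, 3),
                  labels = c("left lateral", "right lateral", "sternal"))
print(posture)
print(table(posture))

# The original codes are still recoverable from the level order
print(as.integer(posture))
```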

I think the easiest thing for you to do is to add another column to your data.frame using as.integer(), e.g.:
Code:
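obs <- read.delim('/your/data/observations.tsv', stringsAsFactors = TRUE)  # factors by default only before R 4.0.0
obs$posture.int <- as.integer(obs$posture.factor)  # code = position in levels(obs$posture.factor)
obs$reading[obs$posture.int == 2]                  # readings for the second posture level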

but Hansr's solution will work too and is probably less typing if you have character data.
 
As AlmostThere suggests, you want your code to be as readable as possible. Using "red", "green", and "blue" as values may be very slightly more typing, but it makes what you're doing, and what your model is, a lot clearer than if you used "1", "2", and "3".

Furthermore, R will use character variables as factors (categorical/class variables) by default. If you change them to integers, you'll have to remember to tell R to use them as factors rather than numeric variables.
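To see why that matters in a model, here is a small sketch with made-up data: the same three groups coded 1, 2, 3 give a single slope term when left numeric, but separate group effects once wrapped in factor().

```r
# Made-up data: three groups coded 1, 2, 3 with a numeric response
code <- c(1, 2, 3, 1, 2, 3)
y    <- c(2.1, 3.9, 6.2, 1.8, 4.1, 5.8)

# Left numeric, the code enters the model as one slope
X_numeric <- model.matrix(~ code)
print(colnames(X_numeric))   # "(Intercept)" "code"

# As a factor, each non-reference level gets its own dummy column
X_factor <- model.matrix(~ factor(code))
print(colnames(X_factor))    # "(Intercept)" "factor(code)2" "factor(code)3"

# The fitted models differ accordingly: 2 coefficients versus 3
print(length(coef(lm(y ~ code))))
print(length(coef(lm(y ~ factor(code)))))
```

With character labels instead of codes, the factor treatment happens automatically, which is the point made above.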

I'd suggest you just keep them as character variables and not change them to numeric.
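Keeping the labels doesn't cost anything when selecting or summarising. A sketch with a small made-up data frame (the column names posture and reading echo the earlier hypothetical obs example; RLR and SR are invented):

```r
# Made-up observations; posture stays as readable character data
obs <- data.frame(
  posture = c("LLR", "RLR", "LLR", "SR"),
  reading = c(10.2, 11.5, 9.8, 12.0)
)

# Selecting by label says what it means, unlike obs$reading[obs$posture.int == 2]
llr_readings <- obs$reading[obs$posture == "LLR"]
print(llr_readings)

# Group summaries work directly on the labels (groups come out sorted)
print(tapply(obs$reading, obs$posture, mean))
```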
 