Shorter method to replace entries in R


I started learning R recently. Here's the source file I am working with (https://github.com/cosname/art-r-translation/blob/master/data/Grades.txt). Is there any way I can change the letter grades to numbers, say A to 4.0, A- to 3.7, etc., without using a loop?

I am asking because if there were 1M entries, a "for" loop might not be the most efficient way to modify the data. I would appreciate any help.


Since one of the posters told me to post my code, I thought of running the for loop to see whether I am able to do it. Here's my code:

mygrades<-read.table("grades.txt",header = TRUE)

i <- for (i in 1:nrow(mygrades))
{
  #print(i)  
  #for now, see whether As get replaced with 4.0.
  if(mygrades[i,1]=="A")
  {
    mygrades[i,1]=4.0
  }
  else if (mygrades[i,2]=="A")
  {
    mygrades[i,2]=4.0
  }
  else if (mygrades[i,3]=="A")
  {
    mygrades[i,3]=4.0
  }
  else
  {
    #do nothing...continues
  }

}

write.table(mygrades,"newgrades.txt")

However, the output is a little weird. For some "A"s I get NA, and others are left as they are. Can someone please help me with this code?


@alistaire, I did try Hadley's look-up table, and it works. I also looked at the dplyr code, and it works well. However, for the sake of my understanding, I'm still trying to use for loops. Please note that it has been about two days since I opened an R book. Here's the modified code.

#there was one mistake in my code: I didn't use stringsAsFactors=FALSE.
#now, this code doesn't work for all "A"s. It spits out 4.0 for some As, and
#doesn't do so for others. Why would that be?

mygrades<-read.table("grades.txt",header = TRUE,stringsAsFactors=FALSE)

i <- for (i in 1:nrow(mygrades))
{
  #print(i)  
  if(mygrades[i,1]=="A")
  {
    mygrades[i,1]=4.0
  }
  else if (mygrades[i,2]=="A")
  {
    mygrades[i,2]=4.0
  }
  else if (mygrades[i,3]=="A")
  {
    mygrades[i,3]=4.0
  }
  else
  {
    #do nothing...continues
  }

}

write.table(mygrades,"newgrades.txt")

The output is:

"final_exam" "quiz_avg" "homework_avg"
"1" "C" "4" "A"
"2" "C-" "B-" "4"
"3" "D+" "B+" "4"
"4" "B+" "B+" "4"
"5" "F" "B+" "4"
"6" "B" "A-" "4"
"7" "D+" "B+" "A-"
"8" "D" "A-" "4"
"9" "F" "B+" "4"
"10" "4" "C-" "B+"
"11" "A+" "4" "A"
"12" "A-" "4" "A"
"13" "B" "4" "A"
"14" "D-" "A-" "4"
"15" "A+" "4" "A"
"16" "B" "A-" "4"
"17" "F" "D" "A-"
"18" "B" "4" "A"
"19" "B" "B+" "4"
"20" "A+" "A-" "4"
"21" "4" "A" "A"
"22" "B" "B+" "4"
"23" "D" "B+" "4"
"24" "A-" "A-" "4"
"25" "F" "4" "A"
"26" "B+" "B+" "4"
"27" "A-" "B+" "4"
"28" "A+" "4" "A"
"29" "4" "A-" "A"
"30" "A+" "A-" "4"
"31" "4" "B+" "A-"
"32" "B+" "B+" "4"
"33" "C" "4" "A"

As you can see in the first row, the first A got recoded as 4, but the second A didn't get recoded. Any idea why this is happening?

Thanks in advance.


Solution

  • A typical way in base R would be to make a named vector as a lookup table, e.g.

    # data with fewer levels for simplicity
    df <- data.frame(x = rep(1:3, 2), y = rep(1:2, 3))
    
    lookup <- c(`1` = "A", `2` = "B", `3` = "C")
    

    and subset it with each column:

    data.frame(lapply(df, function(x){lookup[x]}))
    ##   x y
    ## 1 A A
    ## 2 B B
    ## 3 C A
    ## 4 A B
    ## 5 B A
    ## 6 C B
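
    Applied to the letter grades in the question, the same idea might look like the sketch below. Only A = 4.0 and A- = 3.7 come from the question; the other grade points, the names grade_points and numeric_grades, and the small inline grades data frame (standing in for Grades.txt) are assumptions for illustration. Note the as.character(): indexing a named vector with a factor would use the factor's integer codes rather than its labels.

    ```r
    # Lookup table: letter grade -> grade points.
    # Only A and A- are given in the question; the other values are assumed.
    grade_points <- c("A+" = 4.0, "A" = 4.0, "A-" = 3.7,
                      "B+" = 3.3, "B" = 3.0, "B-" = 2.7,
                      "C+" = 2.3, "C" = 2.0, "C-" = 1.7,
                      "D+" = 1.3, "D" = 1.0, "D-" = 0.7,
                      "F"  = 0.0)

    # Small stand-in for Grades.txt (hypothetical inline data, three rows)
    grades <- data.frame(final_exam   = c("C", "C-", "D+"),
                         quiz_avg     = c("A", "B-", "B+"),
                         homework_avg = c("A", "A", "A"),
                         stringsAsFactors = FALSE)

    # Index the lookup by name, one whole column at a time (no loop over rows).
    # as.character() keeps this correct even if the columns are factors.
    numeric_grades <- data.frame(lapply(grades, function(col) {
      unname(grade_points[as.character(col)])
    }))
    ```

    Any grade that is missing from the lookup comes back as NA, which makes gaps in the table easy to spot.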
    

    Alternatively, dplyr recently added a recode function that's useful for such a job:

    library(dplyr)
    
    df <- read.table('https://raw.githubusercontent.com/cosname/art-r-translation/master/data/Grades.txt', header = TRUE)
    
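    # Note: in current dplyr, funs() is deprecated and mutate_all() is superseded
    # by across(); as_data_frame() is deprecated in favour of as_tibble().
    df %>% mutate_all(funs(recode(., A = '4.0',
                                  `A-` = '3.7'))) %>%    # etc.
        as_data_frame()    # for prettier printing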
    
    ## # A tibble: 33 x 3
    ##    final_exam quiz_avg homework_avg
    ##        <fctr>   <fctr>       <fctr>
    ## 1           C      4.0          4.0
    ## 2          C-       B-          4.0
    ## 3          D+       B+          4.0
    ## 4          B+       B+          4.0
    ## 5           F       B+          4.0
    ## 6           B      3.7          4.0
    ## 7          D+       B+          3.7
    ## 8           D      3.7          4.0
    ## 9           F       B+          4.0
    ## 10        4.0       C-           B+
    ## # ... with 23 more rows
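
As for why the loop in the question recodes only some of the As: in an if / else if chain, only the first branch whose condition is TRUE runs, so at most one "A" per row gets replaced. The NAs in the first version are what you get when a number is assigned into a factor column that has no such level, and 4.0 shows up as "4" because the number is coerced to text when it lands in a character column. A minimal sketch of a loop that checks every cell independently, next to a vectorized equivalent, could look like this; the inline data frame is a hypothetical stand-in for grades.txt, and the name vectorized is made up for the example.

```r
# Stand-in for read.table("grades.txt", header = TRUE, stringsAsFactors = FALSE)
mygrades <- data.frame(final_exam   = c("C", "C-", "A"),
                       quiz_avg     = c("A", "B-", "A"),
                       homework_avg = c("A", "A", "A-"),
                       stringsAsFactors = FALSE)

# Keep an untouched copy for the vectorized version below
vectorized <- mygrades

# Loop version: test each cell on its own instead of chaining else-if,
# so several As in the same row are all replaced
for (i in seq_len(nrow(mygrades))) {
  for (j in seq_len(ncol(mygrades))) {
    if (mygrades[i, j] == "A") {
      mygrades[i, j] <- "4.0"   # a string, because the columns are character
    }
  }
}

# Vectorized version: the comparison yields a logical matrix that
# selects every matching cell at once
vectorized[vectorized == "A"] <- "4.0"
```

Both versions leave "A-" and "A+" untouched, because == compares whole strings.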