
Incorrect returns using tapply in R


I am working with the tapply function in R. I am trying to get tapply to return the same results as sapply (the one I am fairly sure is correct).

GOAL:

I am working with the state.x77 data and trying to find the literacy rate of each region using the sapply and tapply functions.

CODE:

####Setting up the data
state.df = data.frame(state.x77, Region=state.region, Division=state.division)
state.by.region = split(state.df, f=state.region)
state.by.div = split(state.df, f=state.division)

####Tapply
tapply(state.df$Illiteracy, INDEX = state.region,FUN = function(v){
  li.rate = 100 - state.df$Illiteracy
  return(median(li.rate))
})

I see that I'm using different data frames for tapply. I think I SHOULD be using state.by.region but I simply can't get it to go. The best I can think of is:

tapply(state.by.region[,"Illiteracy"], INDEX = state.region, FUN = function(v){
  li.rate = 100 - state.by.region$Illiteracy
  return(median(li.rate))
})

What can I try next?


Solution

  • In tapply's anonymous function, subtract v from 100 rather than state.df$Illiteracy. tapply calls the function once per group and passes v as the values for that Region only, whereas state.df$Illiteracy is the complete column, so using it returns the same median for every group. You also don't need to split the data; you can pass the Region column directly as INDEX.

    tapply(state.df$Illiteracy, INDEX = state.df$Region, FUN = function(v){
      li.rate = 100 - v
      return(median(li.rate))
    })
    
    #    Northeast         South North Central          West 
    #        98.90         98.25         99.30         99.40
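
To check the fix against the sapply approach mentioned in the question, here is a minimal sketch. The question's sapply code is not shown, so the sapply call below is an assumed equivalent, not the asker's original; the names buggy, by.tapply and by.sapply are made up for illustration.

```r
# Same data frame as in the question (state.x77, state.region and
# state.division ship with base R's datasets package).
state.df <- data.frame(state.x77, Region = state.region, Division = state.division)

# Original mistake: the function ignores its argument v and uses the whole
# column, so every region receives the same overall median.
buggy <- tapply(state.df$Illiteracy, state.df$Region, function(v) {
  median(100 - state.df$Illiteracy)
})

# Fixed version: v holds only the Illiteracy values of one region.
by.tapply <- tapply(state.df$Illiteracy, state.df$Region, function(v) {
  median(100 - v)
})

# Assumed sapply equivalent: split the data frame by region, then apply the
# same computation to each piece.
state.by.region <- split(state.df, f = state.df$Region)
by.sapply <- sapply(state.by.region, function(d) median(100 - d$Illiteracy))

# TRUE when all four "regional" values of the buggy version are identical.
length(unique(as.vector(buggy))) == 1

# TRUE when the fixed tapply and the sapply version agree.
isTRUE(all.equal(as.vector(by.tapply), as.vector(by.sapply)))
```

The second attempt in the question fails for a different reason: state.by.region is a list of data frames, so state.by.region[, "Illiteracy"] is not valid indexing. tapply expects a vector plus a grouping factor of the same length, which is why it is called on state.df directly.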