I am working with the tapply function in R. I am simply trying to get tapply to return the same results as sapply (which I am fairly sure gives the correct answer).
GOAL:
I am working with the state.x77 data and trying to find the median literacy rate of each region using the sapply and tapply functions.
CODE:
####Setting up the data
state.df = data.frame(state.x77, Region=state.region, Division=state.division)
state.by.region = split(state.df, f=state.region)
state.by.div = split(state.df, f=state.division)
####Tapply
tapply(state.df$Illiteracy, INDEX = state.region, FUN = function(v) {
  li.rate = 100 - state.df$Illiteracy
  return(median(li.rate))
})
I see that I'm using a different data frame inside tapply. I think I should be using state.by.region, but I can't get it to work. The best I can come up with is:
tapply(state.by.region[, "Illiteracy"], INDEX = state.region, FUN = function(v) {
  li.rate = 100 - state.by.region$Illiteracy
  return(median(li.rate))
})
What can I try next?
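For reference, split() returns a plain list with one data frame per region, so state.by.region[, "Illiteracy"] fails (a list has no columns to index) and the list is not a natural input for tapply(). A list like this pairs with sapply() instead. The question's sapply code is not shown, so this is only a minimal sketch of what such a version might look like; the name lit.sapply is illustrative.

```r
# Setup as in the question
state.df <- data.frame(state.x77, Region = state.region, Division = state.division)
state.by.region <- split(state.df, f = state.region)

# state.by.region is a list of data frames, one per region
class(state.by.region)  # "list"
names(state.by.region)  # "Northeast" "South" "North Central" "West"

# sapply() visits each data frame in turn; df is one region's subset
# (lit.sapply is an illustrative name, not from the original question)
lit.sapply <- sapply(state.by.region, function(df) {
  li.rate <- 100 - df$Illiteracy
  median(li.rate)
})
lit.sapply
```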
In tapply's anonymous function you should subtract v from 100, not state.df$Illiteracy. v contains only the values for the current Region, whereas state.df$Illiteracy is the column for the complete data frame. Also, you don't need to split the data; you can pass the Region column directly as INDEX.
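A quick way to check this is to count how many values each call of the function receives, and to compare with the original function that ignores v. This is a small diagnostic sketch; the names n.per.region and buggy are illustrative only.

```r
state.df <- data.frame(state.x77, Region = state.region, Division = state.division)

# Each call receives only one region's values as v
# (n.per.region is an illustrative name)
n.per.region <- tapply(state.df$Illiteracy, INDEX = state.df$Region, FUN = length)
n.per.region  # 9, 16, 12, 13 for Northeast, South, North Central, West

# The question's function ignores v and always uses all 50 states,
# so every region gets the same (national) median (buggy is illustrative)
buggy <- tapply(state.df$Illiteracy, INDEX = state.region, FUN = function(v) {
  median(100 - state.df$Illiteracy)
})
length(unique(as.vector(buggy)))  # 1
```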
tapply(state.df$Illiteracy, INDEX = state.df$Region, FUN = function(v) {
  li.rate = 100 - v
  return(median(li.rate))
})
#     Northeast         South North Central          West
#         98.90         98.25         99.30         99.40
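Since 100 - x simply reverses the order of the values, the middle value(s) stay in the middle, so median(100 - v) equals 100 - median(v). That means the anonymous function can be dropped and median passed directly. A minimal sketch comparing the two forms (lit.fun and lit.short are illustrative names):

```r
state.df <- data.frame(state.x77, Region = state.region, Division = state.division)

# Form from the answer: transform inside the function
lit.fun <- tapply(state.df$Illiteracy, INDEX = state.df$Region,
                  FUN = function(v) median(100 - v))

# Shortcut: take the median of illiteracy per region, then subtract from 100
lit.short <- 100 - tapply(state.df$Illiteracy, INDEX = state.df$Region, FUN = median)

# Equal up to floating-point tolerance
all.equal(lit.fun, lit.short)  # TRUE
```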