Tags: r, database, dataframe, rscript

How to add a column with median household income by state to an existing data frame in R? "Stack overflow" error


I have a downloaded data frame of American citizens. Its "State" column gives the name of a US state for each observation.

I need to add median household income by state to the analysis from an external source. I'm new to R, so I was doing it manually, like this:

(1) First, I created a vector of median HH income values

hincome <- (c(42.8, 72.2, 48.5, ......... )) 

(2) Then I wrote a nested ifelse() that should create a new variable in the data, assigning each row the median HH income of its state.

data$hincome <- (ifelse(data$State == "Alabama", 42.8,
            ifelse(data$State == "Alaska", 72.2,
            ifelse(data$State == "Arizona", 48.5,
            ............ ))))

This code has around 56 lines, and I get an error:

"Ошибка: переполнение стека целых чисел на строке 50" (for Russian users)
"Error: stack overflow of integers on line 50"

I tried debug() and browse() to eliminate it, but that didn't work. Maybe there is another way to get rid of the error? Or should I somehow insert the vector as a new column in the data so that the median HH income values line up with the column of states?


Solution

  • R's parser limits how deeply expressions can be nested, so a chain of roughly 50 nested ifelse() calls fails before any of the code runs. There are several alternatives: merge() or a join, as suggested by qdread; dplyr::case_when() if your conditions are more complicated; switch() can work too.

    Assuming your hincome vector is in alphabetical order by state and covers exactly the 50 US states, we can use the built-in state.name vector to create a lookup table and then merge:

    lookup_data = data.frame(hincome, State = state.name)
    data = merge(data, lookup_data, by = "State", all.x = TRUE)
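To see the lookup-table idea end to end, here is a minimal sketch on toy data. The five rows, the id column, and the "Puerto Rico" entry are invented for illustration. Only the three income values quoted in the question are used, so the lookup table is deliberately incomplete. It also shows match() as an alternative, which may be preferable if the original row order matters, because merge() sorts its result by the key column by default.

```r
# Toy stand-in for the downloaded data (rows and the id column are invented).
data <- data.frame(
  id    = 1:5,
  State = c("Alaska", "Alabama", "Arizona", "Alaska", "Puerto Rico"),
  stringsAsFactors = FALSE
)

# Lookup table with only the three values quoted in the question.
lookup_data <- data.frame(
  State   = c("Alabama", "Alaska", "Arizona"),
  hincome = c(42.8, 72.2, 48.5),
  stringsAsFactors = FALSE
)

# Option 1: merge(). all.x = TRUE keeps rows whose state is missing from the
# lookup table (their hincome becomes NA). By default the result is sorted
# by State, so the original row order is not preserved.
merged <- merge(data, lookup_data, by = "State", all.x = TRUE)

# Option 2: match() gives, for each row, the position of its state in the
# lookup table; indexing with it adds the column in place, keeping row order.
data$hincome <- lookup_data$hincome[match(data$State, lookup_data$State)]

print(merged)
print(data)
```

In both variants a state that is absent from the lookup table ends up with NA rather than an error, which makes it easy to spot spelling mismatches with sum(is.na(data$hincome)).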