I have a downloaded data frame of American citizens. It has a column "State" giving the name of an American state for each observation.
I need to add median household income by state to the analysis from an external source. I'm new to R, so I was doing it manually, like this:
(1) First, I created a vector of median HH income values
hincome <- (c(42.8, 72.2, 48.5, ......... ))
(2) I wrote nested ifelse() calls that should create a new variable in the data, assigning each observation the median HH income of its state.
data$hincome <- (ifelse(data$State == "Alabama", 42.8,
ifelse(data$State == "Alaska", 72.2,
ifelse(data$State == "Arizona", 48.5,
............ ))))
The full code runs to around 56 lines, and I get an error:
"Ошибка: переполнение стека целых чисел на строке 50" (for Russian users)
"Error: stack overflow of integers on line 50"
I tried debug() and browse() to eliminate it, but it didn't work. Maybe there is another way to get rid of the error? Or should I somehow insert the vector as a new column in the data so that the median HH income values line up with the column of states?
Nesting that goes extremely deep can cause problems. There are several alternatives: merge() or a join as suggested by qdread, dplyr::case_when if your conditions are more complicated, switch() can work too...
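As a rough illustration of the switch() route, here is a minimal base R sketch. The toy data frame and the helper name income_for are made up for the example; the three income values are the first three from the question. Since switch() handles one value at a time, it is applied per row with vapply():

```r
# Toy data standing in for the real data frame (assumption for illustration)
data <- data.frame(State = c("Alaska", "Alabama", "Arizona", "Alaska"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: map a single state name to its income value
income_for <- function(state) {
  switch(state,
         Alabama = 42.8,
         Alaska  = 72.2,
         Arizona = 48.5,
         NA_real_)  # unnamed last argument is the default for unmatched names
}

# Apply the helper to every row; numeric(1) enforces one number per state
data$hincome <- vapply(data$State, income_for, numeric(1), USE.NAMES = FALSE)
```

This avoids the deep nesting because all the cases sit at one level inside a single call, but you still have to type every state by hand, which is why a lookup table is usually more convenient.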
Assuming your hincome vector is in alphabetical order by state and you have all 50 US states, we can use the built-in state.name object to create a lookup table and then merge:
lookup_data = data.frame(hincome, State = state.name)
data = merge(data, lookup_data, by = "State", all.x = TRUE)
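To see what the merge does, here is a small self-contained sketch on toy data. The id column, the "Wyoming" row, and the name merged are made up for the example, and the lookup table holds only the three values quoted in the question. It also shows match() as a possible alternative when you want to keep the original row order, since merge() sorts the result by the key column by default:

```r
# Toy stand-ins for the real objects (assumptions for illustration)
data <- data.frame(id = 1:4,
                   State = c("Alaska", "Alabama", "Wyoming", "Arizona"),
                   stringsAsFactors = FALSE)
lookup_data <- data.frame(State = c("Alabama", "Alaska", "Arizona"),
                          hincome = c(42.8, 72.2, 48.5),
                          stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of data, even without a match in the lookup
merged <- merge(data, lookup_data, by = "State", all.x = TRUE)

# Rows whose state is missing from the lookup table get NA income
merged[is.na(merged$hincome), ]

# Alternative that leaves the row order of data untouched:
# match() returns, for each data$State, its row position in lookup_data
data$hincome <- lookup_data$hincome[match(data$State, lookup_data$State)]
```

Checking for NA values after the merge is a quick way to catch state names that are spelled differently in the two sources, or rows such as a district or territory that are not among the 50 entries in state.name.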