Let's say you have a data set that looks like this:
Vietnam Gulf War Iraq War
veteran1 1 0 0
veteran2 0 1 0
veteran3 0 0 1
veteran4 0 1 1 # <---- Note this row
You want to consolidate these columns without affecting other columns in the dataframe like so:
Service
veteran1 1
veteran2 2
veteran3 3
veteran4 2 # <---- Note this row
Where
1 = Vietnam
, 2 = Gulf War
, 3 = Iraq War
veteran4
where it picked their left-most column)Questions:
How would you do this in R
?
(Note: if it's easier to do in some other free open source program, please feel free to share which program and how you would do it. This is a massive dataset: 3 million rows, the American Community Survey.)
Looking at your data, it seems a simple issue of:
if Vietnam > 0, then use 1, otherwise if Gulf war > 0 then 2, otherwise if Iraq > 0 then 3, else 0
vietnam = c(1, 0, 0,0)
gulfwar = c(0,1,0,1)
iraq = c(0,0,1,1)
df = data.frame(vietnam, gulfwar, iraq)
df$service <- ifelse(df$vietnam > 0,1,ifelse(df$gulfwar>0,2,ifelse(df$iraq>0,3,0)))
df
Result:
vietnam gulfwar iraq service
1 1 0 0 1
2 0 1 0 2
3 0 0 1 3
4 0 1 1 2