Search code examples
rreshapereshape2data-management

Reshape long to wide where most columns have multiple values


I have data as below:

IDnum   zipcode   City          County   State
10011   36006     Billingsley   Autauga  AL 
10011   36022     Deatsville    Autauga  AL
10011   36051     Marbury       Autauga  AL
10011   36051     Prattville    Autauga  AL
10011   36066     Prattville    Autauga  AL
10011   36067     Verbena       Autauga  AL
10011   36091     Selma         Autauga  AL
10011   36703     Jones         Autauga  AL
10011   36749     Plantersville Autauga  AL
10011   36758     Uriah         Autauga  AL
10011   36480     Atmore        Autauga  AL
10011   36502     Bon Secour    Autauga  AL

I have a list of zipcodes, the cities they encompass, and counties/states they are located in. IDnum = numeric value for county and state, combined. List is in format you see now, I need to reshape it from long to wide / vertical to horizontal, where the IDnum variable becomes the unique identifier, and all other possible value combinations become wide variables.

IDnum  zip1   city1        county1  state1  zip2   city2       county2
10011  36006  Billingsley  Autauga  AL      36022  Deatsville  Autauga

This is just sample of the dataset, it encompasses every zip in the USA and includes more variables. I have seen other questions and answers similar to this one, but not where there are multiple values in almost every column.

There are commands in SPSS and STATA that will reshape data this way, in SPSS I can run a Restructure/Cases to Vars command that turns 11 variables in my initial dataset into about 1750, b/c one county has over 290 zips and it replicates most of the other variables 290+ times. This will create many blanks, but I need it to be reshaped into one very long horizontal file.

I have looked at reshape and reshape2, and am hung up on the 'default to length' error message. I did get melt/dcast to sorta work, but this creates one variable that is a list of all values, rather than creating variables for each value.

melted_dupes <- melt(zip_code_list_dupes, id.vars= c("IDnum"))
HRZ_dupes <- dcast(melted_dupes, IDnum ~ variable, fun.aggregate = list) 

I have tried tidyr and dplyr but got lost in syntax. Am a little surprised there isn't a command the data similar to built in commands in other packages, making me assume there is, and I just haven't figured it out.

Any help is appreciated.


Solution

  • You can do this with the base function reshape after adding in a consecutive count by IDnum. Assuming your data is stored in a data.frame named df:

    df2 <- within(df, count <- ave(rep(1,nrow(df)),df$IDnum,FUN=cumsum)) 
    

    Provides a new column of the consecutive count named "time". And now we can reshape to wide format

    reshape(df2,direction="wide",idvar="IDnum",timevar="count") 
    
      IDnum zipcode.1      City.1 County.1 State.1 zipcode.2     City.2 County.2 State.2 zipcode.3  City.3 County.3 State.3 zipcode.4     City.4 County.4 State.4
    1 10011     36006 Billingsley  Autauga      AL     36022 Deatsville  Autauga      AL     36051 Marbury  Autauga      AL     36051 Prattville  Autauga      AL
    

    (output truncated, goes all the way to zipcode.12, etc.)