Search code examples
rdataframereshape2

Further specification on how to properly use the reshape() function


I'm working on a project that studies to see if there is a trend to how much a student studies on average each semester of each year during university. Each dataframe was listed as year1, year2, etc. when read in. The data frame I started with has the following format:

studentID region year semester week numHours
1 West 1 Fall 1 4-6
1 West 1 Fall 2 7-9
1 West 1 Fall 3 7-9 hour(s)

This, of course, only lists some of the entries.

The region, year, and week columns were removed. The year because there was a separate csv file for each year and the week column because the particular week didn't matter, just the number of hours. The new dataframe was given the name year1Reduced, year2Reduced, etc.

The next thing after removing the columns I did was use the reshape function in order to change the data from "long" to "wide" format so I could have three new columns, one for each semester (Fall, Spring, Summer) and the number of hours each student studied every week during whichever semester. This is how I wrote it:

year1Reduced.wide.Reshape <- reshape(data = year1Reduced,
                             idvar = c("studentID"),
                             timevar = "semester",
                             direction = "wide",
                             v.names = c("numHours")
)

It worked, sort of. I got an unnecessary column, which I removed, but I also got the separate columns for Fall, Spring, and Summer, but instead of listing all of the number of hours each student studied for each week, it only took the first element under numHours for each semester for each student. This is what it looked like:

studentID numHours.Fall numHours.Spring numHours.Summer
1 4-6 Less than 1 1-3 hour(s)
2 4-6 hour(s) 1-3 1-3 hour(s)
3 4-6 hour(s) less than 1 1-3 hour(s)

What I'm trying to figure out is how to have all of the numHours entries included instead of just one entry per student per semester. If it's just a small change in the reshape code I wrote, please let me know.

The format I want would look like this:

studentID numHours.Fall numHours.Spring numHours.Summer
1 4-6 Less than 1 1-3 hour(s)
1 5-7 hour(s) 1-4 1-2 hour(s)
1 4-6 hour(s) 5-6 1-4 hour(s)

Any help would be appreciated. I'm pretty new at R, so try to explain it as though I know nothing if possible.


Solution

  • We don't have enough data to test this but you can try :

    library(dplyr)
    library(tidyr)
    
    year1Reduced %>%
      select(studentID, semester, numHours) %>%
      group_by(studentID, semester) %>%
      mutate(row = row_number()) %>%
      ungroup %>%
      pivot_wider(names_from = semester, values_from = numHours) %>%
      select(-row)