Search code examples
rdatabasedataframetranspose

How to match data to specified rows and columns in R (or transpose data with matching)


I had multiple datasets with information about the day shift in the company like this:

Day1Set

name    shift work_time
Worker1 1     8
Worker3 1     6
...

Day2Set

name    shift work_time
Worker2 2     8
Worker3 2     6
...

Where "Workers" are unique employee names (strings) and all other information is numeric. Of course, I can merge them all into one dataset with an additional numeric variable describing the working day - if that matters.

And what I'm trying to do is change the form of the database: transpose rows and columns while sorting columns by employees and rows by dates (day number). As output I am interested in the following set:

Day Worker1_shift Worker1_time Worker2_shift Worker2_time Worker3_shift Worker2_time ...
1   1             8            -             -            1             6
2   -             -            2             8            2             6
...

where "-" means no data/NA. If it makes a difference, the number of employee names is known in advance and finite, the number of workdays is also known (I can convert them to dates or strings), so one can determine the size of the dataset in advance.

In Statistica, I had ETL modules so far, thanks to which I matched data by dates and variables, creating new datasets. I'm kinda new to R and honestly don't know where to start. I don't even know what phrases to look for - matching columns/rows, filling columns/rows; what modules, what libraries, and so on.

It would be enough for me even a hint in which threads or under what phrases to search, with the rest (I hope) I will manage somehow. If there is any simple method (even memory/proc consuming), I prefer it as a beginner.

Thanks in advance for your tips.


Solution

  • Using the data provided, here's an option to tidyr::pivot_wider your data. Where you "widen" the data by creating new columns based on the values defined in the names_from argument.

    I did create a new variable day in both data.frames representing the working day. Then rbinded or row binded the data.frames together to create one dataset.

    From there all you need to do is pivot_wider the table and sort the columns to your preference.

    Given that you're new to R. The two packages included are: dplyr and tidyr each containing the functions necessary for this example. I'd recommend checking out some tutorials on dplyr and tidyverse to get a fundamental understanding of the syntax. As it's different from base R or other popular packages like data.table.

    library(dplyr)
    library(tidyr)
    
    day1 = data.frame(
      name = c("Worker1", "Worker3"),
      shift = c(1,1),
      work_time = c(8,6)
    )
    
    day2 = data.frame(
      name = c("Worker2", "Worker3"),
      shift = c(2,2),
      work_time = c(8,6)
    )
    
    day1 = day1 %>%
      mutate(Day = 1)
    
    day2 = day2 %>%
      mutate(Day = 2)
    
    days = rbind(day1, day2)
    
    days = days %>%
      pivot_wider(names_from = "name", values_from = c("shift", "work_time"), names_glue = "{name}_{.value}") %>%
      select(sort(names(.))) %>%
      rename_with( ~ gsub("_work","", .x)) 
    
    # Output
    
    days
    # A tibble: 2 × 7
        Day Worker1_shift Worker1_time Worker2_shift Worker2_time Worker3_shift Worker3_time
      <dbl>         <dbl>        <dbl>         <dbl>        <dbl>         <dbl>        <dbl>
    1     1             1            8            NA           NA             1            6
    2     2            NA           NA             2            8             2            6