r, dplyr, reshape2

Converting an untypical data format from long to wide


My data:

    # A tibble: 6 x 4
  X__1            X__6                                                     X__7     X__8        
  <chr>           <chr>                                                    <chr>    <chr>       
1 Emp #:          xxyy                                                    Departm~ Corporate S~
2 Reason of Resi~ I think below are areas of improvement within my team C~ NA       NA          
3 Emp #:          xyyy                                                    Departm~ Corporate S~
4 Reason of Resi~ better oppurtunity                                       NA       NA          

I want to reshape the data into the following format:

Emp #     Reason                                                 Department
10282     I think below are areas of improvement within my team  Corporate
10308     better oppurtunity                                     Corporate

Reproducible data:

structure(list(X__1 = c("Emp #:", "Reason of Resignation:", "Emp #:", 
"Reason of Resignation:", "Emp #:", "Reason of Resignation:", 
"Emp #:", "Reason of Resignation:", "Emp #:", "Reason of Resignation:"
), X__6 = c("10282", "I think below are areas of improvement within my team CS / SME or my be cross the organization on my level (L1-L2). Lack of career growth specially in my department i.e. CS HOD/RSM/TLs/KAMs are on same position from last 5 years. Many people are here on same position from last 10-12 years. lack in focus on low level staff (L1 / L2) in terms of capacity building and career growth i.e. not a single training for my team on it. No rotation plans (for capacity building) for CS i.e. not a single team member rotated since I joined. Better opportunity in terms of career and financials outside ", 
"10308", "better oppurtunity", "11230", "Moving on another organization for career persuade", 
"13370", "Get a new job outside the company.", "14694", "Health Issues"
), X__7 = c("Department:", NA, "Department:", NA, "Department:", 
NA, "Department:", NA, "Department:", NA), X__8 = c("Corporate Solutions", 
NA, "Corporate Solutions", NA, "Region Central A", NA, "Region North", 
NA, "Finance Operations", NA)), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

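A bit more detail: each label in X__1 (for example "Emp #:") should become a column header, with the corresponding value taken from X__6. Likewise, the label in X__7 ("Department:") should become a header with its value taken from X__8.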


Solution

  • I added a new column called rid that numbers each pair of rows, extracted the required columns into separate data frames, and then joined them back together with left_join() by rid.

    library(dplyr)
    
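    # df is the tibble from the question. Number the rows in consecutive
    # pairs (1, 1, 2, 2, ...); this assumes each employee occupies exactly two rows.
    df <- mutate(df, rid = rep(seq_len(nrow(df) / 2), each = 2))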
    
    left_join(
      df %>%
        filter(X__1 == "Emp #:") %>%
        select(rid, X__6) %>%
        rename("Emp #" = "X__6"),
      df %>%
        filter(X__1 == "Reason of Resignation:") %>%
        select(rid, X__6) %>%
        rename("Reason" = "X__6"),
      by = "rid") %>%
      left_join(df %>%
                  filter(X__7 == "Department:") %>%
                  select(rid, X__8) %>%
                  rename("Department" = "X__8"),
                by = "rid") %>%
      select(-rid)
    
    #  `Emp #` Reason                                                    Department     
    #   <chr>   <chr>                                                     <chr>          
    # 1 10282   I think below are areas of improvement within my team CS~ Corporate Solu~
    # 2 10308   better oppurtunity                                        Corporate Solu~
    # 3 11230   Moving on another organization for career persuade        Region Central~
    # 4 13370   Get a new job outside the company.                        Region North   
    # 5 14694   Health Issues                                             Finance Operat~
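
Because every employee occupies exactly two consecutive rows here, the same result can also be sketched without any joins. This is an alternative to the approach above, not the answerer's method: it assumes the rows strictly alternate between "Emp #:" and "Reason of Resignation:", and it uses dplyr's lead() to pull each reason up onto its employee row. The small df below is a shortened stand-in built from the last two employees in the question's data.

```r
library(dplyr)

# Shortened stand-in for the question's data (last two employees only)
df <- tibble::tibble(
  X__1 = c("Emp #:", "Reason of Resignation:", "Emp #:", "Reason of Resignation:"),
  X__6 = c("13370", "Get a new job outside the company.", "14694", "Health Issues"),
  X__7 = c("Department:", NA, "Department:", NA),
  X__8 = c("Region North", NA, "Finance Operations", NA)
)

# Guard the assumption: the label column strictly alternates
stopifnot(all(df$X__1 == rep(c("Emp #:", "Reason of Resignation:"), nrow(df) / 2)))

wide <- df %>%
  mutate(Reason = lead(X__6)) %>%   # value from the next row, i.e. the reason
  filter(X__1 == "Emp #:") %>%      # keep one row per employee
  select(`Emp #` = X__6, Reason, Department = X__8)

print(wide)
```

If the pairing is irregular (for instance, an employee with no reason row), neither this sketch nor a fixed two-row rid applies as written; an id such as cumsum(X__1 == "Emp #:") would be one way to group rows per employee instead.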