Search code examples
rrow

Duplicating a row based on number of elements filled for that corresponding row


I have a data frame that is resembled by the example below:

Code:

   DF <- data.frame(lapply(data.frame(Company = c("Company1", "Company2", "Company3", "Company4"),
                                  Col1 = c("test 1", "", "", ""),
                                   Col2 = c("", "test 2", "", "test 3"),
                                   Col3 = c("", "", "", "test 4")), as.character), stringsAsFactors=FALSE)

and the data frame looks like this:

    Company      Col1   Col2   Col3 
1   Company1    test 1                  
2   Company2           test 2           
3   Company3           
4   Company4           test 3  test 4  

I would like to achieve the following output data frame:

    Company      Col   
1   Company1    test 1                  
2   Company2    test 2           
3   Company4    test 3
4   Company4    test 4

The output is basically assigning a single available element from columns Col1, Col2 and Col3 for a particular reference column (Company) in the row. If the row has two or more values (like in row number 3), then it should create duplicate rows and assign a single element to it in no particular order.

It is fine to also have a blank against Company 3 in the output (Col) as long as every Company is accounted.

Any help would be greatly appreciated since I have had no luck with keywords for coming up with code to do the above. The closest code has been stackoverflow question but it doesn't address the task of duplicating a reference column (which is "Company" in my case).


Solution

  • Here is a tidyverse approach. Reshape your data from wide to long format and filter for cases were column 'Col' is not "".

    library(tidyverse)
    DF %>% 
      gather('value', 'Col', -Company) %>% 
      filter(Col != "") %>% 
      select(-value)
    #   Company    Col
    #1 Company1 test 1
    #2 Company2 test 2
    #3 Company4 test 3
    #4 Company4 test 4