I have a data frame that is resembled by the example below:
Code:
DF <- data.frame(lapply(data.frame(Company = c("Company1", "Company2", "Company3", "Company4"),
Col1 = c("test 1", "", "", ""),
Col2 = c("", "test 2", "", "test 3"),
Col3 = c("", "", "", "test 4")), as.character), stringsAsFactors=FALSE)
and the data frame looks like this:
Company Col1 Col2 Col3
1 Company1 test 1
2 Company2 test 2
3 Company3
4 Company4 test 3 test 4
I would like to achieve the following output data frame:
Company Col
1 Company1 test 1
2 Company2 test 2
3 Company4 test 3
4 Company4 test 4
The output is basically assigning a single available element from columns Col1, Col2 and Col3 for a particular reference column (Company) in the row. If the row has two or more values (like in row number 3), then it should create duplicate rows and assign a single element to it in no particular order.
It is fine to also have a blank against Company 3 in the output (Col) as long as every Company is accounted.
Any help would be greatly appreciated since I have had no luck with keywords for coming up with code to do the above. The closest code has been stackoverflow question but it doesn't address the task of duplicating a reference column (which is "Company" in my case).
Here is a tidyverse
approach. Reshape your data from wide to long format and filter for cases were column 'Col' is not ""
.
library(tidyverse)
DF %>%
gather('value', 'Col', -Company) %>%
filter(Col != "") %>%
select(-value)
# Company Col
#1 Company1 test 1
#2 Company2 test 2
#3 Company4 test 3
#4 Company4 test 4