I have a data frame of columns, all characters, that I would like to join into one column, delimited by a _ or -, not joining NA's. Now this would usually be very easy, BUT problem is I don't know how many columns there will be in the data frame. Reason behind that is this data frame is created in a repeat group where the number of columns is a variable. Luckily the columns all have the same front name and are numbered, i.e. as below OptionCode1, OptionCode2, etc.
Reason behind doing this, if you're curious, is the OptionCode's pasted together make a unique code for another data frame that has the same ordering that I will then cbind into.
I could probably do this pretty easily in a foreach loop with paste for ncol(df) but I'd like to learn if there is a cleaner way to do this. In a vector language like R I assume there is but I don't know how and couldn't find it on SO anywhere.
Reproducible Examples:
Example 1:
Suppose I have data like this:
OptionCode1 <- c("aa", "bb", "cc", "dd")
OptionCode2 <- c("bb", NA, "dd", NA)
df1 <- data.frame(OptionCode1, OptionCode2)
I'd like the data like this:
OptionCodeMerge <- c("aa-bb","bb","cc-dd","dd")
df1.desired <- data.frame(OptionCodeMerge)
Example 2:
Suppose instead there are four columns:
OptionCode1 <- c("aa", "bb", "cc", "dd")
OptionCode2 <- c("bb", NA, "dd", NA)
OptionCode3 <- c("cc", "dd", NA, NA)
OptionCode4 <- c("dd", NA, "dd", NA)
df2 <- data.frame(OptionCode1, OptionCode2, OptionCode3, OptionCode4)
I'd like the data like this:
OptionCodeMerge2 <- c("aa-bb-cc-dd","bb-dd","cc-dd-dd","dd")
df2.desired <- data.frame(OptionCodeMerge2)
Thanks!
Here is a possible solution:
OptionCode1 <- c("aa", "bb", "cc", "dd")
OptionCode2 <- c("bb", NA, "dd", NA)
OptionCode3 <- c("cc", "dd", NA, NA)
OptionCode4 <- c("dd", NA, "dd", NA)
df2 <- data.frame(OptionCode1, OptionCode2, OptionCode3, OptionCode4)
data.frame(x=apply(df2,1,function(x) {paste(x[!is.na(x)],collapse='-')}))
Output:
x
1 aa-bb-cc-dd
2 bb-dd
3 cc-dd-dd
4 dd
If there are other columns in the data.frame
, you could replace df2
with
df2[,grepl('OptionCode',colnames(df2))]
Hope this helps!