Tags: r, string, concatenation

How to concatenate character strings based on condition in R?


I need to prepare queries made of character strings (DOIs, Digital Object Identifiers) stored in a data frame. All strings belonging to the same case have to be joined to produce one query.

The df looks like this:

Case  DOI
1     1212313/dfsjk23
1     322332/jdkdsa12
2     21323/xsw.w3
2     311331313/q1231
2     1212121/1231312
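For a reproducible example, the table above can be entered as an R data frame like this. The name `df1` is the one the solution assumes; `stringsAsFactors = FALSE` only matters on R versions before 4.0.0, where character columns were otherwise converted to factors.

```r
# Sample data from the question: one row per DOI, grouped by Case
df1 <- data.frame(
  Case = c(1, 1, 2, 2, 2),
  DOI  = c("1212313/dfsjk23", "322332/jdkdsa12",
           "21323/xsw.w3", "311331313/q1231", "1212121/1231312"),
  stringsAsFactors = FALSE  # keep DOI as character on R < 4.0.0
)
str(df1)
```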

The output should be a data frame looking like this:

Case  Query
1     DO=(1212313/dfsjk23 OR 322332/jdkdsa12)
2     DO=(21323/xsw.w3 OR 311331313/q1231 OR 1212121/1231312)

The prefix ("DO="), the suffix (")") and the "OR" separators are not critical; I can add them later. The main question is how to aggregate character strings by case number.
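Following the idea of adding the decoration later, a minimal sketch is to join the strings per case first and wrap them afterwards. This uses base R's `tapply()`, an alternative to the `aggregate()` approach in the solution, and passes `collapse = " OR "` through to `paste()`; the names of the result are the case numbers.

```r
# Sample data from the question (DOI kept as character)
df1 <- data.frame(
  Case = c(1, 1, 2, 2, 2),
  DOI  = c("1212313/dfsjk23", "322332/jdkdsa12",
           "21323/xsw.w3", "311331313/q1231", "1212121/1231312"),
  stringsAsFactors = FALSE
)

# Step 1: group DOI by Case and join each group's strings with " OR "
joined <- tapply(df1$DOI, df1$Case, paste, collapse = " OR ")

# Step 2: add the prefix and suffix afterwards and build the result table
out <- data.frame(
  Case  = as.numeric(names(joined)),
  Query = paste0("DO=(", joined, ")"),
  stringsAsFactors = FALSE
)
out
```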


Solution

  • In base R you could do:

    aggregate(DOI ~ Case, df1, function(x) sprintf('DO=(%s)', paste0(x, collapse = ' OR ')))
      Case                                                     DOI
    1    1                 DO=(1212313/dfsjk23 OR 322332/jdkdsa12)
    2    2 DO=(21323/xsw.w3 OR 311331313/q1231 OR 1212121/1231312)
    

    If you are using R 4.1.0 or later, the shorthand function syntax \(x) also works:

    aggregate(DOI ~ Case, df1, \(x) sprintf('DO=(%s)', paste0(x, collapse = ' OR ')))
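A self-contained sketch of the same `aggregate()` call, with the sample data included and the result column renamed from `DOI` to `Query` to match the desired output. Within each case, the DOIs are joined in their original row order.

```r
# Sample data from the question
df1 <- data.frame(
  Case = c(1, 1, 2, 2, 2),
  DOI  = c("1212313/dfsjk23", "322332/jdkdsa12",
           "21323/xsw.w3", "311331313/q1231", "1212121/1231312"),
  stringsAsFactors = FALSE
)

# One query per case: join the DOIs with " OR " and wrap them in DO=( )
res <- aggregate(DOI ~ Case, data = df1,
                 FUN = function(x) sprintf("DO=(%s)", paste(x, collapse = " OR ")))

# Rename the aggregated column so the result matches the requested layout
names(res)[names(res) == "DOI"] <- "Query"
res
```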