Search code examples
rdplyrstring-concatenation

How do I concatenate consecutive rows in R based on a unique set of ID variables and a consecutively increasing and resetting variable?


An example data frame:

    df <- data.frame(ID = c(1, 1, 2, 2, 2, 2, 2),
                Name = c("Alice", "Alice", "Bob", "Bob", "Bob", "Bob", "Bob"),
                 Age = c(25, 25, 30, 30, 30, 30, 30),
                 LINE = c(1, 2, 1, 2, 1, 2, 3),
                 NOTE_TEXT = c("This is the fir", "st note",
                               "This is the seco", "nd note",
                               "This is ", "the th", "ird note"))

Essentially my complete "Note" due to character constraints from the source pull of data is split into substrings of "NOTE_TEXT" across multiple rows. The substrings that belong to the same "Note" are consecutively listed by another variable called "LINE" which can be anywhere from 1 to 65 (the vast majority are within 4 lines). I want to consolidate the "NOTE_TEXT"s belonging to the same "Note" into one line, and also make a new variable to denote the uniqueness of each "Note" belonging to the same set of ID, Name, Age, variables.

The resultant dataframe would look like this:

    data.frame(ID = c(1, 2, 2),
                 Name = c("Alice", "Bob", "Bob"),
                 Age = c(25, 30, 30),
                 Note = c(1, 1, 2),
                 NOTE_TEXT = c("This is the first note",
                               "This is the second note",
                               "This is the third note"))

I imagine I need to use some sort of for loop to loop through "LINE" for each set of unique variables, but I'm not sure where to start. Thanks for any help!


Solution

  •   df |>
        mutate(note_num = cumsum(LINE == 1), .by = c(ID, Name, Age)) |>
        summarize(NOTE_TEXT = paste(NOTE_TEXT, collapse = ""), 
               .by = c(ID, Name, Age, note_num))
    

    Result

      ID  Name Age note_num               NOTE_TEXT
    1  1 Alice  25        1  This is the first note
    2  2   Bob  30        1 This is the second note
    3  2   Bob  30        2  This is the third note