Search code examples
rcopy-paste

Combine text elements in a data frame and remove rows the text came from


This toy data frame represents time entries by Persons. The format available to me has multiple Text entries for the same person and day in a totally random pattern. There can be as many as 15 Text entries for the same person and day. No Person entry is in the row for a multiple-text entry.

structure(list(Date = structure(c(1514764800, 1514764800, NA, 
1517443200, 1519862400, NA, NA, NA, 1519862400, NA, NA), class = c("POSIXct", 
"POSIXt"), tzone = "UTC"), Person = c("FMC", "ABC", NA, "FMC", 
"ABC", NA, NA, NA, "RWM", NA, NA), Text = c("work on request", 
"More text", "third line", "email to re: summary", "work on loan documents", 
"sixth line of text", "text seven", "eighth in a series", "conferences with working group", 
"line ten", "review and provide comments")), row.names = c(NA, 
-11L), class = c("tbl_df", "tbl", "data.frame"))

How can I combine the text elements so that there is only one line for each person's entry for each day, drop the unneeded rows (once the text is pasted together) and arrive at the following object?

enter image description here

EDITED QUESTION omits the for loop I tried unsuccessfully.

There must be a way to combine all the text into one row for a given person for a given day (for example, ABC has two entries on 1/1/2018) and delete the row(s) from which the combined text came.


Solution

  • No need to get complicated, just use tidyverse.

    Adjusted per changes to question:

    library(tidyverse)
    
    > df%>%
       fill(Date:Person, Date:Person) %>% # Fills missing values in using the previous entry.
       group_by(Date, Person) %>%
       summarise(Text = paste(Text, collapse = ' '))
    
    # A tibble: 5 x 3
      Date                Person Text                                                                   
      <dttm>              <chr>  <chr>                                                                  
    1 2018-01-01 00:00:00 ABC    More text third line                                                   
    2 2018-01-01 00:00:00 FMC    work on request                                                        
    3 2018-02-01 00:00:00 FMC    email to re: summary                                                   
    4 2018-03-01 00:00:00 ABC    work on loan documents sixth line of text text seven eighth in a series
    5 2018-03-01 00:00:00 RWM    conferences with working group line ten review and provide comments   
    

    Data:

    # A tibble: 11 x 3
       Date                Person Text                          
       <dttm>              <chr>  <chr>                         
     1 2018-01-01 00:00:00 FMC    work on request               
     2 2018-01-01 00:00:00 ABC    More text                     
     3 NA                  NA     third line                    
     4 2018-02-01 00:00:00 FMC    email to re: summary          
     5 2018-03-01 00:00:00 ABC    work on loan documents        
     6 NA                  NA     sixth line of text            
     7 NA                  NA     text seven                    
     8 NA                  NA     eighth in a series            
     9 2018-03-01 00:00:00 RWM    conferences with working group
    10 NA                  NA     line ten                      
    11 NA                  NA     review and provide comments