Duplicate row and string manipulation in R


I have a data frame in R in which some rows look like this:

c("LouDobbs", "gen_jackkeane") || RT @LouDobbs: #AmericaFirst- @gen_jackkeane: The Taliban for 9 months have told their fighters to kill as many people as you can, to includ…

Above is an example row with 2 columns (shown here separated by ||): column 1 holds more than one username and column 2 holds the tweet text. I want each such row to be duplicated once per user (2 rows in this case), with a single username in column 1 of each copy. This should apply to every row in the data frame where more than one user is listed against the tweet text.

df <- structure(list(user = list("Dandhy_Laksono", c("LouDobbs", "gen_jackkeane"
), "DeepStateExpose", "AndruewJamess", "jrossman12", "BiLLRaY2019", 
    "DeepStateExpose", "Dandhy_Laksono", "DeepStateExpose", "DeepStateExpose"), 
    full_text = c("RT @Dandhy_Laksono: Sebagian pendukung Jokowi ini mengalami bagaimana fitnah \"komunis dan PKI\" digunakan selama pemilu.\n\nSekarang mereka me…", 
    "RT @LouDobbs: #AmericaFirst- @gen_jackkeane: The Taliban for 9 months have told their fighters to kill as many people as you can, to includ…", 
    "RT @DeepStateExpose: The Only Reason The Deep State Cabal Has Stayed in Afghanistan For 18 Years Is To Protect Their Largest Poppy/Opium/Na…", 
    "RT @AndruewJamess: @BillOReilly @KamalaHarris is wrong. @realDonaldTrump has accomplished a lot. He set a record for  incoherent toilet twe…", 
    "RT @jrossman12: @SaraCarterDC Pakistan won't allow that as you already know. Your husband and the other U.S. troops have been forced to fig…", 
    "RT @BiLLRaY2019: JOKOWI TIDAK MEMBUNUH KPK..!\nMarkibong…\"Selamat tinggal Taliban di dalam KPK. Kalian kalah lagi, kalah lagi..!\"\n\n#JumatBer…", 
    "RT @DeepStateExpose: The Only Reason The Deep State Cabal Has Stayed in Afghanistan For 18 Years Is To Protect Their Largest Poppy/Opium/Na…", 
    "RT @Dandhy_Laksono: Sebagian pendukung Jokowi ini mengalami bagaimana fitnah \"komunis dan PKI\" digunakan selama pemilu.\n\nSekarang mereka me…", 
    "RT @DeepStateExpose: The Only Reason The Deep State Cabal Has Stayed in Afghanistan For 18 Years Is To Protect Their Largest Poppy/Opium/Na…", 
    "RT @DeepStateExpose: The Only Reason The Deep State Cabal Has Stayed in Afghanistan For 18 Years Is To Protect Their Largest Poppy/Opium/Na…"
    )), row.names = c(NA, 10L), class = "data.frame")

Solution

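  • We can use lengths to get the number of elements in each entry of the list column, then use rep to repeat each row's values that many times while unlist flattens the usernames. This should be fast enough, as lengths is fast.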

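    # number of usernames in each row of the list column
    l1 <- lengths(df$user)
    # one output row per username; n records how many users shared the tweet
    out <- data.frame(user = unlist(df$user), n = rep(l1, l1),
                      text = rep(df$full_text, l1))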
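
To check the idea on something small, here is a minimal base R sketch of the same lengths/rep/unlist approach. The data frame df_small and its values are made up for illustration and are not part of the question's data; the row indices are repeated rather than the text column itself, which is only a variation on the code above.

```r
# Toy data frame (hypothetical values) with a list column `user`
df_small <- data.frame(full_text = c("tweet A", "tweet B", "tweet C"),
                       stringsAsFactors = FALSE)
df_small$user <- list("alice", c("bob", "carol"), "dave")

# Number of usernames attached to each row
n_users <- lengths(df_small$user)

# Repeat each row index once per username, then flatten the list column
row_idx <- rep(seq_len(nrow(df_small)), n_users)
expanded <- data.frame(
  user = unlist(df_small$user),
  full_text = df_small$full_text[row_idx],
  stringsAsFactors = FALSE
)

print(expanded)
```

The row with two usernames appears twice in expanded, once per user, while single-user rows are left as they were.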