Compute pairwise similarity over time


I am trying to compute the pairwise similarity between accounts, based on the hashtags they use, over time.

The code below gives me the pairwise similarity between accounts across the most recent 300 tweets sent by each account. However, I would like to compute that similarity separately for specific slices of time (day, week, month). How can I do that?

library(rtweet)
library(widyr)
library(tidyverse)

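# Find up to 10 accounts matching "rstats", then fetch each one's latest 300 tweets
rstats <- search_users("rstats", n = 10)

rstats_tmls <- get_timeline(rstats$user_id, n = 300)

# Count hashtags per account, then compute cosine similarity between accounts
rstats_tmls %>%
   unnest(hashtags) %>%
   count(user_id, hashtags) %>%
   pairwise_similarity(user_id, hashtags, n, sort = TRUE, upper = FALSE)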


# A tibble: 45 x 3
   item1               item2              similarity
   <chr>               <chr>                   <dbl>
 1 2170413740          792007388358410240      1.00 
 2 2170413740          961691888939126784      1.00 
 3 792007388358410240  961691888939126784      1.00 
 4 1153678152838852614 2170413740              1.00 
 5 1153678152838852614 792007388358410240      1.00 
 6 1153678152838852614 961691888939126784      1.00 
 7 2170413740          824037040996098049      0.998
 8 792007388358410240  824037040996098049      0.998
 9 824037040996098049  961691888939126784      0.998
10 1153678152838852614 824037040996098049      0.998


Solution

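  • Grouping by the time slice with group_by() before calling pairwise_similarity() should work, since the similarity is then computed within each group: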

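    rstats_tmls %>%
      # Tag each tweet with its calendar year and week number
      mutate(year = lubridate::year(created_at),
             week = lubridate::week(created_at)) %>%
      unnest(hashtags) %>%
      # Grouping restricts the similarity to accounts active in the same (year, week)
      group_by(year, week) %>%
      count(user_id, hashtags) %>%
      pairwise_similarity(user_id, hashtags, n, sort = TRUE, upper = FALSE)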
    
    
    # # A tibble: 204 × 5
    # # Groups:   year, week [112]
    #     year  week item1      item2              similarity
    #    <dbl> <dbl> <chr>      <chr>                   <dbl>
    #  1  2014     3 2170413740 559211484               0.5
    #  2  2014    11 2170413740 559211484               0.707
    #  3  2017    28 2170413740 824037040996098049      1
    #  4  2017    29 2170413740 824037040996098049      0.986
    #  5  2017    30 2170413740 824037040996098049      1
    #  6  2017    32 2170413740 824037040996098049      0.949
    #  7  2017    33 2170413740 824037040996098049      0.962
    #  8  2017    34 2170413740 824037040996098049      0.981
    #  9  2017    36 2170413740 824037040996098049      0.707
    # 10  2017    37 2170413740 824037040996098049      0.943
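
The question also asks about days and months. One way to cover all three slices with the same pipeline is to bucket created_at with lubridate::floor_date() and group on that single column. The following is only a sketch under assumptions: toy_tmls is made-up data standing in for rstats_tmls (so it runs without Twitter credentials), and similarity_by_period() is a hypothetical helper name, not part of rtweet or widyr.

```r
library(dplyr)
library(tidyr)
library(lubridate)
library(widyr)

# Hypothetical toy data mimicking the three columns used from rstats_tmls
toy_tmls <- tibble(
  user_id    = c("a", "a", "b", "b", "a", "b"),
  created_at = as.POSIXct(
    c("2021-03-01 10:00:00", "2021-03-02 11:00:00", "2021-03-01 12:00:00",
      "2021-03-03 09:00:00", "2021-04-05 08:00:00", "2021-04-06 08:00:00"),
    tz = "UTC"
  ),
  hashtags   = list(c("rstats", "tidyverse"), "rstats", "rstats",
                    "ggplot2", "rstats", "rstats")
)

# Hypothetical helper: pairwise similarity within each period of length `unit`
# (`unit` is passed to floor_date(), e.g. "day", "week" or "month")
similarity_by_period <- function(tmls, unit = "week") {
  tmls %>%
    # Round each timestamp down to the start of its day/week/month
    mutate(period = floor_date(created_at, unit = unit)) %>%
    unnest(hashtags) %>%
    # Tweets without hashtags carry NA; drop them
    filter(!is.na(hashtags)) %>%
    group_by(period) %>%
    # A period needs at least two accounts to form a pair
    filter(n_distinct(user_id) > 1) %>%
    count(user_id, hashtags) %>%
    pairwise_similarity(user_id, hashtags, n, upper = FALSE) %>%
    ungroup()
}

similarity_by_period(toy_tmls, "week")
similarity_by_period(toy_tmls, "month")
```

Bucketing on one date column rather than separate year and week columns keeps the period unambiguous across year boundaries, and switching granularity only requires changing the unit argument.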