
Extracting users from twitter status in R


I am trying to find out how often a specific user has tweeted to or mentioned another user. With the twitteR package I can retrieve the tweets for a given user; however, if a tweet mentions several users, only the first is recorded in the replyToUID field. The first column of my data frame contains the tweets, for example:

"@user1 @user2 have you read what @user3 wrote?"

and I would like to extract the usernames to a list like this

  • user1
  • user2
  • user3

with users from the next tweet being added below. If someone knows how to do the extraction (I can deal with the loops) or can point me in the right direction, it would be much appreciated.

Optionally, for the really helpful: if you have an idea how to condense the list so that in the end (after n tweets have been processed), instead of

  • user1
  • user2
  • user3
  • user1
  • user3
  • user4

the list (or then table) reads like this (counting how often a certain user has been mentioned)

  • user1, 2
  • user2, 1
  • user3, 2
  • user4, 1

it would be even more appreciated.

Thank you, Elias


Solution

  • I'm not sure what the rules are for a valid Twitter user name, but assuming only alphanumeric characters are allowed, you can do it with a simple regular expression:

    x <- "@user1 @user2 have you read what @user3 wrote?"
    
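    # Split each tweet on spaces, then keep the words that contain
    # "@" followed by an alphanumeric character.
    users <- function(x){
      words <- strsplit(x, " ")
      lapply(words, function(w) w[grepl("@[[:alnum:]]", w)])
    }
    
    users(x)
    # [[1]]
    # [1] "@user1" "@user2" "@user3"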
    

    This solution also assumes that all words are separated by spaces, so a user name followed by a punctuation mark is returned with the punctuation still attached (for example "@user1,"). You'll have to extend this answer to cope with that scenario.
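
One possible way to extend the answer, shown here only as a sketch: match the user names directly with `gregexpr()` and `regmatches()` instead of splitting on spaces, then count the mentions with `table()`. The second tweet, the helper name `extract_mentions` and the pattern (an "@" followed by letters, digits or underscores) are assumptions made for illustration, not part of the original answer.

```r
# Hypothetical example tweets; only the first one comes from the question.
tweets <- c(
  "@user1 @user2 have you read what @user3 wrote?",
  "Thanks @user1, and also @user3/@user4!"
)

# Assumption: a user name is "@" followed by letters, digits or underscores.
# Trailing punctuation is not part of the match, so it is dropped.
extract_mentions <- function(x) {
  regmatches(x, gregexpr("@[[:alnum:]_]+", x))
}

# A list with one character vector of mentions per tweet
mentions <- extract_mentions(tweets)

# Flatten the list, drop the leading "@", and count mentions per user
all_users <- sub("^@", "", unlist(mentions))
counts <- table(all_users)

# Optional: the same counts as a data frame with readable column names
counts_df <- as.data.frame(counts, stringsAsFactors = FALSE)
names(counts_df) <- c("user", "mentions")
counts_df
```

Because the pattern matches anywhere in the text, something like an e-mail address ("name@example.com") would also produce a match, so the pattern may need tightening for real data.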