Creating edges (rows) for several mentions in one tweet


I have retrieved many tweets from Twitter using the R package twitteR.

My goal now is to create edges for a network analysis based on the mentions in those tweets. To get the Twitter usernames mentioned in each tweet, I used the following code:

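library(stringr)

tweets <- read.csv(file = "tweets.csv", stringsAsFactors = FALSE)

# str_extract_all() returns a list with one character vector of mentions per tweet
tweets$mentions <- str_extract_all(tweets$text, "@\\w+")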

Some tweets mention more than one username, for example "usernameA, usernameB and usernameC", and all of these mentions end up together in one row. I would like to duplicate each such row so that it appears once per mentioned username, with only one username per row. Let me illustrate what I mean using the same example:

At the moment I have a row with two columns (text, mentions):

  1. "text of the tweet"; "usernameA, usernameB, usernameC"

I would like to have three rows in this case:

  1. "text of the tweet"; "usernameA"
  2. "text of the tweet"; "usernameB"
  3. "text of the tweet"; "usernameC"

My problems are:

  1. How do I get R to check for entries that consist of a vector (c("usernameA", "usernameB", ...)) in a specified column?
  2. How do I tell R to repeat such a row so that it appears x times (x = number of mentions)?
  3. How do I get R to leave only one username in each row?

Solution

  • You can use plyr's ddply() for this: split the data frame of tweets by the text column and return one row per mention:

    plyr::ddply(tweets, c("text"), function(x){
        mention <- unlist(stringr::str_extract_all(x$text, "@\\w+"))
        # some tweets do not contain mentions, making this necessary:
        if (length(mention) > 0){
            return(data.frame(mention = mention))
        } else {
            return(data.frame(mention = NA))    
        }
    })
    

    Example:

    tweets <- data.frame(text = c("A tweet with text and @user1 and @user2.",
                                  "Another tweet @user3 and @user4 should hear about."))
    

    Running the ddply() call above on this data frame returns:

                                                    text mention
    1           A tweet with text and @user1 and @user2.  @user1
    2           A tweet with text and @user1 and @user2.  @user2
    3 Another tweet @user3 and @user4 should hear about.  @user3
    4 Another tweet @user3 and @user4 should hear about.  @user4
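
    As an alternative, the same reshaping can be sketched in base R without plyr or stringr. This is only a sketch under a few assumptions: the column is called text as in the example, regmatches() with gregexpr() stands in for str_extract_all(), and the third tweet and the names n and edges are made up for illustration. lengths() gives the number of mentions per tweet, rep() repeats each tweet that many times, and unlist() leaves one username per row:

```r
# Example data; the third tweet (no mention) is a hypothetical addition
tweets <- data.frame(
  text = c("A tweet with text and @user1 and @user2.",
           "Another tweet @user3 and @user4 should hear about.",
           "A tweet without any mention."),
  stringsAsFactors = FALSE
)

# One character vector of mentions per tweet (base R stand-in for str_extract_all)
mentions <- regmatches(tweets$text, gregexpr("@\\w+", tweets$text))

# Tweets without a mention get a single NA so that they keep one row
mentions[lengths(mentions) == 0] <- list(NA_character_)

# Number of rows each tweet should occupy in the result
n <- lengths(mentions)

# Repeat each tweet once per mention and flatten the list to one mention per row
edges <- data.frame(
  text = rep(tweets$text, times = n),
  mention = unlist(mentions),
  stringsAsFactors = FALSE
)

print(edges)
```

    Here edges has one row per (tweet, mention) pair. Drop the NA replacement step if tweets without mentions should be left out of the edge list.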