
Identifying mentions in a tweet and populating a data frame


I'm trying to extract mentions such as @someone and @somebody from a data frame of Twitter data, and to create a new data frame that records who tweeted and which people they mentioned.

Example:

tweets <- data.frame(user=c("people","person","ghost"),text = c("Hey, check this out 
@somebody @someone","love this @john","amazing"))

This results in the following data frame:

**user     text**

*people   Hey, check this out @somebody @someone*

*person   love this @john*

*ghost    amazing*

The desired result is:

**id      mention**

*people  @somebody*

*people  @someone*

*person  @john*

*ghost*

Can you help me, please?


Solution

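  • You can do this with the stringr package:

    library(stringr)
    # Extract every @mention from each tweet; the result is a list column
    # holding one character vector per row (empty when there are no mentions)
    tweets$mention <- str_extract_all(tweets$text, '\\@\\S+')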
    

    Output is as follows:

    tweets
    
        user                                     text             mention
    1 people Hey, check this out \n@somebody @someone @somebody, @someone
    2 person                          love this @john               @john
    3  ghost                                  amazing                    
    

    To get the output in long format, you can do something like this:

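    library(dplyr)
    library(tidyr)
    # Keep the rows without any mention, then append one row per mention.
    # tidyr 1.0.0 and later expect the column to unnest to be named via `cols`.
    tweets <- rbind(filter(tweets, !grepl('\\@', mention)), unnest(tweets, cols = mention))
    # Drop the text column (column 2)
    tweets <- tweets[, -2]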
    

    Output is as follows:

        user   mention
    1  ghost          
    2 people @somebody
    3 people  @someone
    4 person     @john
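
The long-format step above can probably also be done in a single `unnest()` call. The following is a minimal sketch, assuming tidyr 1.0.0 or later, where `unnest()` has a `keep_empty` argument. The result name `long` and the pattern `@\\w+` are illustrative choices and not part of the original answer. The pattern stops at punctuation, whereas `\\S+` would also capture a trailing comma or period.

```r
library(stringr)
library(tidyr)

# Sample data from the question (text kept on one line here)
tweets <- data.frame(
  user = c("people", "person", "ghost"),
  text = c("Hey, check this out @somebody @someone",
           "love this @john",
           "amazing"),
  stringsAsFactors = FALSE
)

# Assumption: handles consist of word characters only, so "@\\w+" is enough
tweets$mention <- str_extract_all(tweets$text, "@\\w+")

# One row per mention; keep_empty = TRUE keeps users with no mentions
# and fills their mention with NA instead of dropping the row
long <- unnest(tweets[, c("user", "mention")], cols = mention, keep_empty = TRUE)
```

With this approach the rows stay in their original order, and a user without mentions (here `ghost`) gets `NA` in the `mention` column rather than an empty entry.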