Tags: r, statistics, analytics, hashtag, sentiment-analysis

Extracting hashtags from tweets


I am trying to perform sentiment analysis and am facing a small problem. I am using a dictionary that contains hashtags along with some other junk values (shown below). Each entry also has an associated weight. I want to extract only the hashtags and their corresponding weights into a new data frame. Is there an easy way to do this? I have tried using regmatches, but it returns its output as a list, which is messing things up.

Input:

            V1    V2
1    #fabulous 7.526
2   #excellent 7.247
3      superb 7.199
4  #perfection 7.099
5    #terrific 6.922
6 #magnificent 6.672

Output:

            V1    V2
1    #fabulous 7.526
2   #excellent 7.247
3  #perfection 7.099
4    #terrific 6.922
5 #magnificent 6.672

Solution

  • This code should work and will give you the desired output as a data.frame:

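     # Sample input: V1 holds the terms, V2 their weights (numeric rather than strings, so they can be used in arithmetic)
     Input <- data.frame(V1 = c("#fabulous", "#excellent", "superb", "#perfection", "#terrific", "#magnificent"), V2 = c(7.526, 7.247, 7.199, 7.099, 6.922, 6.672), stringsAsFactors = FALSE)
     # Keep only the rows whose V1 starts with "#"
     extractHashtags <- Input[which(substr(Input$V1, 1, 1) == "#"), ]
     rownames(extractHashtags) <- NULL  # renumber the rows 1..5 to match the desired output
     View(extractHashtags)  # opens the data viewer in an interactive session; use print(extractHashtags) in a script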
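
     As an alternative to the substr() check above, the same filter can be sketched with a regular expression. regmatches() returns a list when it is paired with gregexpr(), which may be what caused the list output mentioned in the question. grepl() instead returns one TRUE/FALSE value per row, so it can index the data frame directly. The names is_hashtag and hashtags are illustrative, and the weights are assumed to be numeric:

```r
# Same sample data as in the question (weights stored as numbers here).
Input <- data.frame(
  V1 = c("#fabulous", "#excellent", "superb", "#perfection", "#terrific", "#magnificent"),
  V2 = c(7.526, 7.247, 7.199, 7.099, 6.922, 6.672),
  stringsAsFactors = FALSE
)

# "^#" matches only strings that begin with a hash character.
# grepl() yields a logical vector with one entry per row.
is_hashtag <- grepl("^#", Input$V1)

# Keep the matching rows and all columns; the result is still a data frame.
hashtags <- Input[is_hashtag, ]

# Reset the row names so they run 1..n, as in the desired output.
rownames(hashtags) <- NULL
print(hashtags)
```

     If V1 is a character column, startsWith(Input$V1, "#") (available in base R since version 3.3.0) should select the same rows without a regular expression.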