
How to normalize tweet counts in a histogram using R?


I retrieved tweets for several hashtags, each tracked over a different period. For example, hashtag1 was tracked for 6 days, hashtag2 for 4 days, and hashtag3 for 2 days. How can I normalize the counts for each hashtag so that they are comparable? How can I divide them into equal quarters? Thanks in advance. Here is the code:

    library(streamR)
    library(rjson)

    setwd("/Users/Desktop")
    Tweets = parseTweets("Hashtag1.json")
    table(Tweets$created_at)

    # created_at looks like "Mon Sep 24 03:35:21 +0000 2012"; the literal +0000 means UTC
    dated_Tweets <- as.POSIXct(Tweets$created_at, format = "%a %b %d %H:%M:%S +0000 %Y", tz = "UTC")

    hist(dated_Tweets, breaks="hours", freq=TRUE, xlab="dated_Tweets", main= 
    "Distribution of tweets", col="blue")

Solution

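  • I think your main stumbling block is converting the date-times to 6-hour bins. You can do this with format.POSIXct and cut. Plotting the proportion of each hashtag's tweets per bin, rather than the raw count, then makes hashtags with different tracking periods comparable. Here is a suggestion, complete with a histogram. There are many ways to draw the histogram; you may prefer a table instead.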

       library(magrittr)
       library(ggplot2)
       ## create some tweet times
       hash1 <- lubridate::ymd("20170101") + lubridate::seconds(runif(100, 0, 10*86400))
       hash2 <- lubridate::ymd("20170101") + lubridate::seconds(runif(100, 0, 31*86400))
       hash3 <- lubridate::ymd("20170101") + lubridate::seconds(runif(300, 0, 5*86400))
       ## bin these into 6h intervals by hour of day;
       ## right = FALSE gives equal-width bins [0,6), [6,12), [12,18), [18,24]
       ## (without it, hour 6 falls into the first bin, hour 12 into the second, and so on)
       bins1 <- format(hash1, "%H") %>%
           as.numeric() %>%
               cut(breaks=c(0,6,12,18,24), right = FALSE, include.lowest = TRUE)
       hTags <- data.frame(tag="#1", bins=bins1)
       bins2 <- format(hash2, "%H") %>%
           as.numeric() %>%
               cut(breaks=c(0,6,12,18,24), right = FALSE, include.lowest = TRUE)
       hTags <- rbind(hTags,
                      data.frame(tag="#2", bins=bins2 ))
       bins3 <- format(hash3, "%H") %>%
           as.numeric() %>%
               cut(breaks=c(0,6,12,18,24), right = FALSE, include.lowest = TRUE)
       hTags <- rbind(hTags,
                      data.frame(tag="#3", bins=bins3 ))
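       ## plot per-tag proportions, so the bars of each hashtag sum to 1
       ## (after_stat(prop) replaces the deprecated ..prop.. notation; needs ggplot2 >= 3.3.0)
       ggplot(data=hTags, aes(x=bins, fill=tag)) + geom_bar(position="dodge", aes(y=after_stat(prop), group=tag))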
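
If you prefer a table to a plot, the same binning can be sketched in base R. This is only an illustration under assumptions: the helper `bin_hours()`, the simulated tweet times, and the `days_tracked` vector are hypothetical names introduced here, not part of the code above. It shows two possible normalizations: the share of each hashtag's tweets per 6-hour bin, and the average number of tweets per tracked day in each bin.

```r
# Hypothetical helper (an assumption for this sketch): map date-times to
# 6-hour bins of the day. right = FALSE gives [0,6), [6,12), [12,18), [18,24].
bin_hours <- function(times) {
  hours <- as.numeric(format(times, "%H"))
  cut(hours, breaks = c(0, 6, 12, 18, 24), right = FALSE, include.lowest = TRUE)
}

set.seed(1)  # make the simulated tweet times reproducible
start <- as.POSIXct("2017-01-01", tz = "UTC")

# Simulated tweets with tracking periods of 6, 4 and 2 days, as in the question
tweets <- list(
  "#1" = start + runif(100, 0, 6 * 86400),
  "#2" = start + runif(100, 0, 4 * 86400),
  "#3" = start + runif(300, 0, 2 * 86400)
)

# One row per tweet: its hashtag and its 6-hour bin
hTags <- do.call(rbind, lapply(names(tweets), function(tag) {
  data.frame(tag = tag, bins = bin_hours(tweets[[tag]]))
}))

# Raw counts per hashtag (rows) and bin (columns)
counts <- table(hTags$tag, hTags$bins)

# Normalization 1: share of each hashtag's tweets per bin (each row sums to 1)
props <- prop.table(counts, margin = 1)
print(round(props, 3))

# Normalization 2: average tweets per tracked day in each bin
# (days_tracked is an assumed lookup of tracking periods in days)
days_tracked <- c("#1" = 6, "#2" = 4, "#3" = 2)
per_day <- counts / as.numeric(days_tracked[rownames(counts)])
print(round(per_day, 2))
```

The proportions answer "when during the day is this hashtag used?", independent of volume, while the per-day rates keep the volume information and only correct for the length of the tracking period. Which one is appropriate depends on what you want to compare.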
    
