I retrieved tweets from Twitter for various hashtags, each tracked for a different period. For example, Hashtag1 was tracked for 6 days, Hashtag2 for 4 days, and Hashtag3 for 2 days. How can I normalize each hashtag? How can I divide them into equal quarters? Thanks in advance. Here is the code:
library(streamR)
library(rjson)
setwd("/Users/Desktop")
Tweets = parseTweets("Hashtag1.json")
table(Tweets$created_at)
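## created_at is in UTC (the literal "+0000"), so parse it as UTC;
## the format string must stay on one line, or the line break becomes part of it
dated_Tweets <- as.POSIXct(Tweets$created_at, format = "%a %b %d %H:%M:%S +0000 %Y", tz = "UTC")
hist(dated_Tweets, breaks = "hours", freq = TRUE, xlab = "dated_Tweets",
     main = "Distribution of tweets", col = "blue")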
I think your main stumbling block is converting date-times into 6-hour bins. You can achieve this with format.POSIXct and cut. Here is a suggestion, complete with a histogram. There are many ways to draw the histograms; you may prefer a table instead.
library(magrittr)
library(ggplot2)
## create some tweet times
hash1 <- lubridate::ymd("20170101") + lubridate::seconds(runif(100, 0, 10*86400))
hash2 <- lubridate::ymd("20170101") + lubridate::seconds(runif(100, 0, 31*86400))
hash3 <- lubridate::ymd("20170101") + lubridate::seconds(runif(300, 0, 5*86400))
## bin the hour of day into 6h intervals: [0,6), [6,12), [12,18), [18,24]
## right = FALSE keeps hour 6 out of the first bin, so every bin spans 6 hours
bins1 <- format(hash1, "%H") %>%
  as.numeric() %>%
  cut(breaks = c(0, 6, 12, 18, 24), right = FALSE, include.lowest = TRUE)
hTags <- data.frame(tag = "#1", bins = bins1)
bins2 <- format(hash2, "%H") %>%
  as.numeric() %>%
  cut(breaks = c(0, 6, 12, 18, 24), right = FALSE, include.lowest = TRUE)
hTags <- rbind(hTags,
               data.frame(tag = "#2", bins = bins2))
bins3 <- format(hash3, "%H") %>%
  as.numeric() %>%
  cut(breaks = c(0, 6, 12, 18, 24), right = FALSE, include.lowest = TRUE)
hTags <- rbind(hTags,
               data.frame(tag = "#3", bins = bins3))
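## plot, for each tag, the proportion of its tweets that fall in each bin
## (after_stat(prop) replaces the deprecated ..prop.. and needs ggplot2 >= 3.3.0)
ggplot(data = hTags, aes(x = bins, fill = tag)) +
  geom_bar(position = "dodge", aes(y = after_stat(prop), group = tag))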
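If you prefer a table to a histogram, the same binning can feed table and prop.table. The following is only a minimal sketch in base R under stated assumptions: the tweet times are simulated, and the names times, days_tracked and bin_hours are hypothetical, not part of your data or of any package. It shows two possible normalizations: the share of each hashtag's tweets per 6-hour bin, and the tweets per bin per day tracked.

```r
## Simulated tweet times (hypothetical data), tracked for 6, 4 and 2 days
set.seed(1)
start <- as.POSIXct("2017-01-01 00:00:00", tz = "UTC")
times <- list(
  "#1" = start + runif(100, 0, 6 * 86400),
  "#2" = start + runif(100, 0, 4 * 86400),
  "#3" = start + runif(300, 0, 2 * 86400)
)
days_tracked <- c("#1" = 6, "#2" = 4, "#3" = 2)

## Bin the hour of day into four 6-hour intervals: [0,6), [6,12), [12,18), [18,24]
bin_hours <- function(x) {
  hours <- as.numeric(format(x, "%H", tz = "UTC"))
  cut(hours, breaks = c(0, 6, 12, 18, 24), right = FALSE, include.lowest = TRUE)
}

## One row per tweet: its hashtag and its 6-hour bin
hTags <- do.call(rbind, lapply(names(times), function(tg) {
  data.frame(tag = tg, bins = bin_hours(times[[tg]]))
}))

## Raw counts: one row per hashtag, one column per bin
counts <- table(hTags$tag, hTags$bins)

## Normalization 1: share of each hashtag's tweets per bin (each row sums to 1)
props <- prop.table(counts, margin = 1)

## Normalization 2: tweets per bin per day tracked
## (the vector is recycled down the rows, so row i is divided by days_tracked[i])
per_day <- counts / days_tracked[rownames(counts)]

print(round(props, 2))
print(round(per_day, 2))
```

Proportions make the hashtags comparable regardless of how many tweets each one collected, while the per-day rates keep the volume information but correct for the different tracking periods. Which of the two is appropriate depends on what you want to compare.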