I am having a hard time creating a ggplot2 plot from my data. The plot should look like this:
Any advice would be really helpful for my research. Thank you in advance for your time and effort.
A very small sample of the data set (df) looks like this:
tweet_created_at hashtag_text
2015-05-08 00:07:58 ogretmenemayistamujdehazirandaatama
2015-05-08 00:07:58 onlarkonusurakpartiyapar
2015-05-08 00:10:48 ogretmenemayistamujdehazirandaatama
2015-05-08 00:10:48 onlarkonusurakpartiyapar
2015-05-08 02:50:03 onlarkonusurakpartiyapar
2015-05-08 00:10:56 ogretmenemayistamujdehazirandaatama
2015-05-08 00:10:56 onlarkonusurakpartiyapar
2015-05-08 02:53:13 onlarkonusurakpartiyapar
2015-05-08 02:53:13 pinokyokemal
2015-05-08 00:11:03 ogretmenemayistamujdehazirandaatama
2015-05-08 00:11:03 onlarkonusurakpartiyapar
2015-05-08 00:11:06 ogretmenemayistamujdehazirandaatama
2015-05-08 00:11:06 onlarkonusurakpartiyapar
2015-05-08 02:53:48 bingolunkararibuyumenindevami
2015-05-08 02:53:48 onlarkonusurakpartiyapar
2015-05-08 00:11:17 ogretmenemayistamujdehazirandaatama
2015-05-08 00:11:17 onlarkonusurakpartiyapar
2015-05-08 00:16:21 ogretmenemayistamujdehazirandaatama
2015-05-08 00:16:21 onlarkonusurakpartiyapar
I used this script, but I couldn't figure out how to create the frequency part:
ggplot(data=df,
aes(x=as.POSIXct(tweet_created_at), y=hashtag_text,color=hashtag_text)) +
geom_line()
I know that the y-axis value is not correct, but I couldn't find the right version of it. It creates something like this:
PS: There are hundreds of hashtags in my data set, so I need to choose the top 25 hashtags.
You can use geom_freqpoly.
If your tweet_created_at variable isn't POSIXct yet, convert it:
df$tweet_created_at <- as.POSIXct(df$tweet_created_at)
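If the timestamps are stored as text in exactly the format shown in the sample, a slightly more explicit conversion is sketched here: passing the format string and a time zone keeps the result from depending on the machine's local time zone. UTC is only an assumption; use whichever zone the tweet times were recorded in.

```r
# Two timestamps copied from the sample data
created <- c("2015-05-08 00:07:58", "2015-05-08 02:53:48")

# Explicit format and time zone (UTC is an assumption, not taken from the data)
parsed <- as.POSIXct(created, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Strings that don't match the format become NA, so check for that
stopifnot(!anyNA(parsed))
print(parsed)
```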
Then find your most frequent hashtags and create a selection variable:
# take the top 2 for now; change 2 to 25 to get the top 25
hashtag_table <- sort(table(df$hashtag_text), decreasing = TRUE)
df$select <- as.character(df$hashtag_text) %in% names(hashtag_table)[1:2]
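The same selection can be wrapped in a small function so that the cutoff (2 here, 25 in the question) is a parameter. The name flag_top_hashtags is hypothetical; it is just the table/sort/%in% steps from the answer, with seq_len(min(...)) added so it doesn't produce NA names when there are fewer than n distinct hashtags.

```r
# Hypothetical helper: TRUE for rows whose hashtag is among the n most frequent
flag_top_hashtags <- function(hashtags, n = 25) {
  counts <- sort(table(hashtags), decreasing = TRUE)
  # Guard against asking for more hashtags than exist
  top <- names(counts)[seq_len(min(n, length(counts)))]
  as.character(hashtags) %in% top
}

# A few hashtags from the sample: 3x onlar..., 2x ogretmen..., 1x pinokyokemal
tags <- c("onlarkonusurakpartiyapar", "ogretmenemayistamujdehazirandaatama",
          "onlarkonusurakpartiyapar", "pinokyokemal",
          "onlarkonusurakpartiyapar", "ogretmenemayistamujdehazirandaatama")
keep <- flag_top_hashtags(tags, n = 2)
print(keep)
```

Usage would be df$select <- flag_top_hashtags(df$hashtag_text, n = 25). If several hashtags tie at the cutoff, which of them make it in depends on the sort order, so check the table if that matters for your research.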
Then plot:
library(ggplot2)
p1 <- ggplot(df[df$select, ],
             aes(x = tweet_created_at, group = hashtag_text, colour = hashtag_text)) +
  geom_freqpoly(binwidth = 30 * 60)  # x is POSIXct, so binwidth is in seconds; here 30 min
p1
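If you would rather compute the y value yourself (which is what y = hashtag_text in the question was missing), here is a sketch that counts tweets per hashtag in 30-minute bins and then draws them with geom_line. The data frame df_small is a hand-built subset of the sample rows, and aligning bins to :00 and :30 in UTC is an assumption; geom_freqpoly may place its bin boundaries differently and also draws zero counts for empty bins, which this version does not.

```r
library(ggplot2)

# Hand-built subset of the sample data (assumed to be in UTC)
df_small <- data.frame(
  tweet_created_at = as.POSIXct(c("2015-05-08 00:07:58", "2015-05-08 00:10:48",
                                   "2015-05-08 02:50:03", "2015-05-08 02:53:13",
                                   "2015-05-08 00:11:03"), tz = "UTC"),
  hashtag_text = c("onlarkonusurakpartiyapar", "onlarkonusurakpartiyapar",
                   "onlarkonusurakpartiyapar", "pinokyokemal",
                   "ogretmenemayistamujdehazirandaatama")
)

# Round each time down to the start of its 30-minute bin (1800 seconds)
secs <- as.numeric(df_small$tweet_created_at)
df_small$bin <- as.POSIXct(floor(secs / 1800) * 1800,
                           origin = "1970-01-01", tz = "UTC")

# Count tweets per hashtag per bin; this count is the y value
df_small$n <- 1L
counts <- aggregate(n ~ bin + hashtag_text, data = df_small, FUN = sum)
print(counts)

# One line per hashtag; points make single-bin hashtags visible too
p2 <- ggplot(counts, aes(x = bin, y = n, colour = hashtag_text)) +
  geom_line() +
  geom_point()
```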