Tags: r, ggplot2, time-series, frequency

ggplot2 create time frequency


I am having a hard time creating a plot with ggplot2 from my data. The plot should look like this: (image not shown)

Any advice would be a great help for my research. Thank you in advance for your time and effort.

A very small sample of the data set (df) looks like this:

tweet_created_at     hashtag_text
2015-05-08 00:07:58  ogretmenemayistamujdehazirandaatama
2015-05-08 00:07:58  onlarkonusurakpartiyapar
2015-05-08 00:10:48  ogretmenemayistamujdehazirandaatama
2015-05-08 00:10:48  onlarkonusurakpartiyapar
2015-05-08 02:50:03  onlarkonusurakpartiyapar
2015-05-08 00:10:56  ogretmenemayistamujdehazirandaatama
2015-05-08 00:10:56  onlarkonusurakpartiyapar
2015-05-08 02:53:13  onlarkonusurakpartiyapar
2015-05-08 02:53:13  pinokyokemal
2015-05-08 00:11:03  ogretmenemayistamujdehazirandaatama
2015-05-08 00:11:03  onlarkonusurakpartiyapar
2015-05-08 00:11:06  ogretmenemayistamujdehazirandaatama
2015-05-08 00:11:06  onlarkonusurakpartiyapar
2015-05-08 02:53:48  bingolunkararibuyumenindevami
2015-05-08 02:53:48  onlarkonusurakpartiyapar
2015-05-08 00:11:17  ogretmenemayistamujdehazirandaatama
2015-05-08 00:11:17  onlarkonusurakpartiyapar
2015-05-08 00:16:21  ogretmenemayistamujdehazirandaatama
2015-05-08 00:16:21  onlarkonusurakpartiyapar

I used this script, but I could not figure out how to compute the frequency part:

ggplot(data=df,
       aes(x=as.POSIXct(tweet_created_at), y=hashtag_text,color=hashtag_text)) +
  geom_line()

I know that the variable mapped to the y axis is not correct, but I could not find the right one. The script produces something like this: (image not shown)

PS: There are hundreds of hashtags in my data set, so I need to select the top 25 hashtags.


Solution

  • You can use geom_freqpoly. If your tweet_created_at variable isn't POSIXct yet, transform it:

    df$tweet_created_at <- as.POSIXct(df$tweet_created_at)
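
    If the timestamps are stored as character strings, it can help to state the format and time zone explicitly instead of relying on the defaults. The following is only a sketch: the vector raw is made up from two timestamps in the sample, and the "UTC" time zone is an assumption, since the question does not say which zone the tweets were recorded in.

    ```r
    # Two timestamps copied from the sample data; `raw` is a hypothetical name.
    raw <- c("2015-05-08 00:07:58", "2015-05-08 02:50:03")

    # Parse with an explicit format; tz = "UTC" is an assumption, not from the question.
    parsed <- as.POSIXct(raw, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

    # as.POSIXct() returns NA for strings that do not match the format,
    # so fail early instead of silently plotting fewer points.
    stopifnot(!anyNA(parsed))

    # Elapsed time between the two tweets, in seconds.
    gap_secs <- as.numeric(difftime(parsed[2], parsed[1], units = "secs"))
    ```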
    

    Then find your most frequent hashtags and create a logical select column that flags their rows:

    # Selects the top 2 hashtags here; change 1:2 to 1:25 for the full data set
    hashtag_table <- sort(table(df$hashtag_text), decreasing = TRUE)
    df$select <- as.character(df$hashtag_text) %in% names(hashtag_table)[1:2]
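
    To move from 2 to 25 hashtags without editing the index by hand, the selection step can be wrapped in a small function. This is a minimal sketch using base R only; the helper name top_n_values and the toy vector hashtags are hypothetical and stand in for df$hashtag_text.

    ```r
    # Toy values standing in for df$hashtag_text (illustrative only).
    hashtags <- c("a", "b", "a", "c", "a", "b", "d")

    # Hypothetical helper: return the names of the n most frequent values.
    top_n_values <- function(x, n = 25) {
      counts <- sort(table(x), decreasing = TRUE)
      # Cap n at the number of distinct values so no NA names are returned.
      names(counts)[seq_len(min(n, length(counts)))]
    }

    top2 <- top_n_values(hashtags, n = 2)

    # Logical flag, equivalent in role to df$select above.
    keep <- hashtags %in% top2
    ```

    With the real data this would be called as top_n_values(df$hashtag_text, n = 25). Hashtags tied at the cut-off are kept or dropped according to their order in the sorted table, so the 25th place may be arbitrary.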
    

    Then plot:

    library(ggplot2)
    p1 <- ggplot(df[df$select, ],
                 aes(x = tweet_created_at, group = hashtag_text, colour = hashtag_text)) +
      geom_freqpoly(binwidth = 30 * 60)  # x is POSIXct, so binwidth is in seconds: 30 min
    

    This results in the following plot (shown with facets, because the lines of the selected hashtags overlap): (image not shown)
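
The plotting code above does not include the faceting step mentioned with the result. One way to get one panel per hashtag is facet_wrap; the answer does not say which function was actually used, so treat this as an assumption. The data frame toy is made up from four timestamps in the sample, with hypothetical hashtag names and an assumed "UTC" time zone.

```r
library(ggplot2)

# Toy data (made up) with the same two columns as df.
toy <- data.frame(
  tweet_created_at = as.POSIXct(
    c("2015-05-08 00:07:58", "2015-05-08 00:10:48",
      "2015-05-08 02:50:03", "2015-05-08 02:53:13"),
    tz = "UTC"  # assumed time zone
  ),
  hashtag_text = c("tag_a", "tag_a", "tag_b", "tag_b")
)

p2 <- ggplot(toy, aes(x = tweet_created_at, colour = hashtag_text)) +
  geom_freqpoly(binwidth = 30 * 60) +      # 30-minute bins, given in seconds
  facet_wrap(~ hashtag_text, ncol = 1) +   # one stacked panel per hashtag
  labs(x = "Time", y = "Tweets per 30 min")
```

Printing p2 draws the plot. With 25 hashtags, a single column of panels gets tall, so a larger ncol or dropping the facets in favour of colour alone may read better.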