
Plotting sentiment analysis over time in Python


I am trying to plot the results of my sentiment analysis over time. The data consists of comments from a forum. A sample of my data looks like this:

Timestamp            Sentiment
2021-01-28 21:37:41  Positive
2021-01-28 21:32:10  Negative
2021-01-29 21:30:35  Positive
2021-01-29 21:28:57  Neutral
2021-01-29 21:26:56  Negative

I would like to plot a line graph with just the date from the timestamp on the x-axis and a separate line for the value counts of each category in the "Sentiment" column. That means 3 lines in total, one per sentiment (positive, negative and neutral), with the y-axis representing the count. I think I need to use groupby() somehow, but I cannot figure out how.


Solution

  • My solution is a bit convoluted, and you will probably want to restyle the graph afterwards to fit your needs (for example, as a stacked bar chart).

    First, let's extract the date from the Timestamp column of your dataframe.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Sample data matching the question
    example = {
        'Timestamp': ['2021-01-28 21:37:41', '2021-01-28 21:32:10', '2021-01-29 21:30:35',
                      '2021-01-29 21:28:57', '2021-01-29 21:26:56'],
        'Sentiment': ['Positive', 'Negative', 'Positive', 'Neutral', 'Negative'],
    }
    df = pd.DataFrame(example)
    # Parse the strings into datetimes, then keep only the calendar date
    df['Timestamp'] = pd.to_datetime(df['Timestamp'])
    df['Date'] = df['Timestamp'].dt.date
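
As a side note on the date extraction step: `.dt.date` produces plain Python `date` objects, while `.dt.normalize()` keeps the pandas datetime dtype and only resets the time to midnight. Either can serve as a grouping key. The sketch below compares the two on a shortened copy of the sample data; the variable names `as_date` and `as_day` are illustrative only and not part of the original answer.

```python
import pandas as pd

# Shortened sample in the same shape as the question's data
df = pd.DataFrame({
    "Timestamp": ["2021-01-28 21:37:41", "2021-01-28 21:32:10", "2021-01-29 21:30:35"],
    "Sentiment": ["Positive", "Negative", "Positive"],
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"])

# .dt.date yields Python datetime.date objects
as_date = df["Timestamp"].dt.date

# .dt.normalize() sets the time part to 00:00:00 but keeps a datetime64 dtype
as_day = df["Timestamp"].dt.normalize()

print(type(as_date.iloc[0]).__name__, as_day.dtype)
```

Keeping the datetime dtype can be convenient if you later want to resample or use pandas' time-series axis formatting, but the rest of this answer works with `.dt.date` as written.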
    

    Then, let's group by the date and count how often each sentiment occurs on each date.

    grouped = df.groupby(by='Date')['Sentiment'].value_counts()
    

    Output:

    Date        Sentiment
    2021-01-28  Negative     1
                Positive     1
    2021-01-29  Negative     1
                Neutral      1
                Positive     1
    Name: Sentiment, dtype: int64
    

    This is a multi-index Series. To get it into a better format, with one row per date and one column per sentiment, we can unstack the inner index level.

    unstacked = grouped.unstack(level=1)
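
One thing to watch for after unstacking: a sentiment that never occurs on a given date (here, Neutral on 2021-01-28) shows up as NaN rather than 0. A minimal sketch of two ways to get zeros instead, assuming the same sample data with the date already extracted; the names `with_gaps`, `counts` and `table` are illustrative:

```python
import pandas as pd

# Same sample as above, with the date already extracted (kept as strings here for brevity)
df = pd.DataFrame({
    "Date": ["2021-01-28", "2021-01-28", "2021-01-29", "2021-01-29", "2021-01-29"],
    "Sentiment": ["Positive", "Negative", "Positive", "Neutral", "Negative"],
})

grouped = df.groupby("Date")["Sentiment"].value_counts()

# Plain unstack leaves NaN where a sentiment never occurred on a date
with_gaps = grouped.unstack(level=1)

# fill_value=0 records those missing combinations as a count of zero
counts = grouped.unstack(level=1, fill_value=0)

# pd.crosstab builds the same date-by-sentiment count table in a single call
table = pd.crosstab(df["Date"], df["Sentiment"])

print(counts)
```

Zeros matter mostly for line plots, where a NaN would leave a gap in the line instead of dropping it to the axis.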
    

    Then we can plot directly on this object with unstacked.plot.bar(), which draws one group of bars per date and one bar per sentiment.

    [Output: bar chart of the sentiment counts per date]
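
Since the question asked for a line graph rather than bars, here is a minimal sketch of the same pipeline ending in a line plot. It relies on `DataFrame.plot()` drawing a line chart by default, with one line per column. The `Agg` backend, the `marker="o"` styling and the axis labels are assumptions added for illustration and are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Timestamp": ["2021-01-28 21:37:41", "2021-01-28 21:32:10", "2021-01-29 21:30:35",
                  "2021-01-29 21:28:57", "2021-01-29 21:26:56"],
    "Sentiment": ["Positive", "Negative", "Positive", "Neutral", "Negative"],
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"])
df["Date"] = df["Timestamp"].dt.date

# One row per date, one column per sentiment, zeros for missing combinations
counts = df.groupby("Date")["Sentiment"].value_counts().unstack(level=1, fill_value=0)

# DataFrame.plot() defaults to a line chart: one line per sentiment column
ax = counts.plot(marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("Number of comments")
```

In an interactive session you would drop the `matplotlib.use("Agg")` line and call `plt.show()` to display the figure. With only two dates in the sample, each line is a single segment; the markers make the individual counts easier to see.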