Trendline in Plotly Python


I am generating a plot in Python using Plotly, which shows data in a timeseries. I am using the following data from my SQLite database (as dates and lines below):

[(u'2015-12-08 00:00:00',), (u'2015-11-06 00:00:00',), (u'2015-11-06 00:00:00',), (u'2015-10-07 00:00:00',), (u'2015-10-06 00:00:00',), (u'2015-10-06 00:00:00',), (u'2015-09-17 00:00:00',), (u'2015-09-17 00:00:00',), (u'2015-09-17 00:00:00',), (u'2015-09-17 00:00:00',), (u'2015-09-16 00:00:00',), (u'2015-09-15 00:00:00',), (u'2015-09-15 00:00:00',), (u'2015-09-15 00:00:00',), (u'2015-08-30 00:00:00',), (u'2015-08-22 00:00:00',), (u'2015-08-22 00:00:00',), (u'2015-08-17 00:00:00',), (u'2015-08-09 00:00:00',), (u'2015-08-09 00:00:00',), (u'2015-08-08 00:00:00',), (u'2015-08-07 00:00:00',), (u'2015-07-28 00:00:00',), (u'2015-07-26 00:00:00',), (u'2015-07-22 00:00:00',), (u'2015-07-22 00:00:00',), (u'2015-07-22 00:00:00',), (u'2015-07-13 00:00:00',), (u'2015-07-13 00:00:00',), (u'2015-07-13 00:00:00',), (u'2015-07-13 00:00:00',), (u'2015-07-09 00:00:00',), (u'2015-07-09 00:00:00',), (u'2015-07-09 00:00:00',), (u'2015-07-09 00:00:00',), (u'2015-06-28 00:00:00',), (u'2015-06-28 00:00:00',), (u'2015-06-28 00:00:00',), (u'2015-06-16 00:00:00',), (u'2015-06-14 00:00:00',), (u'2015-06-14 00:00:00',), (u'2015-06-14 00:00:00',), (u'2015-06-04 00:00:00',), (u'2015-04-09 00:00:00',), (u'2015-03-31 00:00:00',), (u'2015-03-09 00:00:00',), (u'2015-03-09 00:00:00',), (u'2015-03-09 00:00:00',), (u'2015-03-09 00:00:00',), (u'2015-03-09 00:00:00',), (u'2015-03-09 00:00:00',)]
[(18,), (24,), (17,), (22,), (16,), (18,), (24,), (20,), (16,), (14,), (21,), (21,), (24,), (15,), (23,), (22,), (22,), (20,), (24,), (20,), (20,), (20,), (22,), (21,), (21,), (23,), (23,), (17,), (25,), (20,), (25,), (25,), (25,), (26,), (26,), (19,), (17,), (16,), (16,), (14,), (17,), (17,), (13,), (27,), (19,), (19,), (12,), (17,), (20,), (12,), (21,)]

Some data is overlapping (multiple instances in the same day), but presumably this would not matter for a fitted line. My code looks like this:

import sqlite3
import plotly.plotly as py
from plotly.graph_objs import *
import numpy as np

db = sqlite3.connect("Applications.db")
cursor = db.cursor()

cursor.execute('SELECT date FROM applications ORDER BY date(date) DESC')
dates = cursor.fetchall()
cursor.execute('SELECT lines FROM applications ORDER BY date(date) DESC')
lines = cursor.fetchall()

trace0 = Scatter(
    x=dates,
    y=lines,
    name='Amount of lines',
    mode='markers'
)
trace1 = Scatter(
    x=dates,
    y=lines,
    name='Fit',
    mode='markers'
)
data = Data([trace0, trace1])

py.iplot(data, filename = 'date-axes')

How do I make trace1 a fitted trendline based on this data? That is, a smooth line showing how the data develops over time.


Solution

  • Per Plotly support: "Unfortunately fits aren't exposed through the API right now. We're working on add the fit GUI to the IPython interface though and eventually the API" (25th of September, 2015).

    I found the easiest way of doing this, after an inordinate amount of reading and googling, was through Matplotlib, NumPy, and SciPy. Having cleaned up the data a bit, the following code worked:

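    import plotly.plotly as py
    import plotly.tools as tls
    from plotly.graph_objs import *
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.dates as dates
    from scipy.optimize import curve_fit  # needed for curve_fit below
    
    # Straight-line model for curve_fit
    def line(x, a, b):
        return a * x + b
    
    # trend_dates, trend_lines and new_x come from the data cleaning step (not shown)
    # popt holds the fitted (a, b); the plot below uses np.polyfit instead
    popt, pcov = curve_fit(line, trend_dates.ravel(), trend_lines.ravel())
    
    fig1 = plt.figure(figsize=(8,6))
    plt.plot_date(new_x, trend_lines, 'o', label='Lines')
    # Degree-1 least-squares fit, evaluated at the same x values
    z = np.polyfit(new_x, trend_lines, 1)
    p = np.poly1d(z)
    plt.plot(new_x, p(new_x), '-', label='Fit')
    plt.title('Lines per day')
    # Convert the Matplotlib figure to a Plotly figure
    fig = tls.mpl_to_plotly(fig1)
    fig['layout'].update(showlegend=True)
    fig.strip_style()
    py.iplot(fig)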
    

    Here new_x holds the dates in the form Matplotlib expects, and trend_lines is the regular data from the question. This is not a full example, since a fair amount of the aforementioned data cleaning and library importing precedes it, but it shows one way of getting a Plotly figure as output by going through Matplotlib, NumPy, and SciPy.
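
The data cleaning and the fit itself can also be sketched without Matplotlib. What follows is a minimal sketch, not the code the answer actually used: the helper names flatten and linear_trend are hypothetical, the sample rows are the first five date/lines pairs from the question, and it assumes the dates always arrive as 'YYYY-MM-DD HH:MM:SS' strings. It unpacks the 1-tuples returned by fetchall(), converts the dates to days since the earliest date, and fits a straight line with np.polyfit.

```python
from datetime import datetime

import numpy as np

# Rows shaped like cursor.fetchall() output: lists of 1-tuples.
# These are the first five date/lines pairs from the question.
date_rows = [('2015-12-08 00:00:00',), ('2015-11-06 00:00:00',),
             ('2015-11-06 00:00:00',), ('2015-10-07 00:00:00',),
             ('2015-10-06 00:00:00',)]
line_rows = [(18,), (24,), (17,), (22,), (16,)]


def flatten(rows):
    """Unpack 1-tuples such as ('2015-12-08 00:00:00',) into plain values."""
    return [row[0] for row in rows]


def linear_trend(date_strings, values):
    """Fit y = slope * days + intercept, with days counted from the earliest date.

    Assumes the 'YYYY-MM-DD HH:MM:SS' format shown in the question.
    """
    parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in date_strings]
    origin = min(parsed)
    # Small day offsets keep the least-squares problem well conditioned.
    days = np.array([(d - origin).total_seconds() / 86400.0 for d in parsed])
    slope, intercept = np.polyfit(days, np.asarray(values, dtype=float), 1)
    fitted = slope * days + intercept
    return slope, intercept, fitted


dates = flatten(date_rows)
lines = flatten(line_rows)
slope, intercept, fitted = linear_trend(dates, lines)
```

Under these assumptions, the fitted values could then replace y in the question's second trace, for example Scatter(x=dates, y=fitted.tolist(), name='Fit', mode='lines'), so the trendline is drawn as a line over the original markers.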