Tags: python, jupyter-notebook, seaborn, linear-regression, scatter-plot

No regression line appears when I use regplot or lmplot in seaborn with imported CSV data


I want to create a linear regression model of rent in Tokyo over several years. So far, I have managed to make a scatter plot with seaborn. However, when I try to add a linear regression, the error UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U32')) -> dtype('<U32') pops up and no regression line is drawn, although the scatter plot itself seems fine.

My code

    import pandas as pd
    import seaborn as sns

    df = pd.read_csv('Tokyo rent (One Bedroom apartment in the city centre).csv', thousands=',')
    g = sns.lmplot(x='Date 2.0', y='Rent(Local Currency)', data=df)
    g.figure.autofmt_xdate()

My CSV File

Year    Rent(USD)   Date    Rent(Local Currency)    Date 2.0
2017    1223.4  17/03/17    129,594.59              2017-03-17
2017    1070.24 28/10/17    121,656.25              2017-10-28
2018    1104.51 23/01/18    121,689.66              2018-01-23
2018    1030.61 22/10/18    116,270.83              2018-10-22
2019    1124.33 14/06/19    122,062.50              2019-06-14
2019    1129.6  20/06/19    121,255.32              2019-06-20
2019    1129.9  21/06/19    121,255.32              2019-06-21
2020    1198.53 23/03/20    128,701.75              2020-03-23
2020    1183.66 01/07/20    127,195.65              2020-07-01
2020    1213.38 17/09/20    127,466.67              2020-09-17
2020    1168.37 05/10/20    123,578.95              2020-10-05
2020    1192.5  11/11/20    125,525.00              2020-11-11
2020    1228.34 02/12/20    128,312.50              2020-12-02
2021    1220    06/03/21    132,200.00              2021-03-06
2021    1342.84 29/08/21    147,524.40              2021-08-29
2021    1284.65 14/10/21    145,696.54              2021-10-14
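A quick way to see where the error comes from is to check the column dtypes after loading. The sketch below assumes the file is an ordinary comma-separated CSV in which the thousands-formatted rents are quoted; the three-row excerpt is hypothetical, built from the first rows of the table above. With thousands=',' the rent column becomes numeric, but 'Date 2.0' is still text, and NumPy arithmetic on a string array fails with the same kind of UFuncTypeError as in the question.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical excerpt of the CSV file: assumes comma-separated fields with
# the thousands-formatted rents quoted, as a spreadsheet export would write them.
csv_text = '''Year,Rent(USD),Date,Rent(Local Currency),Date 2.0
2017,1223.4,17/03/17,"129,594.59",2017-03-17
2017,1070.24,28/10/17,"121,656.25",2017-10-28
2018,1104.51,23/01/18,"121,689.66",2018-01-23
'''

df = pd.read_csv(io.StringIO(csv_text), thousands=',')

# thousands=',' makes the rent a float, but 'Date 2.0' stays as plain text
# (object dtype in pandas 1.x/2.x), so it is not a numeric x value.
print(df.dtypes)

# Fitting a line needs arithmetic on x; on a string array NumPy has no
# multiply loop and raises a UFuncTypeError (a subclass of TypeError).
x = df['Date 2.0'].to_numpy(dtype=str)
try:
    np.multiply(x, x)
    multiply_failed = False
except TypeError as err:
    multiply_failed = True
    print(f"{type(err).__name__}: {err}")
```

Converting the column to a numeric representation of the dates before plotting removes the problem.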

My scatter plot result (sorry for the crushed dates)



Solution

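  • The regression fails because the x column, 'Date 2.0', is read from the CSV as strings, and seaborn has to do arithmetic on x to fit the line (hence the UFuncTypeError on dtype '<U32'). Since the x-axis is a time series, convert the column to datetimes and then to matplotlib's numeric date format (days since matplotlib's date epoch) with mdates.date2num. With numeric x values, lmplot (or regplot) can draw the regression line. Afterwards, set a date locator and formatter so the x-axis ticks are displayed as dates again.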

    import io

    import matplotlib.dates as mdates
    import matplotlib.pyplot as plt
    import pandas as pd
    import seaborn as sns

    data = '''
    Year    Rent(USD)   Date    "Rent(Local Currency)"    "Date 2.0"
    2017    1223.4  17/03/17    129,594.59              2017-03-17
    2017    1070.24 28/10/17    121,656.25              2017-10-28
    2018    1104.51 23/01/18    121,689.66              2018-01-23
    2018    1030.61 22/10/18    116,270.83              2018-10-22
    2019    1124.33 14/06/19    122,062.50              2019-06-14
    2019    1129.6  20/06/19    121,255.32              2019-06-20
    2019    1129.9  21/06/19    121,255.32              2019-06-21
    2020    1198.53 23/03/20    128,701.75              2020-03-23
    2020    1183.66 01/07/20    127,195.65              2020-07-01
    2020    1213.38 17/09/20    127,466.67              2020-09-17
    2020    1168.37 05/10/20    123,578.95              2020-10-05
    2020    1192.5  11/11/20    125,525.00              2020-11-11
    2020    1228.34 02/12/20    128,312.50              2020-12-02
    2021    1220    06/03/21    132,200.00              2021-03-06
    2021    1342.84 29/08/21    147,524.40              2021-08-29
    2021    1284.65 14/10/21    145,696.54              2021-10-14
    '''

    # Whitespace-separated sample data; the quoted headers keep names with spaces intact.
    # sep=r'\s+' replaces delim_whitespace=True, which is deprecated since pandas 2.2.
    df = pd.read_csv(io.StringIO(data), sep=r'\s+', thousands=',')

    # 'Date 2.0' is read as strings: parse it to datetimes, then convert to
    # matplotlib's float date numbers so seaborn can fit the regression.
    df['Date 2.0'] = mdates.date2num(pd.to_datetime(df['Date 2.0']))

    g = sns.lmplot(x='Date 2.0', y='Rent(Local Currency)', data=df)

    # Display the float x values as readable, compact date labels.
    locator = mdates.AutoDateLocator()
    formatter = mdates.ConciseDateFormatter(locator)
    g.ax.xaxis.set_major_locator(locator)
    g.ax.xaxis.set_major_formatter(formatter)

    # Needed when running as a script; Jupyter displays the figure automatically.
    plt.show()
    

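lmplot only draws the line; it does not return the fitted coefficients (seaborn's documentation points to statsmodels for quantitative regression results). If the goal is the regression model itself, a minimal sketch is to fit the same converted date numbers with numpy.polyfit. It assumes a comma-separated file with the thousands-formatted rents quoted; the rows are the question's dates and local-currency rents, and names such as slope_per_year are illustrative.

```python
import io

import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# The question's data, reduced to the two columns used, written as a CSV
# (assumes the rents are quoted because they contain thousands separators).
csv_text = '''Date 2.0,Rent(Local Currency)
2017-03-17,"129,594.59"
2017-10-28,"121,656.25"
2018-01-23,"121,689.66"
2018-10-22,"116,270.83"
2019-06-14,"122,062.50"
2019-06-20,"121,255.32"
2019-06-21,"121,255.32"
2020-03-23,"128,701.75"
2020-07-01,"127,195.65"
2020-09-17,"127,466.67"
2020-10-05,"123,578.95"
2020-11-11,"125,525.00"
2020-12-02,"128,312.50"
2021-03-06,"132,200.00"
2021-08-29,"147,524.40"
2021-10-14,"145,696.54"
'''
df = pd.read_csv(io.StringIO(csv_text), thousands=',')

# Same conversion as in the answer: strings -> datetimes -> float days.
x = mdates.date2num(pd.to_datetime(df['Date 2.0']))
y = df['Rent(Local Currency)'].to_numpy()

# Ordinary least-squares straight line; polyfit returns [slope, intercept].
slope_per_day, intercept = np.polyfit(x, y, deg=1)
slope_per_year = slope_per_day * 365.25

# Fitted values at the observed dates, e.g. for residuals or a custom plot.
fitted = np.polyval([slope_per_day, intercept], x)

print(f"Trend: {slope_per_year:,.0f} yen per year")
```

Because x is measured in days, the slope is converted to yen per year by multiplying by 365.25. The intercept is the rent at matplotlib's date epoch, so on its own it is not meaningful; the slope is the useful number. With the default settings (order=1), lmplot also fits an ordinary least-squares line, so this fit should coincide with the line it draws.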