I want to fit a linear regression to the rent in Tokyo over a number of years. So far, I have managed to draw the scatter plot with seaborn. However, when I try to add the linear regression, I get the error UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U32')) -> dtype('<U32') and no regression line is drawn (the scatter plot itself seems to work fine).
My code
import pandas as pd
import seaborn as sns

df = pd.read_csv('Tokyo rent (One Bedroom apartment in the city centre).csv', thousands=',')
g = sns.lmplot(x='Date 2.0', y='Rent(Local Currency)', data=df)
g.figure.autofmt_xdate()
My CSV File
Year Rent(USD) Date Rent(Local Currency) Date 2.0
2017 1223.4 17/03/17 129,594.59 2017-03-17
2017 1070.24 28/10/17 121,656.25 2017-10-28
2018 1104.51 23/01/18 121,689.66 2018-01-23
2018 1030.61 22/10/18 116,270.83 2018-10-22
2019 1124.33 14/06/19 122,062.50 2019-06-14
2019 1129.6 20/06/19 121,255.32 2019-06-20
2019 1129.9 21/06/19 121,255.32 2019-06-21
2020 1198.53 23/03/20 128,701.75 2020-03-23
2020 1183.66 01/07/20 127,195.65 2020-07-01
2020 1213.38 17/09/20 127,466.67 2020-09-17
2020 1168.37 05/10/20 123,578.95 2020-10-05
2020 1192.5 11/11/20 125,525.00 2020-11-11
2020 1228.34 02/12/20 128,312.50 2020-12-02
2021 1220 06/03/21 132,200.00 2021-03-06
2021 1342.84 29/08/21 147,524.40 2021-08-29
2021 1284.65 14/10/21 145,696.54 2021-10-14
My scatter plot result (sorry for the crushed dates)
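The dtype('<U32') in the error message indicates that the regression is being asked to multiply strings, which most likely means the 'Date 2.0' column was read in as text. The scatter plot can still place text values on the x-axis, but the regression has to do arithmetic on x. Since the x-axis is a time series, convert it to matplotlib's numeric date format first, and the regression line can then be drawn. After that, set a date locator and formatter so the x-axis is displayed as dates again.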
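As a quick way to check the diagnosis, here is a minimal sketch that looks at the column before and after conversion. It uses only two rows from the question and a reduced set of columns. The `date_num` column name is my own, and the days-since-1970 conversion is a matplotlib-free stand-in for date2num, used here only to keep the check small.

```python
import io
import pandas as pd

# Two rows from the question, reduced to the columns used in the plot.
sample = '''
"Rent(Local Currency)" "Date 2.0"
129,594.59 2017-03-17
121,656.25 2017-10-28
'''
df = pd.read_csv(io.StringIO(sample), sep=r'\s+', thousands=',')

# read_csv does not parse dates unless asked to, so this column holds text.
dates_numeric_before = pd.api.types.is_numeric_dtype(df['Date 2.0'])
print(df.dtypes)

# Parse the text into datetimes, then express them as days since 1970-01-01.
# 'date_num' is a hypothetical column name introduced for this sketch.
df['Date 2.0'] = pd.to_datetime(df['Date 2.0'])
df['date_num'] = (df['Date 2.0'] - pd.Timestamp('1970-01-01')) / pd.Timedelta(days=1)

dates_numeric_after = pd.api.types.is_numeric_dtype(df['date_num'])
print(dates_numeric_before, dates_numeric_after)
```

If the first flag is False on your real file as well, the date column is the cause, and the conversion in the full code is what fixes it.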
import pandas as pd
import numpy as np
import io
import matplotlib.dates as mdates
data = '''
Year Rent(USD) Date "Rent(Local Currency)" "Date 2.0"
2017 1223.4 17/03/17 129,594.59 2017-03-17
2017 1070.24 28/10/17 121,656.25 2017-10-28
2018 1104.51 23/01/18 121,689.66 2018-01-23
2018 1030.61 22/10/18 116,270.83 2018-10-22
2019 1124.33 14/06/19 122,062.50 2019-06-14
2019 1129.6 20/06/19 121,255.32 2019-06-20
2019 1129.9 21/06/19 121,255.32 2019-06-21
2020 1198.53 23/03/20 128,701.75 2020-03-23
2020 1183.66 01/07/20 127,195.65 2020-07-01
2020 1213.38 17/09/20 127,466.67 2020-09-17
2020 1168.37 05/10/20 123,578.95 2020-10-05
2020 1192.5 11/11/20 125,525.00 2020-11-11
2020 1228.34 02/12/20 128,312.50 2020-12-02
2021 1220 06/03/21 132,200.00 2021-03-06
2021 1342.84 29/08/21 147,524.40 2021-08-29
2021 1284.65 14/10/21 145,696.54 2021-10-14
'''
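# sep=r'\s+' splits on whitespace; delim_whitespace is deprecated in recent pandas
df = pd.read_csv(io.StringIO(data), sep=r'\s+', thousands=',')
# Parse the date strings into datetime64 values
df['Date 2.0'] = pd.to_datetime(df['Date 2.0'])
# Convert to matplotlib date numbers (floats) so the regression can do arithmetic on x
df['Date 2.0'] = mdates.date2num(df['Date 2.0'])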
import seaborn as sns
g = sns.lmplot(x='Date 2.0', y='Rent(Local Currency)', data=df)
# Display the numeric x-axis as dates again
locator = mdates.AutoDateLocator()
formatter = mdates.ConciseDateFormatter(locator)
g.ax.xaxis.set_major_locator(locator)
g.ax.xaxis.set_major_formatter(formatter)
# Needed when running as a script rather than in a Jupyter notebook
import matplotlib.pyplot as plt
plt.show()
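lmplot draws the line but does not hand back the fitted coefficients. If you also want the slope and intercept, one option is to fit the same kind of straight line yourself. The sketch below uses np.polyfit with degree 1, which as far as I know is an ordinary least-squares fit like lmplot's default. Measuring x in days since the first observation is my own choice here, so the intercept is the fitted rent on 2017-03-17 rather than at matplotlib's date origin.

```python
import numpy as np
import pandas as pd

# The 'Date 2.0' and 'Rent(Local Currency)' columns from the question.
df = pd.DataFrame({
    'Date 2.0': [
        '2017-03-17', '2017-10-28', '2018-01-23', '2018-10-22',
        '2019-06-14', '2019-06-20', '2019-06-21', '2020-03-23',
        '2020-07-01', '2020-09-17', '2020-10-05', '2020-11-11',
        '2020-12-02', '2021-03-06', '2021-08-29', '2021-10-14',
    ],
    'Rent(Local Currency)': [
        129594.59, 121656.25, 121689.66, 116270.83,
        122062.50, 121255.32, 121255.32, 128701.75,
        127195.65, 127466.67, 123578.95, 125525.00,
        128312.50, 132200.00, 147524.40, 145696.54,
    ],
})

# Express each date as whole days since the first observation.
dates = pd.to_datetime(df['Date 2.0'])
days = (dates - dates.min()).dt.days.to_numpy()

# Degree-1 polyfit returns the coefficients highest power first: slope, intercept.
slope, intercept = np.polyfit(days, df['Rent(Local Currency)'].to_numpy(), deg=1)
print(f'slope: {slope:.2f} per day, intercept: {intercept:.2f}')
```

The slope is in local currency units per day, so multiplying it by 365 gives a rough yearly change.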