I'm starting on a project where I want to create an interactive plot from this dataset:
For now I'm just trying to plot the first row across the 2000 to 2012 columns. This is the code I use:
import pandas as pd
from bokeh.io import output_file
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.plotting import show
output_file('test.html')
df = pd.read_csv('Swedish_Population_Statistics.csv', encoding="ISO-8859-1")
df.dropna(inplace=True) # Drop rows with missing attributes
df.drop_duplicates(inplace=True) # Remove duplicates
# Drop all the column I don't use for now
df.drop(['region', 'marital_status', 'sex'], inplace=True, axis=1)
x = df.loc[[0]]
print(x)
Which gives me this dataframe
| | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10406 | 10362 | 10322 | 10288 | 10336 | 10336 | 10429 | 10585 | 10608 | 10718 | 10860 | 11121 | 11288 |
Now I want to take the column names as x-axis and the row values as y-axis.
This is where I'm stuck.
I think the code should look something like this, but I can't work out what to put in x and y:
x = df.columns.tolist() #Take columns names into a list
y = df.loc[[0]].values.tolist() # Take the first row
source = ColumnDataSource(x, y)
p = figure(title="Test")
p.line(x='x', y='y', source=source, line_color="blue", line_width=2)
I get this error :
BokehUserWarning: ColumnDataSource's columns must be of the same length. Current lengths: ('x', 13), ('y', 1)
I don't understand why the lengths are not the same, since I used tolist() on both.
Any help would be very appreciated, I've been trying to find a solution for the past 3 hours with no success.
Okay, so I found my problem: y was a 2-dimensional list (a list containing a single list of 13 values), but I needed a 1-dimensional list.
That leads me to this working code:
output_file('test.html')
df = pd.read_csv('Swedish_Population_Statistics.csv', encoding="ISO-8859-1")
df.dropna(inplace=True) # Drop rows with missing attributes
df.drop_duplicates(inplace=True) # Remove duplicates
# Drop all the column I don't use for now
df.drop(['region', 'marital_status', 'sex'], inplace=True, axis=1)
x = df.columns.tolist()
y = df.loc[[0]]
temp = []
temp2 = []
# Append each value of the dataframe row to a 1-dimensional list, one by one
for i in range(13):
    temp.append(y[str(2000+i)].tolist())
    temp2.append(temp[i][0])
p = figure(title="Test", sizing_mode="scale_both")
p.line(x, temp2, line_color="blue", line_width=2)
p.circle(x, temp2, fill_color="white", size=8)
show(p)
With this result: [image of the resulting plot]
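As a side note, the same 1-D list can probably be built without the loop. The sketch below uses a small hypothetical DataFrame in place of the CSV (three year columns, with values copied from the printed row) to show why `df.loc[[0]]` produced a nested list and how `df.loc[0]` avoids it. The names `nested`, `flat` and `data` are only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned CSV: a single row with three year columns.
df = pd.DataFrame([[10406, 10362, 10322]], columns=["2000", "2001", "2002"])

x = df.columns.tolist()               # column names as a flat list of strings

# A list selector ([0]) returns a one-row DataFrame, so .values is 2-D
# and tolist() yields a list holding one inner list (outer length 1).
nested = df.loc[[0]].values.tolist()

# A scalar selector (0) returns a Series, so tolist() is already 1-D.
flat = df.loc[0].tolist()

print(len(x), len(nested), len(flat))

# x and flat now have the same length, so they can be paired column by column.
data = {"x": x, "y": flat}
```

Because both lists have the same length, a dict like `data` could then be passed as `ColumnDataSource(data=data)` without the length warning.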