I have some Python code that imports data from a CSV file. The problem is that some columns in that file aren't plain numbers. One column is text, either "INT" or "EXT", and another is in o'clock format, ranging from "0:00" to "11:59". A third column is an ordinary number, a distance in "00.00" format.
My question is: how do I plot distance vs. o'clock, and color the dots of the scatter plot according to whether each row is INT or EXT?
My first problem is getting the program to read the o'clock and text formats from a CSV.
Any ideas or suggestions? Thanks in advance.
Here is a sample of the CSV I'm trying to import:
ML INT .10 534.15 0:00
ML EXT .25 654.23 3:00
ML INT .35 743.12 6:30
I want to plot the 4th column on the x axis and the 5th column on the y axis. I also want to color-code the scatter plot dots red or blue depending on whether the row is INT or EXT.
Here is a sample of the code I have so far:
import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np
style.use('ggplot')
a, b, c, d = np.loadtxt('numbers.csv',
                        unpack=True,
                        delimiter=',')
plt.scatter(a,b)
plt.title('Charts')
plt.ylabel('Y Axis')
plt.xlabel('X Axis')
plt.show()
Reading in from your example csv using pandas:
import pandas as pd
import matplotlib.pyplot as plt
import datetime
data = pd.read_csv('data.csv', sep='\t', header=None)
print(data)
prints:
    0    1     2       3     4
0  ML  INT  0.10  534.15  0:00
1  ML  EXT  0.25  654.23  3:00
2  ML  INT  0.35  743.12  6:30
Then separate the 'INT' from the 'EXT':
ints = data[data[1]=='INT']
exts = data[data[1]=='EXT']
Convert the time strings to datetime.time objects and grab the distances:
# Parse each 'H:MM' string and keep only the time-of-day part
int_times = [datetime.datetime.strptime(t, '%H:%M').time() for t in ints[4]]
ext_times = [datetime.datetime.strptime(t, '%H:%M').time() for t in exts[4]]
# Distances are already numeric, so a plain list is enough
int_dist = list(ints[3])
ext_dist = list(exts[3])
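If plotting datetime.time objects directly gives you trouble, one possible alternative is to turn each o'clock string into a plain float of decimal hours, which any plotting call can handle. This is only a sketch: the helper name clock_to_hours is made up for illustration, and it assumes every value looks like 'H:MM' with no seconds.

```python
def clock_to_hours(clock):
    """Convert an 'H:MM' string such as '6:30' to decimal hours (6.5).

    Hypothetical helper; assumes exactly one ':' and no seconds field.
    """
    hours, minutes = clock.strip().split(":")
    return int(hours) + int(minutes) / 60.0


# The three o'clock values from the sample rows in the question
sample_times = ["0:00", "3:00", "6:30"]
decimal_hours = [clock_to_hours(t) for t in sample_times]
print(decimal_hours)
```

Floats built this way could stand in for int_times and ext_times, at the cost of the axis showing 6.5 rather than 6:30 unless you relabel the ticks.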
Then draw a separate scatter plot for each of 'INT' and 'EXT':
fig, ax = plt.subplots()
ax.scatter(int_dist, int_times, c='orange', s=150)
ax.scatter(ext_dist, ext_times, c='black', s=150)
plt.legend(['INT', 'EXT'], loc=4)
plt.xlabel('Distance')
plt.show()
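The two scatter calls above pick the colors by hand. If you would rather keep everything in one call, a rough sketch is to build one color per row from the INT/EXT column. The mapping COLOR_BY_KIND and the helper colors_for are hypothetical names, and red/blue are taken from the question rather than from the orange/black used above.

```python
# Hypothetical mapping from category label to dot color (red/blue as asked)
COLOR_BY_KIND = {"INT": "red", "EXT": "blue"}


def colors_for(kinds, default="gray"):
    """Return one color name per label; unknown labels fall back to `default`."""
    return [COLOR_BY_KIND.get(kind.strip().upper(), default) for kind in kinds]


# Labels in the order they appear in the sample rows
kinds = ["INT", "EXT", "INT"]
colors = colors_for(kinds)
print(colors)
```

A list like this can be passed as c=colors to a single ax.scatter call over all rows. The trade-off is that a single call gives you no per-category legend entries, which is one reason to keep the two separate calls.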
EDIT: Adding code to answer a question in the comments about how to show the time in 12-hour format (ranging from 0:00 to 11:59) and strip the seconds.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
data = pd.read_csv('data.csv', header=None)
ints = data[data[1]=='INT']
exts = data[data[1]=='EXT']
# Row positions of each category, used as y coordinates
INT_index = ints.index
EXT_index = exts.index
# Raw time strings, used as the y tick labels
time = list(data[4])
int_dist = list(ints[3])
ext_dist = list(exts[3])
fig, ax = plt.subplots()
ax.scatter(int_dist, INT_index, c='orange', s=150)
ax.scatter(ext_dist, EXT_index, c='black', s=150)
ax.set_yticks(np.arange(len(data[4])))
ax.set_yticklabels(time)
plt.legend(['INT', 'EXT'], loc=4)
plt.xlabel('Distance')
plt.ylabel('Time')
plt.show()
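The code above uses the time strings as tick labels exactly as they appear in the file. If the file holds 24-hour values or values with seconds, a small helper along these lines could normalize them first. This is a sketch under assumptions: to_twelve_hour is a made-up name, and the input is assumed to be 'H:MM' or 'H:MM:SS'.

```python
def to_twelve_hour(clock):
    """Fold 'H:MM' or 'H:MM:SS' onto a 0:00-11:59 dial and drop any seconds.

    Hypothetical helper; 12:xx and 0:xx both map to 0:xx.
    """
    parts = clock.strip().split(":")
    hours, minutes = int(parts[0]), int(parts[1])
    return "{}:{:02d}".format(hours % 12, minutes)


# Example inputs: already 12-hour, 24-hour with seconds, and noon
labels = [to_twelve_hour(t) for t in ["6:30", "15:30:00", "12:05"]]
print(labels)
```

The resulting strings could be passed to ax.set_yticklabels in place of the raw time list.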