
Python 3.4 reading from CSV formats


I have some Python code that imports data from a CSV file. The problem is that some columns in the file aren't plain numbers. One column is text, either "INT" or "EXT", and another is in o'clock format, ranging from "0:00" to "11:59". A third column is an ordinary number, a distance in "00.00" format.

My question is: how do I plot distance against o'clock position, and change the color of the scatter-plot dots depending on whether each row is INT or EXT?

My first problem is getting the program to read the o'clock format and the text column from the CSV.

Any ideas or suggestions? Thanks in advance.

Here is a sample of the CSV I'm trying to import:

ML  INT  .10  534.15  0:00
ML  EXT  .25  654.23  3:00
ML  INT  .35  743.12  6:30

I want to plot the 4th column on the x axis and the 5th column on the y axis. I also want to color the scatter-plot dots red or blue depending on whether the row is INT or EXT.

Here is the code I have so far:

import matplotlib.pyplot as plt
from matplotlib import style
import numpy as np

style.use('ggplot')

a,b,c,d = np.loadtxt('numbers.csv',
                unpack = True,
                delimiter = ',')



plt.scatter(a,b)




plt.title('Charts')
plt.ylabel('Y Axis')
plt.xlabel('X Axis')

plt.show()

Solution

  • Reading in from your example csv using pandas:

    import pandas as pd
    import matplotlib.pyplot as plt
    import datetime
    
    data = pd.read_csv('data.csv', sep='\t', header=None)
    print(data)
    

    prints:

        0    1     2       3     4
    0  ML  INT  0.10  534.15  0:00
    1  ML  EXT  0.25  654.23  3:00
    2  ML  INT  0.35  743.12  6:30
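
If the file turns out to be separated by runs of spaces rather than tabs, a variation on the read step might look like the sketch below. The column names (`line`, `side`, `offset`, `distance`, `clock`) are made up for illustration, and the sample rows from the question are inlined through `io.StringIO` so the snippet runs on its own.

```python
import io

import pandas as pd

# Sample rows from the question, inlined so the sketch is self-contained.
SAMPLE = """\
ML  INT  .10  534.15  0:00
ML  EXT  .25  654.23  3:00
ML  INT  .35  743.12  6:30
"""

# Hypothetical column names; the original file has no header row.
COLUMNS = ["line", "side", "offset", "distance", "clock"]

# sep=r"\s+" splits on any run of whitespace (spaces or tabs).
df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+", header=None, names=COLUMNS)

# Text columns stay as strings; numeric columns are parsed as floats.
print(df.dtypes)
```

Named columns make the later filtering step read as `df[df["side"] == "INT"]` instead of relying on positional labels such as `data[1]`.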
    

    Then separate the 'INT' from the 'EXT':

    ints = data[data[1]=='INT']
    exts = data[data[1]=='EXT']
    

    Then convert the time strings to datetime.time objects and grab the distances:

    # Parse the 'H:MM' strings in column 4 into datetime.time objects.
    int_times = [datetime.datetime.strptime(t, '%H:%M').time() for t in ints[4]]
    ext_times = [datetime.datetime.strptime(t, '%H:%M').time() for t in exts[4]]
    # Column 3 holds the distances.
    int_dist = ints[3].tolist()
    ext_dist = exts[3].tolist()
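
An alternative to `datetime.time` objects, offered here as an assumption rather than as part of the original answer, is to convert each clock string into fractional hours so that matplotlib only ever sees plain floats. The helper name `clock_to_hours` is hypothetical.

```python
def clock_to_hours(clock):
    """Convert an 'H:MM' string such as '6:30' into fractional hours (6.5)."""
    hours, minutes = clock.split(":")
    return int(hours) + int(minutes) / 60.0


# Clock values taken from the sample rows in the question.
clocks = ["0:00", "3:00", "6:30"]
hours = [clock_to_hours(c) for c in clocks]
print(hours)  # [0.0, 3.0, 6.5]
```

Floats place the points at their true positions on the axis, at the cost of needing a tick formatter if the labels should still read as clock times.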
    

    Then draw one scatter plot each for 'INT' and 'EXT':

    fig, ax = plt.subplots()
    ax.scatter(int_dist, int_times, c='orange', s=150)
    ax.scatter(ext_dist, ext_times, c='black', s=150)
    plt.legend(['INT', 'EXT'], loc=4)
    plt.xlabel('Distance')
    plt.show()
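
The two `scatter` calls can also be written as a loop over the categories, which may be easier to extend if more labels than INT and EXT ever appear. This is only a sketch: the sample data is typed in by hand, the column names and the `COLORS` dictionary are illustrative assumptions, and red and blue are used because the question asked for them.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question, with clock positions as fractional hours.
df = pd.DataFrame({
    "side": ["INT", "EXT", "INT"],
    "distance": [534.15, 654.23, 743.12],
    "clock_hours": [0.0, 3.0, 6.5],
})

# Hypothetical colour choice: red for INT, blue for EXT.
COLORS = {"INT": "red", "EXT": "blue"}

fig, ax = plt.subplots()
for side, group in df.groupby("side"):
    # One scatter call per category gives each its own legend entry.
    ax.scatter(group["distance"], group["clock_hours"],
               c=COLORS[side], s=150, label=side)

ax.set_xlabel("Distance")
ax.set_ylabel("Clock position (hours)")
ax.legend(loc="lower right")
fig.savefig("scatter_by_side.png")
```

Passing `label=side` to each call lets `ax.legend()` pick up the names on its own, so the legend cannot drift out of step with the order of the `scatter` calls.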
    


    EDIT: Added code to answer a question from the comments about showing the time in 12-hour format (ranging from 0:00 to 11:59) without the seconds.

    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    
    # Same tab-separated file as in the first snippet.
    data = pd.read_csv('data.csv', sep='\t', header=None)
    ints = data[data[1]=='INT']
    exts = data[data[1]=='EXT']
    # Row positions serve as evenly spaced y-coordinates.
    INT_index = ints.index
    EXT_index = exts.index
    # Original time strings, used verbatim as the tick labels.
    time = data[4].tolist()
    int_dist = ints[3].tolist()
    ext_dist = exts[3].tolist()
    
    fig, ax = plt.subplots()
    ax.scatter(int_dist, INT_index, c='orange', s=150)
    ax.scatter(ext_dist, EXT_index, c='black', s=150)
    ax.set_yticks(np.arange(len(data[4])))
    ax.set_yticklabels(time)
    plt.legend(['INT', 'EXT'], loc=4)
    plt.xlabel('Distance')
    plt.ylabel('Time')
    plt.show()
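
Plotting against the row index spaces the points evenly regardless of the actual gap between times. If the true spacing matters, one option is to plot fractional hours and turn the tick values back into `H:MM` labels with `matplotlib.ticker.FuncFormatter`. A minimal sketch under that assumption follows; `hours_to_label` is a made-up helper, and the data is the sample from the question typed in by hand.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for an interactive window
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MultipleLocator


def hours_to_label(value, pos=None):
    """Format fractional hours (e.g. 6.5) as an 'H:MM' label ('6:30')."""
    total_minutes = int(round(value * 60))
    return "{}:{:02d}".format(total_minutes // 60, total_minutes % 60)


distance = [534.15, 654.23, 743.12]
clock_hours = [0.0, 3.0, 6.5]

fig, ax = plt.subplots()
ax.scatter(distance, clock_hours, s=150)
ax.set_ylim(0, 12)                               # clock positions run from 0:00 to 11:59
ax.yaxis.set_major_locator(MultipleLocator(1))   # one tick per hour
ax.yaxis.set_major_formatter(FuncFormatter(hours_to_label))
ax.set_xlabel("Distance")
ax.set_ylabel("Time")
```

Because the labels are generated from the tick values, no seconds ever appear and the axis stays readable however many rows the file contains.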
    
