
Calculating area under curve (AUC) of a speed (m/s) vs time (per second) graph using pandas and numpy trapz


I am working with this csv file. I am trying to calculate the distance the car has travelled in the 700 seconds it has recorded. The distance should be the area below the graph as (m/s) * (s) should be meters.

This is my code:

import csv
import pprint
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
from numpy import trapz


df = pd.read_csv("AutoRitData.csv")

new = df.filter(['timestamp','speed'], axis=1)
new_array = np.concatenate( new.values, axis=0 )
print(new_array)
area = trapz(new_array, dx=1)
print("area =", area)

df.plot(x='timestamp', y='speed')
plt.show()


# print(df.columns)

I am confused about why the result is different for different dx values. As I see it, making more trapezoids (smaller dx) should make the result more accurate, not smaller. Or is dx not the width of the trapezoids?

Also, I would like to change the color of the line wherever the curve is above 13.9 m/s (which is 50 km/h).

I hope someone who is familiar with scientific/physics programming can help me out.

The resulting graph looks like this:

[image: plot of speed against timestamp]


Solution

  • If you look at the documentation for numpy.trapz

    https://docs.scipy.org/doc/numpy/reference/generated/numpy.trapz.html you will notice that dx=1 is the default, and that dx can be any scalar.

    For the best accuracy, take the step from the timestamps themselves:

    import numpy as np
    dx = np.diff(new['timestamp'])

    If your time deltas vary and are in seconds, this should be enough.
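As a small sketch of the variable-spacing case, using made-up timestamps and speeds rather than the CSV: passing the timestamps as x lets the routine take the differences itself, and it should agree with a hand-written trapezoid sum over np.diff. The `trapezoid` alias is an assumption added here to cope with NumPy 2.0, which renamed np.trapz to np.trapezoid.

```python
import numpy as np

# NumPy 2.0 renamed trapz to trapezoid; use whichever this install provides.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

# Made-up samples with uneven spacing (seconds) and speeds (m/s).
timestamp = np.array([0.0, 1.0, 2.5, 3.0, 5.0])
speed = np.array([0.0, 4.0, 10.0, 12.0, 8.0])

# Let the routine derive each trapezoid width from the timestamps.
distance = trapezoid(speed, x=timestamp)

# The same sum written out by hand: width * mean height per interval.
widths = np.diff(timestamp)
heights = (speed[:-1] + speed[1:]) / 2.0
distance_by_hand = float(np.sum(widths * heights))

print(distance, distance_by_hand)
```

With evenly spaced 1-second data, as in the question, this reduces to dx=1.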

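    In fact, dx must be expressed in the time unit of your speed values. With speed in m/s and one sample per second, dx = 1 and the result is in meters. If you were integrating km/h sampled once per second, dx would be 1/3600 (one second expressed in hours) and the result would be in km.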
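To make the unit point concrete, here is a sketch with a hypothetical constant speed of 36 km/h sampled once per second for 10 seconds. The values are invented for illustration, and the `trapezoid` alias is an assumption to cover the NumPy 2.0 rename of np.trapz. Either express dx in hours, or convert the speeds to m/s first; both describe the same distance.

```python
import numpy as np

# NumPy 2.0 renamed trapz to trapezoid; use whichever this install provides.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

# Hypothetical data: 11 samples, 1 s apart (10 s in total), constant 36 km/h.
speed_kmh = np.full(11, 36.0)

# Option 1: keep km/h and express the 1 s step in hours, result in km.
distance_km = trapezoid(speed_kmh, dx=1.0 / 3600.0)

# Option 2: convert to m/s (divide by 3.6) and keep dx in seconds, result in m.
distance_m = trapezoid(speed_kmh / 3.6, dx=1.0)

print(distance_km, "km")
print(distance_m, "m")
```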

    To answer your question: dx is the integration step in

    Distance = INTEGRAL(Velocity * dx)

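    It is the width of each trapezoid, but your data is sampled in 1-second time steps, so you cannot set dx arbitrarily. trapz does not resample the data; it gives the same samples whatever width you pass in. If you had 0.5-second data, dx=0.5 would be the right choice.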
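A quick way to see this, sketched with made-up speeds: the result is dx times the sum of neighbouring averages, so a smaller dx rescales the area without adding any trapezoids. Getting more trapezoids requires more samples; here np.interp stands in for finer data (an assumption for illustration, since linear interpolation adds no new information). The `trapezoid` alias covers the NumPy 2.0 rename of np.trapz.

```python
import numpy as np

# NumPy 2.0 renamed trapz to trapezoid; use whichever this install provides.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

# Made-up speeds (m/s), one sample per second.
t = np.arange(6.0)
speed = np.array([0.0, 2.0, 6.0, 10.0, 10.0, 4.0])

area_1s = trapezoid(speed, dx=1.0)
area_half = trapezoid(speed, dx=0.5)  # same samples, half the width

# What the routine computes: dx * sum of mean heights of neighbouring samples.
mean_heights = (speed[:-1] + speed[1:]) / 2.0
by_hand_1s = 1.0 * mean_heights.sum()

# Twice as many trapezoids needs twice as many samples on a 0.5 s grid.
t_fine = np.linspace(0.0, 5.0, 11)
speed_fine = np.interp(t_fine, t, speed)
area_fine = trapezoid(speed_fine, dx=0.5)

print(area_1s, area_half, by_hand_1s, area_fine)
```

Only when the sample spacing and dx agree does the number represent the distance.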

    ****EDIT****

    import pandas as pd
    import numpy as np
    
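    Df = pd.read_csv('AutoRitData.csv')
    # Integrate the speed column only, so timestamps are not mixed into y
    Distance1 = np.trapz(Df['speed'],dx=1)    # step matches the 1 s data
    Distance2 = np.trapz(Df['speed'],dx=0.5)  # step does not match the data
    # Per-interval steps from the data; x=Df['timestamp'] is the documented equivalent
    Distance3 = np.trapz(Df['speed'],dx=np.diff(Df['timestamp']))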
    
    >>>  Distance1 = 10850.064
    >>>  Distance2 = 5425.03
    >>>  Distance3 = 10850.064
    

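    It's clear that Distance1 and Distance3 are the correct answers, since your data is not available at dx=0.5, i.e. at half-second resolution. Distance2 is exactly half of Distance1 because the same samples are simply given half the width.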
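One detail in the question's code may be worth checking separately, because it changes the numbers regardless of dx: np.concatenate(new.values, axis=0) flattens the two-column table row by row, so timestamps and speeds end up interleaved in one array and are integrated together. A sketch with a made-up three-row table (not the CSV), again assuming the `trapezoid` alias for the NumPy 2.0 rename:

```python
import numpy as np

# NumPy 2.0 renamed trapz to trapezoid; use whichever this install provides.
trapezoid = getattr(np, "trapezoid", None) or np.trapz

# Made-up stand-in for new.values: columns are timestamp (s) and speed (m/s).
values = np.array([[0.0, 5.0],
                   [1.0, 7.0],
                   [2.0, 9.0]])

# Row-by-row flattening interleaves the two columns: [t0, v0, t1, v1, ...].
flattened = np.concatenate(values, axis=0)
mixed_area = trapezoid(flattened, dx=1)

# Integrating the speed column on its own gives the distance in meters.
speed_only_area = trapezoid(values[:, 1], dx=1)

print(flattened)
print(mixed_area, speed_only_area)
```

Selecting the column explicitly, as the snippets above do with Df['speed'], avoids the mix-up.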