Tags: python, csv, pandas, dataframe, nonetype

CSV file gets lost between Python function calls


I'm writing a script that plots a pandas data frame. I preprocess the data with a function in a separate Python module, then import that function into a new Python file. When I call the function from the new file, the data frame arrives as a None object, even though printing it inside the original function shows the data frame correctly.

Here's my code:

import numpy as np
import matplotlib.pyplot as plt
import os
from Preprocessing import sample_difference as sd

files = [
    'PickUpPhoneAccelerometer1.csv',
    'PickUpPhoneAccelerometer2.csv',
    'PickUpPhoneAccelerometer3.csv',
    'Wave1Accelerometer.csv',
    'Wave2Accelerometer.csv',
    'Wave3Accelerometer.csv'
]


def segment_energy(data, th):
    print data
    mag = np.array([np.linalg.norm(data['x']), np.linalg.norm(data['y']), np.linalg.norm(data['z'])])

When I run this, data is None.

Here's the other Python file, sample_difference:

def sample_difference(filename):
    df = pd.read_csv(filename, header=None, names=['timestamp', 'time skipped', 'x', 'y', 'z', 'label']).set_index('timestamp')
    df.assign(dx=df.x.diff(), dy=df.y.diff(), dz=df.z.diff())
    print df

This prints out the data frame correctly. The error when I run the final script is:

line 17, in segment_energy
mag = np.array([np.linalg.norm(data['x']), np.linalg.norm(data['y']), np.linalg.norm(data['z'])])
TypeError: 'NoneType' object has no attribute '__getitem__'

I'm calling segment_energy like this, in the same file:

for f in files:
    with open(os.path.join("/Users", "myname", "PycharmProjects", "sensorLogProject", "Data", f), 'rU') as my_file:
        segment_energy(sd.sample_difference(my_file), 2)

Solution

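  • First, pass sample_difference the file name rather than an open file object. pd.read_csv opens the path itself (it also accepts an open file object, so this change is a simplification, not the cause of the None):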

    for f in files:
        filename = os.path.join("/Users", "myname", "PycharmProjects", "sensorLogProject", "Data", f)
        segment_energy(sd.sample_difference(filename), 2)
    

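  • Second, and this is the actual cause of the error, sample_difference must return the data frame instead of only printing it. A Python function that has no return statement returns None, and that None is what segment_energy receives: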

    import pandas as pd

    def sample_difference(filename):
        df = pd.read_csv(filename, header=None, names=['timestamp', 'time skipped', 'x', 'y', 'z', 'label']).set_index('timestamp')
        # assign() returns a new DataFrame; keep the result, or the dx/dy/dz columns are lost
        df = df.assign(dx=df.x.diff(), dy=df.y.diff(), dz=df.z.diff())
        return df
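A minimal, self-contained sketch of the fix, written for Python 3 (the question's code uses Python 2 print statements) and assuming the six-column layout from the question. SAMPLE is two made-up rows standing in for a real accelerometer file, and printing_only is a hypothetical copy of the original function kept only for contrast. The sketch also stores the result of df.assign(...), because assign returns a new DataFrame rather than modifying df in place.

```python
import io

import numpy as np
import pandas as pd

# Column layout copied from the question's read_csv call.
COLUMNS = ['timestamp', 'time skipped', 'x', 'y', 'z', 'label']

# Hypothetical stand-in for one accelerometer CSV (two made-up rows, no header).
SAMPLE = "1000,0,0.0,1.0,2.0,wave\n1010,0,3.0,1.0,0.0,wave\n"


def printing_only(source):
    """Hypothetical copy of the original version, kept only for contrast."""
    df = pd.read_csv(source, header=None, names=COLUMNS).set_index('timestamp')
    print(df)  # the frame is printed here, but nothing is returned


def sample_difference(source):
    """Load one CSV (a path or a file-like object) and add per-sample differences."""
    df = pd.read_csv(source, header=None, names=COLUMNS).set_index('timestamp')
    # assign() builds a new DataFrame, so the result has to be stored.
    df = df.assign(dx=df.x.diff(), dy=df.y.diff(), dz=df.z.diff())
    return df  # the caller now gets the DataFrame instead of None


broken = printing_only(io.StringIO(SAMPLE))
fixed = sample_difference(io.StringIO(SAMPLE))
print(broken is None)  # True: a function without a return statement yields None

# The per-axis norms that segment_energy computes now work on the returned frame.
print(np.array([np.linalg.norm(fixed['x']),
                np.linalg.norm(fixed['y']),
                np.linalg.norm(fixed['z'])]))
```

Replacing io.StringIO(SAMPLE) with the path built in the loop gives the same result, since pd.read_csv accepts either a path or a file-like object.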