I'm writing a script that plots a pandas data frame. I preprocess the data with a function in a separate Python module, then import that module into my new file. When I call the function from the new file, the data frame arrives as a None object, even though the function itself prints the data frame correctly.
Here's my code:
import numpy as np
import matplotlib.pyplot as plt
import os
from Preprocessing import sample_difference as sd

files = [
    'PickUpPhoneAccelerometer1.csv',
    'PickUpPhoneAccelerometer2.csv',
    'PickUpPhoneAccelerometer3.csv',
    'Wave1Accelerometer.csv',
    'Wave2Accelerometer.csv',
    'Wave3Accelerometer.csv'
]

def segment_energy(data, th):
    print data
    mag = np.array([np.linalg.norm(data['x']), np.linalg.norm(data['y']), np.linalg.norm(data['z'])])
When I run this, data is None.
Here's the other Python file, sample_difference:
def sample_difference(filename):
    df = pd.read_csv(filename, header=None, names=['timestamp', 'time skipped', 'x', 'y', 'z', 'label']).set_index('timestamp')
    df.assign(dx=df.x.diff(), dy=df.y.diff(), dz=df.z.diff())
    print df
This prints out the data frame correctly. The error when I run the final script is:
line 17, in segment_energy
mag = np.array([np.linalg.norm(data['x']), np.linalg.norm(data['y']), np.linalg.norm(data['z'])])
TypeError: 'NoneType' object has no attribute '__getitem__'
I'm calling segment_energy like so in that same file:
for f in files:
    with open(os.path.join("/Users", "myname", "PycharmProjects", "sensorLogProject", "Data", f), 'rU') as my_file:
        segment_energy(sd.sample_difference(my_file), 2)
First, call sample_difference with a filename rather than an open file object, so that pd.read_csv can open the file itself:
for f in files:
    filename = os.path.join("/Users", "myname", "PycharmProjects", "sensorLogProject", "Data", f)
    segment_energy(sd.sample_difference(filename), 2)
Second, sample_difference should return the data frame instead of only printing it. A function with no return statement returns None, which is what segment_energy receives:
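import pandas as pd

def sample_difference(filename):
    df = pd.read_csv(filename, header=None, names=['timestamp', 'time skipped', 'x', 'y', 'z', 'label']).set_index('timestamp')
    # assign() returns a new DataFrame, so keep its result
    df = df.assign(dx=df.x.diff(), dy=df.y.diff(), dz=df.z.diff())
    # return the frame so the caller receives it instead of None
    return df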
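
To see why the printing version hands back None, here is a minimal sketch. It assumes Python 3 (so print is a function) and uses a few made-up rows in place of the real CSV files. The names SAMPLE_CSV, COLUMNS, load, difference_printed and difference_returned are hypothetical and exist only for this illustration. It also shows that DataFrame.assign returns a new frame rather than changing df in place, so its result has to be kept.

```python
import io

import pandas as pd

# Hypothetical sample rows laid out like the accelerometer CSV files:
# timestamp, time skipped, x, y, z, label
SAMPLE_CSV = (
    "1,0,0.0,1.0,2.0,wave\n"
    "2,0,3.0,1.0,6.0,wave\n"
    "3,0,3.0,5.0,6.0,wave\n"
)

COLUMNS = ['timestamp', 'time skipped', 'x', 'y', 'z', 'label']


def load(source):
    """Read the CSV (path or file-like object) indexed by timestamp."""
    return pd.read_csv(source, header=None, names=COLUMNS).set_index('timestamp')


def difference_printed(source):
    """Prints the frame but has no return statement, so it returns None."""
    df = load(source)
    df.assign(dx=df.x.diff())  # result is discarded; df itself is unchanged
    print(df)


def difference_returned(source):
    """Keeps the result of assign() and returns it to the caller."""
    df = load(source)
    df = df.assign(dx=df.x.diff(), dy=df.y.diff(), dz=df.z.diff())
    return df


printed = difference_printed(io.StringIO(SAMPLE_CSV))
returned = difference_returned(io.StringIO(SAMPLE_CSV))

print(printed is None)         # True: nothing was returned
print(list(returned.columns))  # includes the new dx, dy, dz columns
```

With the returning version, the caller can index the result the way segment_energy does, for example returned['x'].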