I have downloaded and labeled data from http://archive.ics.uci.edu/ml/datasets/pamap2+physical+activity+monitoring
My task is to gain insight into the data. I have around 34 attributes in a DataFrame (all clean, no NaN values)
and want to train a model to predict one target attribute, 'heart_rate', from the rest of the attributes (all are numbers recorded while a participant performs various activities).
I wanted to use a linear regression model but cannot use my DataFrame for some reason. I do not mind starting from scratch if you think I am doing it wrong.
my DataFrame columns:
> Index(['timestamp', 'activity_ID', 'heart_rate', 'IMU_hand_temp',
> 'hand_acceleration_16_1', 'hand_acceleration_16_2',
> 'hand_acceleration_16_3', 'hand_gyroscope_rad_7',
> 'hand_gyroscope_rad_8', 'hand_gyroscope_rad_9',
> 'hand_magnetometer_μT_10', 'hand_magnetometer_μT_11',
> 'hand_magnetometer_μT_12', 'IMU_chest_temp', 'chest_acceleration_16_1',
> 'chest_acceleration_16_2', 'chest_acceleration_16_3',
> 'chest_gyroscope_rad_7', 'chest_gyroscope_rad_8',
> 'chest_gyroscope_rad_9', 'chest_magnetometer_μT_10',
> 'chest_magnetometer_μT_11', 'chest_magnetometer_μT_12',
> 'IMU_ankle_temp', 'ankle_acceleration_16_1', 'ankle_acceleration_16_2',
> 'ankle_acceleration_16_3', 'ankle_gyroscope_rad_7',
> 'ankle_gyroscope_rad_8', 'ankle_gyroscope_rad_9',
> 'ankle_magnetometer_μT_10', 'ankle_magnetometer_μT_11',
> 'ankle_magnetometer_μT_12', 'Intensity'],
> dtype='object')
first 5 rows:
timestamp activity_ID heart_rate IMU_hand_temp hand_acceleration_16_1 hand_acceleration_16_2 hand_acceleration_16_3 hand_gyroscope_rad_7 hand_gyroscope_rad_8 hand_gyroscope_rad_9 ... ankle_acceleration_16_1 ankle_acceleration_16_2 ankle_acceleration_16_3 ankle_gyroscope_rad_7 ankle_gyroscope_rad_8 ankle_gyroscope_rad_9 ankle_magnetometer_μT_10 ankle_magnetometer_μT_11 ankle_magnetometer_μT_12 Intensity
2928 37.66 lying 100.0 30.375 2.21530 8.27915 5.58753 -0.004750 0.037579 -0.011145 ... 9.73855 -1.84761 0.095156 0.002908 -0.027714 0.001752 -61.1081 -36.8636 -58.3696 low
2929 37.67 lying 100.0 30.375 2.29196 7.67288 5.74467 -0.171710 0.025479 -0.009538 ... 9.69762 -1.88438 -0.020804 0.020882 0.000945 0.006007 -60.8916 -36.3197 -58.3656 low
2930 37.68 lying 100.0 30.375 2.29090 7.14240 5.82342 -0.238241 0.011214 0.000831 ... 9.69633 -1.92203 -0.059173 -0.035392 -0.052422 -0.004882 -60.3407 -35.7842 -58.6119 low
2931 37.69 lying 100.0 30.375 2.21800 7.14365 5.89930 -0.192912 0.019053 0.013374 ... 9.66370 -1.84714 0.094385 -0.032514 -0.018844 0.026950 -60.7646 -37.1028 -57.8799 low
2932 37.70 lying 100.0 30.375 2.30106 7.25857 6.09259 -0.069961 -0.018328 0.004582 ... 9.77578 -1.88582 0.095775 0.001351 -0.048878 -0.006328 -60.2040 -37.1225 -57.8847 low
If you check the timestamp attribute, you will see that the data is acquired at a millisecond scale (consecutive rows are 0.01 apart), so it might be a good idea to keep only one reading every 2-5 seconds from this DataFrame and train the model on that.
As an option, I would also like to use one of these models for the task: linear, polynomial, or multiple linear regression, agglomerative clustering, or k-means clustering.
my code:
target = subject1.DataFrame(data.target, columns=["heart_rate"])
X = df
y = target["heart_rate"]
lm = linear_model.LinearRegression()
model = lm.fit(X,y)
predictions = lm.predict(X)
print(predictions)[0:5]
Error:
AttributeError Traceback (most recent call last)
<ipython-input-93-b0c3faad3a98> in <module>()
3 #heart_rate
4 # Put the target (housing value -- MEDV) in another DataFrame
----> 5 target = subject1.DataFrame(data.target, columns=["heart_rate"])
c:\python36\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5177 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5178 return self[name]
-> 5179 return object.__getattribute__(self, name)
5180
5181 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'DataFrame'
To fix the error I tried:
subject1.columns = subject1.columns.str.strip()
but it still did not work.
Thank you, and sorry if I was not precise enough.
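The error occurs because subject1 is already a DataFrame: DataFrame is a constructor in the pandas module (pd.DataFrame), not an attribute of an existing DataFrame, and there is no data.target here (that pattern comes from tutorials built on scikit-learn's bundled datasets). Select the target and feature columns directly from your DataFrame instead. Try this: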
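from scipy.stats import zscore
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Keep numeric predictors only: activity_ID and Intensity hold strings, which zscore cannot handle
X = df.select_dtypes(include="number").drop(columns=["heart_rate"])
y = df["heart_rate"]
X = X.apply(zscore)  # standardize every feature column
test_size = 0.30
seed = 7
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=seed)
lm = linear_model.LinearRegression()
model = lm.fit(X_train, y_train)  # fit on the training split only
predictions = lm.predict(X_test)  # predict on the held-out rows
print(predictions[0:5])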
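Regarding the idea of using one reading every 2-5 seconds: one possible way is to average the readings inside fixed time windows before fitting. The sketch below is only an illustration under stated assumptions. It uses a small synthetic frame (`demo`) instead of your data, the `downsample` helper and the 2-second window are hypothetical choices, and the relationship between the sensor columns and heart_rate is made up so the regression has something to find.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real frame (assumption): 60 s sampled every 0.01 s.
rng = np.random.default_rng(0)
n = 6000
demo = pd.DataFrame({
    "timestamp": np.arange(n) / 100.0,
    "hand_acceleration_16_1": rng.normal(size=n),
    "chest_acceleration_16_1": rng.normal(size=n),
})
# Made-up linear relationship plus noise, purely for illustration.
demo["heart_rate"] = (
    100
    + 5 * demo["hand_acceleration_16_1"]
    - 3 * demo["chest_acceleration_16_1"]
    + rng.normal(scale=0.5, size=n)
)


def downsample(frame, window_seconds=2.0):
    """Average all numeric columns over consecutive time windows."""
    # Rows whose timestamps fall in the same window share a window id.
    window_id = (frame["timestamp"] // window_seconds).astype(int).rename("window")
    return frame.groupby(window_id).mean(numeric_only=True).reset_index(drop=True)


small = downsample(demo, window_seconds=2.0)

# The timestamp is dropped so the model only sees sensor readings.
X = small.drop(columns=["heart_rate", "timestamp"])
y = small["heart_rate"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=7
)

lm = LinearRegression().fit(X_train, y_train)
score = r2_score(y_test, lm.predict(X_test))  # R^2 on held-out windows
print(len(small), round(score, 3))
```

With your own data you would pass your DataFrame to `downsample` in place of `demo`. Averaging is just one option; keeping the first row of each window would also work. Checking the R^2 score on the test split, rather than printing raw predictions, gives a quick sense of whether a linear model is adequate before trying polynomial features.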