machine-learning classification prediction feature-selection feature-engineering

Features from consecutive measurements for classification

I'm currently working on a small machine learning project. The task deals with medical data from a couple of thousand patients. For each patient, 12 measurements of the same set of vital signs were taken, each one hour apart. These measurements need not have been taken immediately after the patient entered the hospital; they can start with some offset. However, each patient stays in the hospital for 24 hours in total, so the measurements can't start later than 11 hours after admission.

Now the task is to predict for each patient whether none, one or multiple of 10 possible tests will be ordered during the remainder of the stay, and also to predict the future mean value of some of the vital signs for the remainder of the stay. I have a training set that comes together with the labels that I should predict.

My question is mainly about how to process the features. I thought about turning the measurement results for a patient into one long vector and using it as a training example for a classifier. However, I'm not quite sure how I should include the time information of each measurement in the features (should I even consider time at all?).

Solution

If I understood correctly, you want to include the time information of each measurement in the features. One way is to create a zero vector of length 24, since the patient stays in the hospital for 24 hours, and use it as a binary indicator vector (strictly a multi-hot encoding rather than one-hot, since several positions can be 1). For example, if measurements were taken in the 12th, 15th and 20th hours of the stay, the time feature vector has a 1 at the 12th, 15th and 20th positions and 0 everywhere else. You can then append this time vector to the other features to get a single vector per patient of length = length(other features) + length(time vector). Other ways of combining these features are possible too.
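Here is a minimal sketch of that idea in Python with NumPy. The function name `build_feature_vector`, the 0-based hour indexing, the choice of 3 vital signs and the random example data are all assumptions made for illustration; they are not part of your dataset.

```python
import numpy as np

STAY_HOURS = 24  # total length of the hospital stay, as stated in the question


def build_feature_vector(measurements, hours):
    """Flatten the measurements and append a 24-slot time indicator.

    measurements: array of shape (n_measurements, n_vitals)
    hours: 0-based hour of the stay at which each row was measured
           (hypothetical convention; hour 0 is the first hour after admission)
    """
    measurements = np.asarray(measurements, dtype=float)
    hours = np.asarray(hours, dtype=int)

    # Binary indicator: 1 at every hour in which a measurement was taken.
    time_indicator = np.zeros(STAY_HOURS)
    time_indicator[hours] = 1.0

    # One long vector: flattened vitals followed by the time indicator.
    return np.concatenate([measurements.ravel(), time_indicator])


# Made-up patient: 12 hourly measurements of 3 vital signs, starting at hour 5.
rng = np.random.default_rng(0)
vitals = rng.normal(size=(12, 3))
hours = np.arange(5, 17)

x = build_feature_vector(vitals, hours)
print(x.shape)  # (60,) = 12 * 3 measurement values + 24 time slots
```

Note that if the 12 measurements are always consecutive and exactly one hour apart, the indicator vector is fully determined by the starting offset, so a single "start hour" feature would carry the same information in a more compact form. The 24-slot vector is mainly useful if some measurements can be missing or irregularly spaced.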

Please let me know if this approach makes sense to you. Thanks.