
Grouping arrays with common classes for classification in CNN


I have a data set with three columns: the first two columns are the features and the third contains the classes. There are 4 classes; part of the data set is shown below.

[Image: sample rows of the data set]

The data set is big, say 100,000 rows and 3 columns (two feature columns and one class column), so I apply a moving window of length 50 to it before training my deep learning model. So far I have tried two different methods of slicing the data set, with no good results, and I am pretty sure the data set itself is good. I first applied a moving window to the entire data set, which gives 2000 data samples of 50 rows and 2 columns each, i.e. shape (2000, 50, 2). Because some data samples contain mixed classes, I kept only the samples whose rows all share one class and took the average of the class column to assign each sample a single class. I have not gotten good results with this. Here is my code:

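import itertools as it

import numpy as np


def moving_window(data_, length, step=1):
    # With step=1, each window holds `length` consecutive rows and windows do not overlap
    streams = it.tee(data_, length)
    return zip(*[it.islice(stream, i, None, step * length) for stream, i in zip(streams, it.count(step=step))])


# data_ is the full data set: rows of (feature 1, feature 2, class)
data = np.asarray(list(moving_window(data_, 50)))
# print(len(data))
X, Y = [], []
for window in data:
    # Keep only windows whose rows all carry the same class as the first row
    if np.all(window[:, 2] == window[0, 2]):
        X.append(window[:, 0:2])
        Y.append(window[0, 2])  # equals the average of the (identical) labels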
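
For comparison, the same slicing can be sketched with plain NumPy reshaping. This is only an illustration under stated assumptions: the data below is synthetic (random features, four classes with made-up boundaries), and the names `features`, `classes`, `windows` and `pure` are hypothetical, not part of the original code.

```python
import numpy as np

# Synthetic stand-in for the real data (hypothetical values):
# 2 feature columns plus a class column, class changes at rows 230, 500, 750
rng = np.random.default_rng(0)
n_rows, window = 1000, 50
features = rng.normal(size=(n_rows, 2))
classes = np.repeat([1, 2, 3, 4], [230, 270, 250, 250])
data_ = np.column_stack([features, classes])

# Drop any incomplete tail, then cut into non-overlapping windows: (n_windows, 50, 3)
n_windows = len(data_) // window
windows = data_[: n_windows * window].reshape(n_windows, window, 3)

# A window is "pure" when every row carries the same class as its first row
window_labels = windows[:, :, 2]
pure = (window_labels == window_labels[:, :1]).all(axis=1)

X = windows[pure, :, :2]                # feature windows, shape (n_pure, 50, 2)
Y = window_labels[pure, 0].astype(int)  # one class per kept window

print(X.shape, Y.shape)
```

Only the window that straddles row 230 is mixed here, so it is the only one dropped; the boundaries at rows 500 and 750 coincide with window edges.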

I tried another way: I collected the features belonging to each class into separate lists (4 lists, since I have 4 classes), then used a moving window to slice each list separately and assigned each window to its class. This gave no good results either. Here is my code:

import itertools as it

import numpy as np
import pandas as pd

# One-hot encode the class indices; note that range(5) produces 5 columns,
# of which only rows 0-3 are used below
labels = list(range(5))
yy = pd.get_dummies(labels).values.astype(np.float32)


def moving_window(x, length, step=1):
    streams = it.tee(x, length)
    return zip(*[it.islice(stream, i, None, step * length) for stream, i in zip(streams, it.count(step=step))])


# x1..x4: one list of feature rows per class
X, Y = [], []
for class_index, x_class in enumerate([x1, x2, x3, x4]):
    windows = np.asarray(list(moving_window(x_class, 50)))
    # print(windows.shape)
    X.append(windows)
    Y.append([yy[class_index]] * len(windows))
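
At this point `X` and `Y` are lists with one entry per class, so they would normally have to be stacked into single arrays before being passed to a model. A minimal sketch of that step follows. It assumes a 4-column one-hot encoding (one column per class) and uses random stand-in windows; the per-class counts and the names `X_all` and `Y_all` are hypothetical.

```python
import numpy as np

# Hypothetical stand-ins for the per-class window arrays and one-hot rows
n_classes, window = 4, 50
rng = np.random.default_rng(0)
X = [rng.normal(size=(n, window, 2)) for n in (3, 5, 2, 4)]  # X[k]: windows of class k
yy = np.eye(n_classes, dtype=np.float32)                     # assumed 4 one-hot rows
Y = [[yy[k]] * len(X[k]) for k in range(n_classes)]

# Stack the per-class pieces into single arrays, keeping samples and labels aligned
X_all = np.concatenate(X, axis=0)                    # shape (14, 50, 2)
Y_all = np.concatenate([np.asarray(y) for y in Y])   # shape (14, 4)

print(X_all.shape, Y_all.shape)
```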

The architecture of the model I am trying to train works perfectly with other data sets, so I think I am missing something in how I process the data. What am I missing in my preprocessing before training? Is there any other way? All the work is done in Python.


Solution

  • I finally managed to train my CNN model and achieved good training, validation and testing accuracy. The only thing I added was normalization of my input data, with the following lines:

    from sklearn import preprocessing

    minmax_scale = preprocessing.MinMaxScaler().fit(x)
    X = minmax_scale.transform(x)

    The rest remains the same.
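
    The answer does not say at which stage the scaler is applied. Since `MinMaxScaler` expects a 2-D (rows, features) array, one plausible reading is that the two feature columns are scaled before windowing. The sketch below illustrates that reading only; the synthetic `x` and the name `X_windows` are assumptions, not the author's code.

```python
import numpy as np
from sklearn import preprocessing

# Hypothetical stand-in for the two feature columns, on very different scales
rng = np.random.default_rng(0)
x = np.column_stack([
    rng.normal(100.0, 20.0, size=1000),
    rng.normal(0.0, 0.01, size=1000),
])

# Fit on the 2-D (rows, features) array, then rescale each column to [0, 1]
minmax_scale = preprocessing.MinMaxScaler().fit(x)
X = minmax_scale.transform(x)

# Only afterwards cut into non-overlapping windows of 50 rows: (20, 50, 2)
X_windows = X.reshape(-1, 50, 2)

print(X.min(axis=0), X.max(axis=0), X_windows.shape)
```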