machine-learning deep-learning conv-neural-network weka

How to export all the information from 3d numpy array to a csv file

I'm trying to solve the above Kaggle problem and I want to export the preprocessed data as a CSV so that I can build a model in Weka. When I try to save it as a CSV I lose a dimension, and I want to retain all the information in that CSV.

Please help me with the relevant code or any resource.

Thanks

print (scaled_x)

    |x           |y          |z          |label
    |1.485231    |-0.661030  |-1.194153  |0
    |0.888257    |-1.370361  |-0.829636  |0
    |0.691523    |-0.594794  |-0.936247  |0

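import numpy as np

Fs = 20                # sampling rate in Hz
frame_size = Fs * 4    # 80 samples per frame
hop_size = Fs * 2      # 40 samples between frame starts

def get_frames(df, frame_size, hop_size):
    frames = []
    labels = []
    for i in range(0, len(df) - frame_size, hop_size):
        x = df['x'].values[i: i + frame_size]
        y = df['y'].values[i: i + frame_size]
        z = df['z'].values[i: i + frame_size]

        # Most common label in the frame (pandas sorts the modes, so ties pick the smallest)
        label = df['label'].iloc[i: i + frame_size].mode().iloc[0]
        # Stack as (frame_size, 3) so each row is one (x, y, z) sample;
        # reshaping a (3, frame_size) block to (frame_size, 3) would scramble the axes
        frames.append(np.column_stack([x, y, z]))
        labels.append(label)

    frames = np.asarray(frames)   # shape: (n_frames, frame_size, 3)
    labels = np.asarray(labels)

    return frames, labels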

x, y = get_frames(scaled_x, frame_size, hop_size)
x.shape, y.shape

((78728, 80, 3), (78728,))

Solution

According to the link you posted, the data is time series accelerometer/gyro data sampled at 20 Hz, with a label for each sample. The code aggregates the time series into frames, with each frame's label being the most common label among its samples.
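As a small illustration of "most common label in a frame", here is a sketch that uses only NumPy. The helper name `majority_label` and the sample labels are made up for this example and are not part of the original code.

```python
import numpy as np

def majority_label(labels):
    """Return the most frequent value in a 1D label array.

    np.unique returns the values sorted, and argmax picks the first
    maximum, so ties are resolved in favour of the smallest label.
    """
    values, counts = np.unique(labels, return_counts=True)
    return values[np.argmax(counts)]

# Hypothetical labels for the samples of one short frame
frame_labels = np.array([0, 0, 1, 0, 2, 1, 0])
print(majority_label(frame_labels))  # 0 appears four times, so it wins
```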

So frame_size is the number of samples in a frame, and hop_size is the amount the sliding window moves forward each iteration. In other words, the frames overlap by 50% since hop_size = frame_size / 2.
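To make the sliding window concrete, the sketch below lists the start and end index of each frame for a short signal. The length `n_samples = 200` is an assumption chosen only to keep the output small; the loop bounds mirror the `range(0, len(df) - frame_size, hop_size)` call in the question.

```python
frame_size = 80   # samples per frame (4 s at 20 Hz)
hop_size = 40     # samples the window advances each step (2 s at 20 Hz)
n_samples = 200   # hypothetical signal length, for illustration only

# Same loop bounds as in get_frames
starts = list(range(0, n_samples - frame_size, hop_size))
windows = [(s, s + frame_size) for s in starts]
print(windows)    # [(0, 80), (40, 120), (80, 160)]

# Fraction of each frame shared with the next one
overlap = (frame_size - hop_size) / frame_size
print(overlap)    # 0.5
```

Note that because the stop value of `range` is exclusive, a final window that would end exactly at the last sample (here 120 to 200) is not produced.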

Thus at the end you get a 3D array of 78728 frames of length 80, with 3 values (x, y, z) each.

EDIT: To answer your new question about how to export as CSV, you'll need to "flatten" the 3D frame array to a 2D array since that's what a CSV represents. There are multiple different ways to do this but I think the easiest may just be to concatenate the final two dimensions, so that each row is a frame, consisting of 240 values (80 samples of 3 co-ordinates each). Then concatenate the labels as the final column.

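import numpy as np
import pandas as pd

# Flatten each (80, 3) frame into one row of 240 values
x_2d = np.reshape(x, (x.shape[0], -1))
# Append the labels as the last column (they need shape (n_frames, 1) to concatenate)
full = np.concatenate([x_2d, y.reshape(-1, 1)], axis=1)

df = pd.DataFrame(full)
df.to_csv("frames.csv", index=False)  # index=False avoids an extra row-number column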

If you also want proper column names:

columns = []
for i in range(1, x.shape[1] + 1):
    columns.extend([f"{i}_X", f"{i}_Y", f"{i}_Z"])
columns.append("label")
df = pd.DataFrame(full, columns=columns)
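To check that no information is lost, you can read the CSV back and reshape it to 3D again. The following is a minimal sketch under stated assumptions: it uses a small random array as a stand-in for the real frames, and an in-memory buffer instead of a file on disk. It also adds the labels as a separate DataFrame column rather than concatenating them, which keeps them as integers.

```python
import io

import numpy as np
import pandas as pd

# Stand-in data (assumption): 5 frames of 80 samples with 3 axes each
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 80, 3))
y = rng.integers(0, 3, size=5)

# Flatten to 2D: one row per frame, 240 feature columns plus the label
df = pd.DataFrame(x.reshape(x.shape[0], -1))
df["label"] = y

# Write to an in-memory buffer; a file path works the same way
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Read back and restore the original 3D shape
loaded = pd.read_csv(buffer)
y_back = loaded["label"].to_numpy()
x_back = loaded.drop(columns="label").to_numpy().reshape(-1, 80, 3)

print(x_back.shape)             # (5, 80, 3)
print(np.allclose(x_back, x))   # True
```

The reshape back to `(-1, 80, 3)` works because the flattening kept the row-major order: each CSV row holds sample 1's x, y, z, then sample 2's, and so on.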