machine-learning deep-learning conv-neural-network weka

How to export all the information from 3d numpy array to a csv file


Kaggle Dataset and code link

I'm trying to solve the Kaggle problem linked above, and I want to export the preprocessed data as a CSV file so that I can build a model in Weka. However, when I save it as CSV I lose a dimension. I want to retain all of the information in the CSV.

Please help me with the relevant code or any resources.

Thanks

print(scaled_x)

    |x           |y          |z          |label
    |1.485231    |-0.661030  |-1.194153  |0
    |0.888257    |-1.370361  |-0.829636  |0
    |0.691523    |-0.594794  |-0.936247  |0
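
import numpy as np
from scipy import stats

Fs = 20  # sampling rate in Hz
frame_size = Fs * 4  # 80 samples = 4 seconds per frame
hop_size = Fs * 2  # 40 samples = 2 seconds between frame starts
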
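def get_frames(df, frame_size, hop_size):
    frames = []
    labels = []
    for i in range(0, len(df) - frame_size, hop_size):
        window = df.iloc[i: i + frame_size]
        # Shape (frame_size, 3): one row per sample, columns x, y, z.
        # Appending [x, y, z] and reshaping to (-1, frame_size, 3) would
        # interleave the axes instead of transposing them.
        frames.append(window[['x', 'y', 'z']].values)
        # Most common label in the window. np.atleast_1d handles both older
        # SciPy (mode returned as an array) and newer SciPy (returned as a scalar).
        label = np.atleast_1d(stats.mode(window['label'].values).mode)[0]
        labels.append(label)

    frames = np.asarray(frames)  # (n_frames, frame_size, 3)
    labels = np.asarray(labels)

    return frames, labels
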
x, y = get_frames(scaled_x, frame_size, hop_size)
x.shape, y.shape

((78728, 80, 3), (78728,))

Solution

  • According to the link you posted, the data is time series accelerometer/gyro data sampled at 20 Hz, with a label for each sample. The code aggregates the time series into frames, where the label of each frame is the most common label among its samples.

    So frame_size is the number of samples in a frame, and hop_size is the amount the sliding window moves forward each iteration. In other words, the frames overlap by 50% since hop_size = frame_size / 2.

    Thus at the end you get a 3D array of 78728 frames of length 80, with 3 values (x, y, z) each.
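
To make the framing arithmetic concrete, here is a minimal sketch on a toy signal. The array `signal` and the small sizes are made up for illustration and are not part of the Kaggle data; only the windowing logic mirrors the question.

```python
import numpy as np

frame_size = 8  # samples per frame (the question uses 80)
hop_size = 4    # step between frame starts (the question uses 40), i.e. 50% overlap

# Toy stand-in for the scaled data: 20 samples of (x, y, z).
signal = np.arange(20 * 3).reshape(20, 3)

# Same start indices as the loop in get_frames.
starts = list(range(0, len(signal) - frame_size, hop_size))
frames = np.stack([signal[s:s + frame_size] for s in starts])

print(starts)        # start index of each frame
print(frames.shape)  # (number of frames, frame_size, 3)

# The second half of one frame is the first half of the next.
overlap_matches = np.array_equal(frames[0][hop_size:], frames[1][:hop_size])
print(overlap_matches)
```

With the real values (80 and 40), the same loop yields the 78728 frames reported in the question.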

    EDIT: To answer your new question about how to export as CSV, you'll need to "flatten" the 3D frame array to a 2D array since that's what a CSV represents. There are multiple different ways to do this but I think the easiest may just be to concatenate the final two dimensions, so that each row is a frame, consisting of 240 values (80 samples of 3 co-ordinates each). Then concatenate the labels as the final column.

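    import pandas as pd

    # Flatten each frame: (78728, 80, 3) -> (78728, 240), one row per frame.
    x_2d = np.reshape(x, (x.shape[0], -1))
    # Append the labels as the last column (y must be 2D to concatenate).
    full = np.concatenate([x_2d, y[:, None]], axis=1)

    df = pd.DataFrame(full)
    # index=False keeps the DataFrame row index out of the CSV.
    df.to_csv("frames.csv", index=False)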
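
Flattening this way should not lose any information, because the frame length and the number of axes are known and the 3D shape can be restored after loading. Here is a small sketch of the round trip, assuming a row-major reshape like `np.reshape(x, (x.shape[0], -1))`. The toy arrays `x_demo` and `y_demo` are hypothetical, and an in-memory buffer stands in for a file on disk:

```python
import io

import numpy as np
import pandas as pd

# Toy stand-ins for the real arrays: 5 frames of 4 samples with 3 axes each.
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(5, 4, 3))
y_demo = np.array([0, 1, 1, 0, 2])

# Flatten to 2D and append the labels as the last column.
flat = np.concatenate([x_demo.reshape(len(x_demo), -1), y_demo[:, None]], axis=1)

# Write to an in-memory CSV and read it back.
buffer = io.StringIO()
pd.DataFrame(flat).to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer).to_numpy()

# Split features and labels again, then restore the 3D shape.
x_back = loaded[:, :-1].reshape(-1, 4, 3)
y_back = loaded[:, -1].astype(int)

print(x_back.shape, y_back.shape)
print(np.allclose(x_back, x_demo), np.array_equal(y_back, y_demo))
```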
    

    If you also want proper column names:

    columns = []
    for i in range(1, x.shape[1] + 1):
        columns.extend([f"{i}_X", f"{i}_Y", f"{i}_Z"])
    columns.append("label")
    df = pd.DataFrame(full, columns=columns)
    df.to_csv("frames.csv", index=False)
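
One thing to be aware of: putting the float features and the integer labels into a single NumPy array converts the labels to floats (0 becomes 0.0). If that matters downstream, one possible variation is to build the DataFrame from the features alone and add the labels as a separate column. A sketch with made-up toy arrays (`x_demo` and `y_demo` are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy stand-ins: 4 frames of 2 samples with 3 axes each.
x_demo = np.arange(4 * 2 * 3, dtype=float).reshape(4, 2, 3)
y_demo = np.array([0, 1, 0, 2])

# One column name per (sample index, axis), in the same order as the row-major flatten.
columns = [f"{i}_{axis}" for i in range(1, x_demo.shape[1] + 1) for axis in "XYZ"]

df_demo = pd.DataFrame(x_demo.reshape(len(x_demo), -1), columns=columns)
df_demo["label"] = y_demo  # added separately, so it stays an integer column

csv_text = df_demo.to_csv(index=False)  # no path given: returns the CSV as a string
print(csv_text.splitlines()[0])         # header row
print(df_demo.dtypes["label"])
```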