I have an array of shape (2200, 1000, 12). The first axis (2200) is the index, and each index holds 1000 records.
I have another array of class labels with shape (2200,). Each value is the label for all 1000 records at the corresponding index.
How can I flatten the first array from 3 dimensions to 2 dimensions?
And how can I attach the class label to each of the 1000 records?
Desired result: a DataFrame of shape (2200000, 13)
The 2200000 rows would be the 1000 records of each of the 2200 indices combined. The 13th column would hold the class, with each label repeated 1000 times so that it matches the number of rows.
Let us first import the necessary modules and generate mock data:
import numpy as np
import pandas as pd
M = 2200
N = 1000
P = 12
data = np.random.rand(M, N, P)
classes = np.arange(M)
How can I transform from 3 dimensions to 2 dimensions?
data.reshape(M*N, P)
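To see why a plain reshape keeps the records of each index together, here is a minimal sketch on a much smaller array (the sizes 3, 4, 2 and the variable names are illustrative assumptions, not the real data). With NumPy's default row-major order, record j of index i should land in row i * N + j of the flattened array:

```python
import numpy as np

# Small hypothetical sizes, chosen only to keep the example readable
M, N, P = 3, 4, 2
data = np.arange(M * N * P).reshape(M, N, P)

# Same operation as in the answer; data.reshape(-1, P) is equivalent
flat = data.reshape(M * N, P)

# Record j of index i is expected at row i * N + j
i, j = 2, 1
row_matches = np.array_equal(flat[i * N + j], data[i, j])

print(flat.shape, row_matches)  # (12, 2) True
```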
How can I attach each class label to its 1000 records?
np.repeat(classes, N)
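np.repeat is the right choice here because it repeats each element in place, which matches the row order produced by the reshape. np.tile, by contrast, repeats the whole array and would misalign the labels. A small sketch with made-up labels shows the difference:

```python
import numpy as np

classes = np.array([7, 8, 9])  # hypothetical labels for illustration
N = 2                          # hypothetical number of records per index

repeated = np.repeat(classes, N)  # each label repeated N times in a row
tiled = np.tile(classes, N)       # whole array repeated N times

print(repeated)  # [7 7 8 8 9 9]
print(tiled)     # [7 8 9 7 8 9]
```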
Desired result: a DataFrame of shape (2200000, 13)
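# [:, None] turns the repeated labels into a column so they can be stacked as column 13
# np.hstack needs a single dtype, so the integer labels are upcast to float
arr = np.hstack([data.reshape(M*N, P), np.repeat(classes, N)[:, None]])
df = pd.DataFrame(arr)
print(df)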
The code above outputs:
0 0.371495 0.598211 0.038224 ... 0.777405 0.193472 0.0
1 0.356371 0.636690 0.841467 ... 0.403570 0.330145 0.0
2 0.793879 0.008617 0.701122 ... 0.021139 0.514559 0.0
3 0.318618 0.798823 0.844345 ... 0.931606 0.467469 0.0
4 0.307109 0.076505 0.865164 ... 0.809495 0.914563 0.0
... ... ... ... ... ... ... ...
2199995 0.215133 0.239560 0.477092 ... 0.050997 0.727986 2199.0
2199996 0.249206 0.881694 0.985973 ... 0.897410 0.564516 2199.0
2199997 0.378455 0.697581 0.016306 ... 0.985966 0.638413 2199.0
2199998 0.233829 0.158274 0.478611 ... 0.825343 0.215944 2199.0
2199999 0.351320 0.980258 0.677298 ... 0.791046 0.736788 2199.0
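Note that np.hstack stores everything in one float array, which is why the class labels appear as 0.0 ... 2199.0 in the output above. If integer labels are preferred, one possible variation is to build the DataFrame from the reshaped data and assign the labels as a separate column. This is only a sketch: the small sizes and the column names f0, f1, ... and class are hypothetical choices for illustration.

```python
import numpy as np
import pandas as pd

# Small hypothetical sizes; the question uses M=2200, N=1000, P=12
M, N, P = 3, 4, 2
data = np.random.rand(M, N, P)
classes = np.arange(M)

# Hypothetical column names for the P feature columns
feature_names = [f"f{k}" for k in range(P)]

# Build the frame from the 2-D data, then add the labels as their own column
df = pd.DataFrame(data.reshape(M * N, P), columns=feature_names)
df["class"] = np.repeat(classes, N)  # keeps the integer dtype of `classes`

print(df.dtypes)
```

Because the label column is assigned separately, it is not forced into the float dtype of the feature columns.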