I have a series of 2D matrices, such as these two:
import numpy as np
matrix_1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
matrix_2 = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])
Each matrix has a label, for example:
labels = np.array([0, 1])
I want to build a dataset from these matrices so I can train my ML model later. First, I tried saving each matrix to its own small .csv file, but I couldn't train an ML model on many separate .csv files.
Then, I tried this code:
matrix_1_flat = matrix_1.flatten()
matrix_2_flat = matrix_2.flatten()
dataset = np.array([matrix_1_flat, matrix_2_flat])
dataset = np.transpose(dataset)
But I suspect the spatial information is lost this way. Is there another function, apart from the ones I'm using, that would create what I want?
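To see what the flattening code actually produces, here is a small check reusing the same two matrices (a sketch, not part of the original question). Before the transpose, each row is one flattened matrix; after it, each matrix occupies a column, which is the opposite of the usual one-sample-per-row convention. No values are lost, since reshape(3, 3) recovers each matrix, but a model that consumes the flat vectors is not told which entries were neighbors in the 3x3 grid.

```python
import numpy as np

matrix_1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
matrix_2 = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])

# Each 3x3 matrix becomes a length-9 vector; np.array stacks them as rows
dataset = np.array([matrix_1.flatten(), matrix_2.flatten()])
print(dataset.shape)  # (2, 9): one row per matrix

# The transpose swaps the axes, so each matrix now occupies a column
dataset_t = np.transpose(dataset)
print(dataset_t.shape)  # (9, 2)

# No values are lost: reshaping a row gives back the original matrix
restored = dataset[0].reshape(3, 3)
print(np.array_equal(restored, matrix_1))  # True
```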
By labels, I mean the y variable in machine learning terms. In this example, matrix_1 and matrix_2 (two 2D matrices) are my x_train; the label of matrix_1 is 0 (or "cat", if that makes it easier to understand) and the label of matrix_2 is 1 (or "dog").
I want the training data and its labels to look like this:
x_train = np.array([[[1, 2, 3],[4, 5, 6],[7, 8, 9]],[[10, 11, 12],[13, 14, 15],[16, 17, 18]]])
y_train = np.array(["cat", "dog"])
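The x_train above is exactly what np.stack produces from a list of equally shaped matrices: it joins them along a new first axis instead of flattening them. A minimal sketch follows; the class_names array mapping 0 to "cat" and 1 to "dog" is an assumption taken from the example, not something NumPy provides.

```python
import numpy as np

matrix_1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
matrix_2 = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])
matrices = [matrix_1, matrix_2]  # works for any number of equally shaped matrices

# np.stack adds a new leading axis, so each matrix keeps its 3x3 shape
x_train = np.stack(matrices)
print(x_train.shape)  # (2, 3, 3)

# Integer labels can be turned into names by indexing an array of class names
class_names = np.array(["cat", "dog"])  # assumed mapping: 0 -> cat, 1 -> dog
labels = np.array([0, 1])
y_train = class_names[labels]
print(y_train)  # ['cat' 'dog']
```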
I guess you want a dataset in which each x-y pair (a matrix and a label) keeps x in its original shape, so as not to lose spatial information (treating each matrix as image-like).
With the aid of NumPy, you can create a compressed file representing the dataset as follows:
import numpy as np
matrix_1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
matrix_2 = np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])
# preparing "x" and "y" - the dataset
matrices = [matrix_1, matrix_2]
labels = np.array([0, 1])
# save into an npz object:
# - it's dict-like, so we use "x" and "y" as keys
# - this will be saved as "matrix_dataset.npz"
np.savez_compressed('matrix_dataset', x=matrices, y=labels)
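Since the question mentions a whole series of matrices, here is a sketch of the same approach for an arbitrary number of them. The generated matrices, the string labels and the temporary directory are placeholders for illustration; np.stack is used so the (N, 3, 3) array is built explicitly rather than relying on savez_compressed to convert the list.

```python
import os
import tempfile
import numpy as np

# Hypothetical series of matrices; any number works as long as shapes match
matrices = [np.arange(1, 10).reshape(3, 3) + 9 * i for i in range(4)]
labels = np.array(["cat", "dog", "cat", "dog"])  # string labels are fine too

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "matrix_dataset")  # ".npz" is appended automatically

# Stacking explicitly makes the (N, 3, 3) shape clear before saving
np.savez_compressed(path, x=np.stack(matrices), y=labels)

with np.load(path + ".npz") as ds:
    files = sorted(ds.files)
    x_shape = ds["x"].shape
    y_loaded = ds["y"]

print(files)    # ['x', 'y']
print(x_shape)  # (4, 3, 3)
```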
The .npz file can later be loaded into memory:
ds = np.load('matrix_dataset.npz')
You can access the "x" and "y" fields simply by their key:
# e.g. if you want to train your model, after loading
x_train = np.array(ds['x'])
y_train = np.array(ds['y'])
# your model fitting code...
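Indexing the loaded object with ds['x'] already returns a NumPy array, so the extra np.array() call is optional. The sketch below (writing to a temporary directory, an assumption for illustration) checks that the data round-trips unchanged. It also shows one way to flatten each sample only at the point where a model needs 2D input: scikit-learn estimators, for example, expect X with shape (n_samples, n_features).

```python
import os
import tempfile
import numpy as np

matrices = [np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
            np.array([[10, 11, 12], [13, 14, 15], [16, 17, 18]])]
labels = np.array([0, 1])
path = os.path.join(tempfile.mkdtemp(), "matrix_dataset.npz")
np.savez_compressed(path, x=matrices, y=labels)

# Using np.load as a context manager closes the underlying file when done
with np.load(path) as ds:
    x_train = ds["x"]  # already an ndarray
    y_train = ds["y"]

print(np.array_equal(x_train, np.stack(matrices)))  # True
print(np.array_equal(y_train, labels))  # True

# Flatten per sample only if the model requires (n_samples, n_features) input
x_flat = x_train.reshape(len(x_train), -1)
print(x_flat.shape)  # (2, 9)
```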
Note that the shape of x_train is now (N, 3, 3), where N (2 in this case) is the batch axis, so x_train[0] retrieves the first 3x3 matrix.
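A short sketch of working with that batch axis: each sample is paired with the label at the same index, and if the matrices are fed to an image-style model, an explicit channel axis can be added. The channels-last layout (N, height, width, channels) is an assumption here; it matches, for example, the default of Keras Conv2D layers, but check what your own model expects.

```python
import numpy as np

x_train = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                    [[10, 11, 12], [13, 14, 15], [16, 17, 18]]])
y_train = np.array(["cat", "dog"])

# The first axis is the sample (batch) axis
print(x_train.shape)  # (2, 3, 3)
first = x_train[0]    # the first 3x3 matrix

# Each sample is paired with the label at the same index
for x, y in zip(x_train, y_train):
    print(y, x.shape)  # cat (3, 3), then dog (3, 3)

# Assumed channels-last layout for image-style models: (N, height, width, 1)
x_train_img = x_train[..., np.newaxis]
print(x_train_img.shape)  # (2, 3, 3, 1)
```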