I am trying to pass a numpy.ndarray as the labels to Amazon SageMaker's conversion tool, write_numpy_to_dense_tensor(). It converts a numpy array of features and labels to the RecordIO format used by SageMaker algorithms.
However, if I pass a multi-column array as the labels, I get an error stating that the labels can only be a vector (i.e. one scalar per feature row).
Is there any way of having multiple values in the label? This is useful for multi-output regression, which can be done with XGBoost, Random Forests, Neural Networks, etc.
Code:
import io

import sagemaker.amazon.common as smac

# X_train and y_train are numpy arrays defined earlier in the notebook.
print("Types: {}, {}".format(type(X_train), type(y_train)))
print("X_train shape: {}".format(X_train.shape))
print("y_train shape: {}".format(y_train.shape))
f = io.BytesIO()
smac.write_numpy_to_dense_tensor(f, X_train.astype('float32'), y_train.astype('float32'))
Output:
Types: <class 'numpy.ndarray'>, <class 'numpy.ndarray'>
X_train shape: (9919, 2684)
y_train shape: (9919, 20)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-fc1033b7e309> in <module>()
      3 print("y_train shape: {}".format(y_train.shape))
      4 f = io.BytesIO()
----> 5 smac.write_numpy_to_dense_tensor(f, X_train.astype('float32'), y_train.astype('float32'))

~/anaconda3/envs/python3/lib/python3.6/site-packages/sagemaker/amazon/common.py in write_numpy_to_dense_tensor(file, array, labels)
     94     if labels is not None:
     95         if not len(labels.shape) == 1:
---> 96             raise ValueError("Labels must be a Vector")
     97         if labels.shape[0] not in array.shape:
     98             raise ValueError("Label shape {} not compatible with array shape {}".format(

ValueError: Labels must be a Vector
Tom, XGBoost does not support the RecordIO format; it only supports CSV and LIBSVM. Also, the algorithm itself doesn’t natively support multi-label output. But there are a couple of ways around it; see the question "Xg boost for multilabel classification?"
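To make the CSV route concrete, here is a minimal sketch of one common workaround: train a separate model per target column and give each one its own CSV. This is not necessarily the approach described in the linked question. It assumes the usual SageMaker XGBoost CSV layout (label in the first column, no header row); the helper name per_label_csv is made up for illustration.

```python
import io

import numpy as np


def per_label_csv(X, Y):
    """Yield (label_index, csv_text) pairs, one CSV per column of Y.

    Hypothetical helper: each CSV has the chosen target in the first
    column followed by all feature columns, with no header row.
    """
    for j in range(Y.shape[1]):
        # Stack the j-th target in front of the features: shape (n_rows, 1 + n_features).
        data = np.column_stack([Y[:, j], X])
        buf = io.StringIO()
        np.savetxt(buf, data, delimiter=",", fmt="%g")
        yield j, buf.getvalue()


if __name__ == "__main__":
    X_demo = np.array([[1.0, 2.0], [3.0, 4.0]])
    Y_demo = np.array([[10.0, 20.0], [30.0, 40.0]])
    for j, text in per_label_csv(X_demo, Y_demo):
        print("target column", j)
        print(text)
```

Each CSV would then be uploaded and used as the training channel for its own training job, so 20 target columns means 20 models. Whether that cost is acceptable depends on the problem.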
Random Cut Forest does not support multiple labels either. If more than one label is provided, it uses only the first.
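For algorithms that do read RecordIO, the check shown in the traceback (len(labels.shape) == 1) means each call can carry only one label vector. A rough sketch of working within that limit is to write one buffer per label column. The function name write_one_file_per_label and its writer parameter are made up for illustration; writer is expected to have the same (file, array, labels) signature as write_numpy_to_dense_tensor.

```python
import io

import numpy as np


def write_one_file_per_label(X, Y, writer):
    """Serialize X once per column of Y, each time with a 1-D label vector.

    Hypothetical helper: `writer` is any callable taking (file, array, labels),
    for example sagemaker.amazon.common.write_numpy_to_dense_tensor.
    Returns a list of in-memory buffers, one per label column.
    """
    X32 = X.astype("float32")
    buffers = []
    for j in range(Y.shape[1]):
        # Slicing a single column gives shape (n_rows,), which is a vector.
        labels = Y[:, j].astype("float32")
        buf = io.BytesIO()
        writer(buf, X32, labels)
        buf.seek(0)  # rewind so the buffer can be read or uploaded
        buffers.append(buf)
    return buffers
```

With the SDK installed, this would be called as write_one_file_per_label(X_train, y_train, smac.write_numpy_to_dense_tensor), giving 20 buffers for the (9919, 20) label array above. This only gets past the shape check; it still means training one model per label column.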