Search code examples
pythonmachine-learningdata-augmentation

Non-image data augmentation


I am looking for an algorithm and-or tutorial about data augmentation but all of them belong to image augmentation , is it possible to do that in other datasets ? I am working on parkinsons data set (https://archive.ics.uci.edu/ml/datasets/parkinsons) and want to create an example of data aug with python , is this possible ? or should i use smt like mnist/fmnist ?


Solution

  • If you had access to the actual voice recordings, you could apply some augmentation techniques used in speech recognition and then re-extract the features such as fundamental frequency. However, since you're dealing directly with the features, augmentation is more tricky. It is possible to generate synthetic samples by interpolating between existing ones or adding noise, but since the features are highly correlated, you need a smart way of doing that (see this paper for a simple approach and this one for a more advanced technique). If you have a class imbalance problem, you can try simply over- or under-sampling.