machine-learning supervised-learning transfer-learning

Transfer Learning for small datasets of structured data

I am looking to implement machine learning for a problems that are built on small data sets related to approvals of expenses in a specific supply chain domain. Typically labelled data is unavailable

I was looking to build models in one data set that I have labelled data and then use that model developed in similar contexts- where the feature set is very similar, but not identical. The expectation is that this allows the starting point for recommendations and gather labelled data in the new context.

I understand this is the essence of Transfer Learning. Most of the examples I read in this domain speak of image data sets- any guidance how this can be leveraged in small data sets using standard tree-based classification algorithms

Solution

After some research, we have decided to proceed with random forest models with the intuition that trees in the original model that have common features will form the starting point for decisions.

As we gain more labelled data in the new context, we will start replacing the original trees with new trees that comprise of (a)only new features and (b) combination of old and new features

This has worked to provide reasonable results in initial trials