machine-learning pca feature-extraction feature-selection

Feature extraction for multiple sub-features

I would like to conduct some feature extraction(or clustering) for dataset containing sub-features. For example, dataset is like below. The goal is to classify the type of robot using the data.

Samples : 100 robot samples [Robot 1, Robot 2, ..., Robot 100]
Classes : 2 types [Type A, Type B]
Variables : 6 parts, and 3 sub-features for each parts (total 18 variables)
[Part1_weight, Part1_size, Part1_strength, ..., Part6_size, Part6_strength, Part6_weight]

I want to conduct feature extraction with [weight, size, strength], and use extracted feature as a representative value for the part.

In short, my aim is to reduce the feature to 6 - [Part1_total, Part2_total, ..., Part6_total] - and then, classify the type of robot with those 6 features. So, make combined feature with 'weight', 'size', and 'strength' is the problem to solve.

First I thought of applying PCA (Principal Component Analysis), because it is one of the most popular feature extraction algorithm. But it considers all 18 features separately, so 'Part1_weight' can be considered as more important than 'Part2_weight'. But what I have to know is the importance of 'weights', 'sizes', and 'strengths' among samples, so PCA seems to be not applicable.

Is there any supposed way to solve this problem?

Solution

If you want to have exactly one feature per part I see no other way than performing the feature reduction part-wise. However, there might be better choices than simple PCA. For example, if the parts are mostly solid, their weight is likely to correlate with the third power of the size, so you could take the cubic root of the weight or the cube of the size before performing the PCA. Alternatively, you can take a logarithm of both values, which again results in a linear dependency.

Of course, there are many more fancy transformations you could use. In statistics, the Box-Cox Transformation is used to achieve a normal-looking distribution of the data.

You should also consider normalising the transformed data before performing the PCA, i.e. subtracting the mean and dividing by the standard deviations of each variable. It will remove the influence of units of measurement. I.e. it won't matter whether you measure weight in kg, atomic units, or Sun masses.