r machine-learning image-processing computer-vision feature-extraction

Extract specific PCB colors in R for defect classification

Assume we have the following picture, which contains a PCB defect (a "missing hole" defect):

The defects I need to identify in my project are:

For this purpose, I need to extract the colors associated with each defect category.

I know that in R we can do:

library(colorfindr)
img_path="C:/Users/Rayane_2/Desktop/Data/PCB1/PCB/images/Mouse_bite/01_mouse_bite_04.jpg"
colorfindr::get_colors(img_path,top_n=20)
# A tibble: 20 × 3
   col_hex col_freq col_share
   <chr>      <int>     <dbl>
 1 #005B0C    31106   0.00646
 2 #005B0E    29117   0.00605
 3 #01590B    27768   0.00577
 4 #005A0B    24135   0.00502
 5 #015C0D    23771   0.00494
 6 #00580A    22397   0.00465
 7 #005B0B    21529   0.00447
 8 #00560F    21476   0.00446
 9 #005A0D    21324   0.00443
10 #025A0C    21191   0.00440
11 #01590D    21026   0.00437
12 #005709    20009   0.00416
13 #00580E    19063   0.00396
14 #005909    18666   0.00388
15 #015C0F    18450   0.00383
16 #025A0E    17979   0.00374
17 #015710    16621   0.00345
18 #01590F    16619   0.00345
19 #00580C    16546   0.00344
20 #005A0A    15614   0.00324

From the picture of the defect types, I see that three colors distinguish those defects.

I need to identify those three colors and extract them from the tibble.

Solution

The problem is interesting, and we can try different features, ranging from manual, handcrafted ones (simple to complex, combined with different machine learning models) to automatically extracted ones (e.g., with deep neural networks).

Let's try a very simple feature based on colors only: the color cluster proportion.

We shall first cluster the RGB color values of the images into k groups (e.g., k=3) with the k-means algorithm and obtain k color cluster centers, using the function get.color.clusters() defined in the code further down. (We first need to extract the red, green and blue values from the hex color values.)
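As a minimal sketch of this step, the snippet below converts a few hex colors to RGB with col2rgb() and clusters them with kmeans(). The four greens are taken from the get_colors() output in the question; the copper and light tones are made up for illustration only, so that three groups exist.

```r
# Four PCB greens from the tibble in the question, plus four
# hypothetical (assumed) non-green colors for illustration.
hex <- c("#005B0C", "#005B0E", "#01590B", "#005A0B",  # greens from the tibble
         "#C87533", "#C97634",                         # hypothetical copper tones
         "#F0F0F0", "#EFEFEF")                         # hypothetical light tones

# col2rgb() returns a 3 x n matrix (rows: red, green, blue); transpose to n x 3
rgb_mat <- t(col2rgb(hex))

set.seed(12)
cl <- kmeans(rgb_mat, centers = 3, nstart = 10)

cl$centers  # one RGB center per color cluster
cl$cluster  # cluster index assigned to each input color
```

The cluster indices themselves are arbitrary (they depend on the random start), so only the grouping is meaningful, not which group is called 1, 2 or 3.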

Then we shall use the k-means model to predict the color cluster each pixel of an image belongs to, and compute the proportion of pixels belonging to each color cluster as features (hence k features per image). The code below approximates this with each image's most frequent colors and their pixel shares. Our data frame will look like the following for k=3 clusters:

        cluster1  cluster2  cluster3  class (label)
image1  0.6       0.3       0.1       missing holes

which means that 60%, 30% and 10% of the pixels belong to clusters 1, 2 and 3, respectively, for the missing-hole image image1.
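The arithmetic behind such a row can be sketched as follows. The vector of per-pixel cluster assignments is hypothetical (ten made-up pixels), chosen so that it reproduces the example row above.

```r
k <- 3
# Hypothetical cluster assignment for the 10 pixels of one image
pixel_cluster <- c(1, 1, 1, 1, 1, 1, 2, 2, 2, 3)

# tabulate() counts every cluster index 1..k, so clusters with no pixels get 0
counts <- tabulate(pixel_cluster, nbins = k)

features <- counts / sum(counts)          # proportions that sum to 1
names(features) <- paste0("cluster", 1:k)
features
```

Using tabulate() with nbins = k rather than table() keeps the feature vector at length k even when an image has no pixels in some cluster.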

This dataset will now be used to train a (binary) classifier, and the classifier will do a decent job if our assumption holds that the color cluster proportions of images in the same defect class follow a similar pattern.

Here are the two sets of images we shall use for only 2 classes:

missing-holes

mouse-bites

Now, let's extract the color cluster proportion features and try an SVM classifier with an RBF kernel for the classification and prediction of the defect classes.

find_cluster_kmeans <- function(cl, x) { # predict the color cluster a pixel belongs to
  return (which.min(apply(cl$centers, 1, function(y) sum((y-x)^2))))
}

extract.color.features <- function(img_path, cl) {
  col_df <- colorfindr::get_colors(img_path, top_n=20)
  cols <- as.data.frame(t(do.call(rbind, lapply(col_df['col_hex'], col2rgb))))
  col_cluster <- apply(cols, 1, function(x) find_cluster_kmeans(cl, x))
  col_df <- cbind(col_df, cols, col_cluster=col_cluster)
  col_df <- col_df[c('col_cluster', 'col_share')]
  df_feat <- aggregate(col_df$col_share, list(col_df$col_cluster), FUN=sum) # group by color clusters and sum proportions
  names(df_feat) <- c('col_clust', 'prop')
  for (i in 1:(nrow(cl$centers))) { # ensure that all color clusters are present
    if (nrow(df_feat[df_feat$col_clust == i,]) == 0) {
      df_feat <- rbind(df_feat, data.frame(col_clust=i, prop=0))
    }
  }
  df_feat$prop <- df_feat$prop / sum(df_feat$prop) # normalize
  return(df_feat)
}

get.color.clusters <- function(k=3, top_n=50) {
  col_df <- NULL
  for (folder in c('missing_hole', 'Mouse_bite')) {
    img_path <- list.files(folder, pattern = "\\.png$", full.names = TRUE) # match the .png extension only
    cdf <- do.call(rbind, lapply(img_path, function(p) colorfindr::get_colors(p,top_n=top_n)))
    col_df <- rbind(col_df, cdf)
  }
  cols <- as.data.frame(t(do.call(rbind, lapply(col_df['col_hex'], col2rgb))))
  cl <- kmeans(cols, k)
  #print(cl$center)
  return (cl)
}

library(colorfindr)
set.seed(12)
k <- 3 # 3 color clusters
cl <- get.color.clusters(k)
df <- NULL
for (cls in c('missing_hole', 'Mouse_bite')) {
  img_path <- list.files(cls, pattern = "\\.png$", full.names = TRUE) # match the .png extension only
  df_feat <- NULL
  for (img in img_path) {
    #print(img)
    df_feat <- rbind(df_feat, extract.color.features(img, cl)$prop)
  }
  df_feat <- as.data.frame(df_feat)
  df_feat$class <- cls
  df <- rbind(df, df_feat)
}
names(df)[1:k] <- paste0('cluster', 1:k)
df$class <- as.factor(df$class)
df    # each row corresponds to an image and each column to a color cluster
#     cluster1   cluster2   cluster3        class
#1 0.318473896 0.68152610 0.00000000 missing_hole
#2 0.984514797 0.01548520 0.00000000 missing_hole
#3 0.967479675 0.03252033 0.00000000 missing_hole
#4 0.010911326 0.80282772 0.18626095   Mouse_bite
#5 0.008364049 0.96257443 0.02906153   Mouse_bite
#6 0.446066380 0.55393362 0.00000000   Mouse_bite

library(e1071)
svmfit = svm(class ~ ., data = df, kernel = "radial", cost = 1, scale = FALSE, type='C')
#print(svmfit)
plot(svmfit, df, cluster1 ~ cluster2, fill=TRUE, alpha=0.2)
df$predicted <- predict(svmfit, df)
df
#     cluster1   cluster2   cluster3        class    predicted
#1 0.318473896 0.68152610 0.00000000 missing_hole   Mouse_bite
#2 0.984514797 0.01548520 0.00000000 missing_hole missing_hole
#3 0.967479675 0.03252033 0.00000000 missing_hole missing_hole
#4 0.010911326 0.80282772 0.18626095   Mouse_bite   Mouse_bite
#5 0.008364049 0.96257443 0.02906153   Mouse_bite   Mouse_bite
#6 0.446066380 0.55393362 0.00000000   Mouse_bite   Mouse_bite
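The predictions printed above can be summarized as a confusion matrix. This is a small sketch that re-enters the six true and predicted labels by hand from that output; note that it measures accuracy on the training images only.

```r
lv <- c("missing_hole", "Mouse_bite")

# Labels copied from the printed data frame (rows 1 to 6)
actual    <- factor(rep(lv, each = 3), levels = lv)
predicted <- factor(c("Mouse_bite", "missing_hole", "missing_hole",
                      "Mouse_bite", "Mouse_bite", "Mouse_bite"), levels = lv)

cm <- table(actual = actual, predicted = predicted)
cm

# Fraction of images on the diagonal (correctly classified)
accuracy <- sum(diag(cm)) / sum(cm)
accuracy  # 5 of 6 training images
```

Only the first missing_hole image is misclassified, which is consistent with its cluster proportions looking more like those of the Mouse_bite images.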

Ideally, we should train on one portion of the dataset and evaluate the classifier on a held-out portion, to estimate how well it generalizes to unseen images.
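One possible way to set this up is sketched below, assuming more images per class are available than the six used above. The helper stratified_train_idx() and the synthetic feature table are hypothetical and only illustrate the split; the SVM call mirrors the one used earlier and runs only if e1071 is installed.

```r
# Stratified hold-out split: sample a fraction of the rows within each class,
# so that every class is represented in the training set.
stratified_train_idx <- function(labels, train_frac = 0.7) {
  idx_by_class <- split(seq_along(labels), labels)
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    n_train <- max(1, floor(train_frac * length(idx)))
    idx[sample.int(length(idx), n_train)]
  }), use.names = FALSE)
  sort(train_idx)
}

set.seed(12)
# Synthetic feature table (made up for illustration): 10 images per class
n <- 10
c1 <- c(runif(n, 0.6, 1.0), runif(n, 0.0, 0.4))
feat <- data.frame(
  cluster1 = c1,
  cluster2 = 1 - c1,
  class = factor(rep(c("missing_hole", "Mouse_bite"), each = n))
)

train_idx <- stratified_train_idx(feat$class, train_frac = 0.7)
train <- feat[train_idx, ]
test  <- feat[-train_idx, ]

# Fit on the training rows only and score on the held-out rows
if (requireNamespace("e1071", quietly = TRUE)) {
  fit <- e1071::svm(class ~ ., data = train, kernel = "radial", cost = 1,
                    scale = FALSE, type = "C-classification")
  pred <- predict(fit, test)
  cat("held-out accuracy:", mean(pred == test$class), "\n")
}
```

With very few images, cross-validation is usually a better use of the data than a single split.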

The color cluster proportion feature is quite naive and is unlikely to perform very well. You can then try to extract shape features and features such as HOG, SIFT, SURF, BRISK or BRIEF, and use the corresponding descriptors as feature vectors for the ML classifiers.
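To give a flavor of what a gradient-based descriptor computes, here is a simplified sketch in base R: a single magnitude-weighted histogram of gradient orientations over a grayscale matrix. This is not full HOG (which also uses cells and block normalization), the function name orientation_histogram() is hypothetical, and the two test images are synthetic.

```r
# Simplified gradient-orientation histogram for a grayscale image matrix.
# One global histogram; real HOG works per cell with block normalization.
orientation_histogram <- function(gray, n_bins = 9) {
  nr <- nrow(gray); nc <- ncol(gray)
  # Central differences on the interior pixels
  gx <- gray[2:(nr - 1), 3:nc] - gray[2:(nr - 1), 1:(nc - 2)]
  gy <- gray[3:nr, 2:(nc - 1)] - gray[1:(nr - 2), 2:(nc - 1)]
  mag <- sqrt(gx^2 + gy^2)
  ang <- (atan2(gy, gx) * 180 / pi) %% 180       # unsigned orientation in [0, 180)
  bin <- pmin(floor(ang / (180 / n_bins)) + 1, n_bins)
  # Sum gradient magnitudes per orientation bin
  h <- vapply(seq_len(n_bins), function(b) sum(mag[bin == b]), numeric(1))
  if (sum(h) > 0) h / sum(h) else h
}

# Synthetic 8 x 8 images: intensity step along the columns / along the rows
vertical_edge   <- matrix(rep(c(0, 1), each = 4), nrow = 8, ncol = 8, byrow = TRUE)
horizontal_edge <- t(vertical_edge)

h_vertical   <- orientation_histogram(vertical_edge)
h_horizontal <- orientation_histogram(horizontal_edge)
h_vertical
h_horizontal
```

The two edge directions put their weight in different bins, which is the kind of shape information that color proportions alone cannot capture.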

Finally, to get the best performance we can use deep neural nets, which generate features automatically at different layers. In that case, however, we need a reasonably large number of training images (the training set can be enlarged with data augmentation), or we can use transfer learning on top of a standard pretrained network (e.g., VGG-16 or ResNet-50).