I am trying to perform PCA on an image and then output an image with pixels coloured according to the cluster they fall into in PCA space. I am doing unsupervised PCA. The ultimate goal is shown at this link: Forward PC rotation
I am currently using the pandas library (if people have other, more elegant solutions, I am all ears) as well as OpenCV for image manipulation.
I am trying to load the b, g, r bands as my columns, with each pixel as an index entry, giving a table with one row per pixel in the image (and one column per color band).
My image has 3 million+ pixels. The table is populating, but it takes about 5 seconds per pixel, so I can't even tell if I am doing it correctly. Is there a better way? Also, if anyone understands how to use PCA with images, I would greatly appreciate the help.
Code:
import pandas as pd
import numpy as np
import random as rd
from sklearn.decomposition import PCA
from sklearn import preprocessing
import matplotlib.pyplot as plt
import cv2
#read in image
img = cv2.imread('/Volumes/EXTERNAL/Stitched-Photos-for-Chris/p7_0015_20161005-949am-75m-pass-1.jpg.png',1)
row,col = img.shape[:2]
print(row , col)
#get a unique pixel ID for each pixel
pixel = ['pixel-' + str(i) for i in range(0,row*col)]
bBand = ['bBand']
gBand = ['gBand']
rBand = ['rBand']
data = pd.DataFrame(columns=[bBand,gBand,rBand],index = pixel)
#populate data for each band
b,g,r = cv2.split(img)
#each index value
indexCount = row*col
for index in range(indexCount):
    i = int(index/row)
    j = index%row
    data.loc[pixel,'bBand'] = b[i,j]
    data.loc[pixel,'gBand'] = g[i,j]
    data.loc[pixel,'rBand'] = r[i,j]
print(data.head())
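Yes, that for loop can take a long time: it runs once per pixel (3 million+ iterations) and does a DataFrame assignment on every pass.
Use the array's ravel() method (a 1D view where possible), flatten() method (always a 1D copy) or flat attribute (a 1D iterator) to convert each 2D band array to a series.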
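As a rough sketch of the idea above: the per-pixel loop can be replaced by a single reshape, which is equivalent to raveling each band. The small random array is only a stand-in for the result of cv2.imread, and the column names are taken from the question.

```python
import numpy as np
import pandas as pd

# Stand-in for cv2.imread(...): a tiny synthetic BGR image of shape
# (rows, cols, 3). The real image would be loaded from disk instead.
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(4, 6, 3), dtype=np.uint8)
row, col = img.shape[:2]

# One row per pixel, one column per band, in row-major order.
pixels = img.reshape(-1, 3)
data = pd.DataFrame(pixels, columns=['bBand', 'gBand', 'rBand'])

# Equivalent to raveling each band separately.
b = img[:, :, 0]
assert np.array_equal(data['bBand'].to_numpy(), b.ravel())

print(data.head())
```

This builds the whole table in one vectorised step, so the cost no longer grows with a Python-level loop over pixels.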
Also, creating a string index with x, y encoded can be expensive too. I would either use the row number as the index and calculate the pixel's (row, column) position
as row_num // col, row_num % col (for a row-major flatten, where col is the image width),
or use a multi index with x, y, depending on how frequently x, y are used in your calculations.
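Both indexing options might look something like this. It is a minimal sketch that assumes a row-major flatten. The image size, the arange-filled array and the index level names 'i' and 'j' are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical image size; the values are just a running counter so that
# every pixel is distinguishable (not real image data).
row, col = 4, 6
img = np.arange(row * col * 3).reshape(row, col, 3)

# Option 1: default integer index, coordinates computed on demand.
data = pd.DataFrame(img.reshape(-1, 3), columns=['bBand', 'gBand', 'rBand'])
row_num = 17
i, j = divmod(row_num, col)  # same as (row_num // col, row_num % col)
assert (data.loc[row_num].to_numpy() == img[i, j]).all()

# Option 2: a MultiIndex holding the (row, column) of every pixel.
ii, jj = np.unravel_index(np.arange(row * col), (row, col))
multi = pd.MultiIndex.from_arrays([ii, jj], names=['i', 'j'])
indexed = pd.DataFrame(img.reshape(-1, 3),
                       columns=['bBand', 'gBand', 'rBand'],
                       index=multi)
print(indexed.loc[(2, 5)])
```

The integer index is cheaper to build and store. The MultiIndex costs more up front but makes lookups by position direct, so it is likely only worth it if coordinates are used often.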