
Not able to refer back to genes clustered in som analysis using R's kohonen package


I am trying to run a SOM analysis on a gene expression data frame using the kohonen package in R, and I am stuck. The data look like this:

dim(bink)
[1]  401 1198

Here bink is a data frame with 401 rows and 1198 columns. Each column holds the expression values for one gene, and the gene names are the column names, as shown below:

[Image: preview of the bink data frame]

The rest of the code is:

library(kohonen)

# 5 x 5 hexagonal map, i.e. 25 units
grid <- somgrid(xdim = 5, ydim = 5, topo = "hexagonal")

# scale() standardizes each column (gene) before training
som.wines <- som(scale(bink), grid = grid)
str(som.wines)
plot(som.wines, type = "mapping")

Running the code above gives the plot shown below:

[Image: SOM mapping plot]

However, I cannot get the names of the genes clustered in each circle. I tried the approach from the answer given here, using the code below:

x= attr(som.wines$data,"scaled:center")

y= attr(som.wines$data,"scaled:scale")

for (i in 1:ncol(som.wines$data)){
z[,i] = som.wines$data[,i][som.wines$unit.classif==1] * y[i]+x[i]
}

This fails with the following error:

# Error in 1:ncol(som.wines$data) : argument of length 0

I also tried accessing the data with som.wines$data[[1]] instead, but that does not work either.

Is there any way to solve this problem?

Thanks


Solution

  • Using data(wines) as an example.

    som.wines <- som(scale(wines), grid = somgrid(5, 5, "hexagonal"))

    Each big circle in your plot is one unit of the map, i.e. a cluster of samples (rows of the input data). The cluster profiles are stored in som.wines$codes, one row per cluster, named V1 to Vx, so the number of rows matches the number of big circles. To get back to the original data, use som.wines$unit.classif, which gives the cluster number assigned to each row.

    Associate the clusters with your original data with

    cbind(wines, cluster=som.wines$unit.classif)
    

    The arrangement of the big circles in the plot corresponds to the numbering in som.wines$codes: the bottom-left big circle is V1 and the top-right one is Vx, i.e. the last cluster.
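
To make the "refer back" step concrete, here is a minimal sketch that lists which rows landed in each unit, and then repeats the analysis with the variables as rows. It uses the wines data again. The names row.ids, members, by.var and som.vars are made up for illustration, and the 13 wine variables only stand in for genes. With the data from the question, the second part would presumably start from t(scale(bink)), since the SOM clusters rows and the genes there are columns.

```r
library(kohonen)

data(wines)   # example data shipped with kohonen; rows are samples
set.seed(42)  # SOM training is stochastic; fix the seed for repeatability

som.wines <- som(scale(wines), grid = somgrid(5, 5, "hexagonal"))

# Labels for the rows, so there is something to refer back to.
# Fall back to generated labels (hypothetical) if the data has no row names.
row.ids <- rownames(wines)
if (is.null(row.ids)) row.ids <- paste0("row", seq_len(nrow(wines)))

# Row i of grid$pts holds the plot coordinates of unit i.
n.units <- nrow(som.wines$grid$pts)

# One list element per unit; units that attracted no rows stay as empty vectors.
members <- split(row.ids,
                 factor(som.wines$unit.classif, levels = seq_len(n.units)))
members[["1"]]   # rows mapped to unit 1 (V1)

# Unit numbers next to their x/y position in the plot.
cbind(unit = seq_len(n.units), som.wines$grid$pts)

# Clustering variables instead of samples: the things to cluster must be rows.
# Standardize each variable first, then transpose so variables become rows.
by.var <- t(scale(wines))
som.vars <- som(by.var, grid = somgrid(2, 2, "hexagonal"))

# Variable names grouped by the unit they were assigned to.
split(rownames(by.var), som.vars$unit.classif)
```

The factor() call is only there so that empty units still show up in the list. A plain split(row.ids, som.wines$unit.classif) works as well if you do not care about empty units.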