Tags: r, data-manipulation, pca

Using data from one table to select columns from another table in R


My data table, tab, is 2000 x 500, with y1 = col1, y2 = col2, y3 = col3, …, y500 = col500. See image.

[Image: partial data table]

I want to carry out some PCA work on a subset of these columns, e.g. y1 = col1, y22 = col22, y36 = col36, y41 = col41, and so on.

A separate data table, SM, contains a column named ID, which lists the columns in the main data table (tab) that I want to consider. There are 200 such entries.

Image of SM follows.

[Image: partial ID table]

The following

fit.std <- prcomp(tab, scale. = TRUE)

pulls in all the columns.

With 200 specific columns to consider, entering the column numbers manually would be very time-consuming and error-prone.

Can someone please tell me how to use the entries in column ID (in data table SM) to select the corresponding columns in the data table tab, and then pass those columns to the fit.std line?

Is there a way to read the entries in SM so that I can select the required columns in the larger data table tab? In other words, SM col1 would correspond to tab col1, SM col22 would correspond to tab col22, and so on.

fit.std <- prcomp(c(ID$*), scale = TRUE)

where ID$* contains the data table SM entries I want to match with columns in tab?

Thank you.


Solution

  • OK, based on your updated question, it looks like you want to subset the data frame tab, selecting only the columns listed in SM$ID.

    You can do that with:

    tab[, SM$ID]
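To show how that subset can feed straight into prcomp, here is a minimal sketch on toy data. It assumes SM$ID holds column names that match names(tab); the toy tab and SM, the helper variables cols and sub, and the sizes used are illustrative only and not taken from the question. If ID actually holds column positions rather than names, index with as.integer(SM$ID) instead.

```r
# Toy stand-ins for the real data (sizes and names are illustrative only).
set.seed(1)
tab <- as.data.frame(matrix(rnorm(20 * 6), nrow = 20))
names(tab) <- paste0("col", 1:6)

# SM$ID lists the columns of tab to keep (assumed here to be column names).
SM <- data.frame(ID = c("col1", "col3", "col6"), stringsAsFactors = FALSE)

# If ID was read in as a factor, convert it first: indexing by a factor
# uses its integer codes, not the labels that are printed.
cols <- as.character(SM$ID)

# Fail early if any requested column is missing from tab.
stopifnot(all(cols %in% names(tab)))

# drop = FALSE keeps a data frame even if only one column is selected.
sub <- tab[, cols, drop = FALSE]

# PCA on the selected columns only, standardised to unit variance.
fit.std <- prcomp(sub, scale. = TRUE)
summary(fit.std)
```

The stopifnot check is optional, but with 200 IDs it may help catch a misspelled or missing column name before the PCA runs.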