
How to assign 1s and 0s to columns depending on whether each value matches another column in the same row in R


I'm an absolute beginner in coding and R, and this is my third week using it for a project (for biologists: I'm trying to find the sum of risk alleles for a PRS). I need help with this part:

```
df
  x y z
1 t c a
2 a t a
3 g g t
```

So when the code is applied:

```
  x y z
1 t 0 0
2 a 0 1
3 g 1 0
```

I'm trying to make it so that, in each row, a value in y or z changes to 1 if it matches x and to 0 if it does not.
I started with:
```
for(i in 1:ncol(df)){
  df[, i]<-df[df$x == df[,i], df[ ,i]<- 1]
}
```
But I got all NA values.
In reality, I have 100 columns to compare with x in the data frame. Any help is appreciated.

Solution

  • An alternative way to do this is by using ifelse() in base R.

    df$y <- ifelse(df$y == df$x, 1, 0)
    df$z <- ifelse(df$z == df$x, 1, 0)
    df
    #  x y z
    #1 t 0 0
    #2 a 0 1
    #3 g 1 0
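
    With many columns, repeating the ifelse() line for each one becomes tedious. As a minimal sketch, assuming x is the first column and every other column should be compared with it, the comparison can be done on all columns at once. The data frame below is rebuilt from the question so that the example runs on its own.

    ```r
    # Rebuild the example data from the question.
    df <- data.frame(x = c("t", "a", "g"),
                     y = c("c", "t", "g"),
                     z = c("a", "a", "t"),
                     stringsAsFactors = FALSE)

    # df[-1] drops the first column (x). Comparing the remaining columns with
    # df$x recycles x down each column, so every value is checked against the
    # x value in its own row. The result is a logical matrix, and the unary +
    # converts TRUE/FALSE to 1/0.
    df[-1] <- +(df[-1] == df$x)
    df
    ```

    This relies on x having one value per row, so that the recycling lines up with each column.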
    

    Edit: extending this step to all columns efficiently

    For example:

    df1
    #  x y z w
    #1 t c a t
    #2 a t a a
    #3 g g t m
    
    

    To edit many columns efficiently, a better approach is to write a function and apply it to all targeted columns in the data frame. Here is a simple function to do the work:

    edit_col <- function(any_col) ifelse(any_col == df1$x, 1, 0)
    

    This function takes a single column, compares its elements with the elements of df1$x, and returns 1 where they match and 0 where they do not. To apply it to all targeted columns, you can use apply(). Because x is not a targeted column in your case, you need to exclude it by indexing with [,-1], since it is the first column in df1.

    # The second argument, 2, tells apply() to work column by column. Use 1 for rows.
    
    df1[, -1] <- apply(df1[,-1], 2, edit_col)
    df1
    #  x y z w
    #1 t 0 0 1
    #2 a 0 1 1
    #3 g 1 0 0
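
    Note that apply() converts the data frame to a matrix before it calls the function. As an alternative sketch, lapply() goes through the columns one at a time and leaves the data frame as it is. The name target_cols is only illustrative, and the example assumes the reference column is called x.

    ```r
    # Rebuild the example data so the sketch runs on its own.
    df1 <- data.frame(x = c("t", "a", "g"),
                      y = c("c", "t", "g"),
                      z = c("a", "a", "t"),
                      w = c("t", "a", "m"),
                      stringsAsFactors = FALSE)

    # Columns to recode: everything except the reference column x.
    target_cols <- setdiff(names(df1), "x")

    # lapply() visits one column at a time; as.integer() turns TRUE/FALSE into 1/0.
    df1[target_cols] <- lapply(df1[target_cols],
                               function(col) as.integer(col == df1$x))
    df1
    ```

    Selecting the columns by name rather than by position means the code does not depend on x being the first column.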
    

    Of course, you can also define a function that edits the whole data frame, so you don't need to call apply() manually.

    Here is an example of such a function:

    edit_df <- function(any_df){
        # Return 1 where a column matches any_df$x and 0 where it does not.
        edit_col <- function(any_col) ifelse(any_col == any_df$x, 1, 0)

        # Create a vector containing all names of the targeted columns.
        target_col_names <- setdiff(colnames(any_df), "x")

        # drop = FALSE keeps a data frame even when there is only one targeted column.
        any_df[, target_col_names] <- apply(any_df[, target_col_names, drop = FALSE], 2, edit_col)
        return(any_df)
    }
    

    Then use the function on the original df1 (that is, df1 as it was before the apply() step overwrote its columns):

    edit_df(df1)
    #  x y z w
    #1 t 0 0 1
    #2 a 0 1 1
    #3 g 1 0 0
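
    Since the question mentions summing risk alleles, the 0/1 columns can then be added up per row. This is only a sketch, assuming that the per-row count of matches is what is wanted; the names coded and n_match are hypothetical, and coded simply reproduces the recoded output shown above.

    ```r
    # The recoded data frame from above, written out so the sketch runs on its own.
    coded <- data.frame(x = c("t", "a", "g"),
                        y = c(0, 0, 1),
                        z = c(0, 1, 0),
                        w = c(1, 1, 0))

    # Sum the 0/1 columns across each row, leaving out the reference column x.
    coded$n_match <- rowSums(coded[, setdiff(names(coded), "x"), drop = FALSE])
    coded
    ```

    For the three example rows, n_match comes out as 1, 2 and 1.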