Search code examples
rrecode

How to recode one column conditional on another


I currently have a dataframe in R with a variable for names ("Name") and one that notes whether an individual is female ("Female") (1, NA). A given individual likely has multiple rows because they responded to the survey multiple times. However, sometimes, the "Female" column is incorrect and will not list an individual who has previously responded to being Female (1), as (1) and instead will have an NA.

So for example, if my dataset looks like this:

Name <- c("Claire Smith", "John Doe", "Serena Williams", "Claire Smith", "Claire Smith", "Serena Williams")
Female <- c(1, NA, NA, NA, NA, 1)
df <- data.frame(Name, Female)

I want it to check across all rows so that it will recognize that "Claire Smith" and "Serena Williams" do have at least one "1" associated for their "Names" and propagate the rest of the "Female" columns for these names as "1". So I would want the code to do this:

Name <- c("Claire Smith", "John Doe", "Serena Williams", "Claire Smith", "Claire Smith", "Serena Williams")
Female <- c(1, NA, 1, 1, 1, 1)
df <- data.frame(Name, Female)

I'm sure this is quite an easy fix but I can't seem to figure it out. Thank you very much in advance.


Solution

  • Using a bit of dplyr you can use

    library(dplyr)
    
    df %>% 
      mutate(Female=any(Female==1)+0, .by=Name)
    #              Name Female
    # 1    Claire Smith      1
    # 2        John Doe     NA
    # 3 Serena Williams      1
    # 4    Claire Smith      1
    # 5    Claire Smith      1
    # 6 Serena Williams      1
    

    You just check for any "1" value for the name and propagate that to the group. The any() function returns a TRUE/FALSE value so we use +0 to turn TRUE into 1.