Search code examples
rdplyrpanel

creating a new column that takes initial values of another column in R


I am working on a cross sectional dataset. I want to create a new column, named "initial", that will contain initial values of another column. More specifically, for each country, the column initial takes the value of another column called "ratio" for the first year for which the data is available and will take the value 0 for all remaining years. Sample code:

country <- c(rep(c("A","B","C","D"),each=5))
year <- c(1980:1984, 1980: 1984, 1980:1984, 1980:1984)
ratio <- runif(n = 20, min = 0.20, max = 0.40)
mydata <- data.frame(country, year, ratio)
mydata$ratio[[1]] <- NA
mydata$ratio[6:7] <- NA
mydata$ratio[16:18] <- NA

The output I want to obtain looks like this:

enter image description here

Is there a way of doing this in R preferably using the dplyr package?

Thanks very much in advance!


Solution

  • Making use of dplyr::first you could do:

    
    library(dplyr)
    
    mydata %>% 
      group_by(country) %>% 
      mutate(initial = first(ratio[!is.na(ratio)]),
             initial = ifelse(is.na(ratio) | ratio != initial, 0, initial)) %>% 
      ungroup()
    #> # A tibble: 20 × 4
    #>    country  year  ratio initial
    #>    <chr>   <int>  <dbl>   <dbl>
    #>  1 A        1980 NA       0    
    #>  2 A        1981  0.387   0.387
    #>  3 A        1982  0.257   0    
    #>  4 A        1983  0.366   0    
    #>  5 A        1984  0.328   0    
    #>  6 B        1980 NA       0    
    #>  7 B        1981 NA       0    
    #>  8 B        1982  0.227   0.227
    #>  9 B        1983  0.331   0    
    #> 10 B        1984  0.341   0    
    #> 11 C        1980  0.292   0.292
    #> 12 C        1981  0.344   0    
    #> 13 C        1982  0.387   0    
    #> 14 C        1983  0.251   0    
    #> 15 C        1984  0.292   0    
    #> 16 D        1980 NA       0    
    #> 17 D        1981 NA       0    
    #> 18 D        1982 NA       0    
    #> 19 D        1983  0.295   0.295
    #> 20 D        1984  0.312   0
    

    DATA

    set.seed(42)
    
    country <- c(rep(c("A","B","C","D"),each=5))
    year <- c(1980:1984, 1980: 1984, 1980:1984, 1980:1984)
    ratio <- runif(n = 20, min = 0.20, max = 0.40)
    mydata <- data.frame(country, year, ratio)
    mydata$ratio[[1]] <- NA
    mydata$ratio[6:7] <- NA
    mydata$ratio[16:18] <- NA