
Fill next NA rows with the last observed data


I have follow-up data for different people. For example, if one person has 10 observations, his name appears only on his first row; the 9 following rows have no name.

My goal is to fill in the name column.

Here is a reproducible example of my data:

test = data.frame(name = c("Paul",NA,NA,"John",NA,"Ethan",NA,NA),
                  date = c("2016-05-06","2017-05-06","2018-05-06","2012-08-09","2016-02-01","2017-06-06","2017-07-06","2017-08-06"),
                  data = c(1,2,1,NA,2,2,NA,2))

This is what the data looks like:

   name       date data
1  Paul 2016-05-06    1
2  <NA> 2017-05-06    2
3  <NA> 2018-05-06    1
4  John 2012-08-09   NA
5  <NA> 2016-02-01    2
6 Ethan 2017-06-06    2
7  <NA> 2017-07-06   NA
8  <NA> 2017-08-06    2

And this is what I want to get:

   name       date data
1  Paul 2016-05-06    1
2  Paul 2017-05-06    2
3  Paul 2018-05-06    1
4  John 2012-08-09   NA
5  John 2016-02-01    2
6 Ethan 2017-06-06    2
7 Ethan 2017-07-06   NA
8 Ethan 2017-08-06    2

I did not find any function that replaces NAs with the last observed value until the next non-NA observation. For information, the data is sorted by person and by date.


Solution

  • One option would be tidyr::fill, which by default fills missing values downward with the last non-missing value:

    test = data.frame(name = c("Paul",NA,NA,"John",NA,"Ethan",NA,NA),
                      date = c("2016-05-06","2017-05-06","2018-05-06","2012-08-09","2016-02-01","2017-06-06","2017-07-06","2017-08-06"),
                      data = c(1,2,1,NA,2,2,NA,2))
    
    tidyr::fill(test, name)
    #>    name       date data
    #> 1  Paul 2016-05-06    1
    #> 2  Paul 2017-05-06    2
    #> 3  Paul 2018-05-06    1
    #> 4  John 2012-08-09   NA
    #> 5  John 2016-02-01    2
    #> 6 Ethan 2017-06-06    2
    #> 7 Ethan 2017-07-06   NA
    #> 8 Ethan 2017-08-06    2
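
  • If you would rather not add a package, the same "carry the last observation forward" idea can be sketched in base R. This is only a minimal sketch: fill_down is a hypothetical helper name I made up (it is not part of base R or tidyr), and it assumes name is a character column, which is why stringsAsFactors = FALSE is set explicitly.

```r
test <- data.frame(
  name = c("Paul", NA, NA, "John", NA, "Ethan", NA, NA),
  date = c("2016-05-06", "2017-05-06", "2018-05-06", "2012-08-09",
           "2016-02-01", "2017-06-06", "2017-07-06", "2017-08-06"),
  data = c(1, 2, 1, NA, 2, 2, NA, 2),
  stringsAsFactors = FALSE
)

# fill_down() is a hypothetical helper, not a function from any package.
fill_down <- function(x) {
  observed <- !is.na(x)
  # For each position, count how many non-NA values have been seen so far;
  # that count is the index of the most recent observed value.
  idx <- cumsum(observed)
  # Prepend NA so that leading NAs (idx == 0) stay NA after the +1 shift.
  c(NA, x[observed])[idx + 1]
}

test$name <- fill_down(test$name)
test
```

    Only the name column is touched, so the NA values in data stay as they are, the same as with tidyr::fill(test, name).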