
Is there a way to select rows and compute proportions based on their values in R?


I have a dataframe which looks like this:

   a          b  c   d
1  2005-01-01 0 ... ...
2  2005-02-22 1 ... ...
3  2005-04-02 0 ... ...
4  2005-12-01 3 ... ...
5  2006-03-03 0 ... ...
6  2006-06-08 1 ... ...
7  2006-10-11 0 ... ...
8  2006-12-02 4 ... ...
9  2007-03-24 0 ... ...
10 2007-04-06 2 ... ...
11 2008-01-28 0 ... ...
12 2008-08-19 0 ... ...
13 2008-09-12 0 ... ...
14 2008-12-12 2 ... ...
15 2009-05-27 0 ... ...
16    ...     . ... ...

I want to select all the rows from the year 2005 and see how many of them have the value 0, 1, 2, 3 or 4 in column b, ideally as proportions. For example, the result could look like this:

output:
2005
0    1    2    3    4
20%  20%  20%  20%  20%

I tried table(year(DF$a), c=DF$b); however, this only yields the counts for all the years together, without any proportions. I also tried piping the result into a proportions function with %>%, but that doesn't work.

Does anyone know how to do this?


Solution

  • You can use table and proportions to get the share per year. The margin argument of proportions, here 1, computes the shares within each row, so the values for every year sum to 100.

    proportions(table(format(DF$a, "%Y"), DF$b), 1) * 100
    #         0   1   2   3   4
    #  2005  50  25   0  25   0
    #  2006  50  25   0   0  25
    #  2007  50   0  50   0   0
    #  2008  75   0  25   0   0
    #  2009 100   0   0   0   0
    

    Data:

    DF <- structure(list(a = structure(c(12784, 12836, 12875, 13118, 13210, 
    13307, 13432, 13484, 13596, 13609, 13906, 14110, 14134, 14225, 
    14391), class = "Date"), b = c(0L, 1L, 0L, 3L, 0L, 1L, 0L, 4L, 
    0L, 2L, 0L, 0L, 0L, 2L, 0L), c = c("...", "...", "...", "...", 
    "...", "...", "...", "...", "...", "...", "...", "...", "...", 
    "...", "..."), d = c("...", "...", "...", "...", "...", "...", 
    "...", "...", "...", "...", "...", "...", "...", "...", "..."
    )), row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", 
    "10", "11", "12", "13", "14", "15"), class = "data.frame")