How to find the population for certain variables


I am having trouble getting a desired result in R and am seeking assistance. I have included my data below.

##       ID        DOB sector meters Oct   Res_FROM     Res_TO   Exp_FROM
## 1  20100 1979-08-24    H38   6400   W 1979-08-15 1991-05-15 1979-08-24
## 2  20101 1980-05-05    B01   1600  NW 1980-05-15 1991-04-15 1980-05-15
## 3  20102 1979-03-17    H04   1600  SW 1972-06-15 1979-08-15 1979-03-17
## 4  20103 1981-11-30    B09   3200  NE 1982-01-15 1984-01-15 1982-01-15
## 5  20103 1981-11-30    B37   8000   N 1984-01-15 1986-04-15 1984-01-15
## 6  20104 1978-09-01    B09   3200  NE 1982-01-15 1984-01-15 1982-01-15

From this data, I want R to count how many IDs are in each sector. I shortened the data shown here to keep it uncluttered; the full data set has 100 sectors. For example, I need a result where sector B01 is listed with x IDs, sector B02 with x IDs, and so on. My overall goal is to find the population of each sector, where each individual is identified by an ID.


Solution

  • In base R with aggregate:

    # count the distinct IDs within each sector
    aggregate(ID ~ sector, data = df, FUN = function(x) length(unique(x)))
    
      sector ID
    1    B01  1
    2    B09  2
    3    B37  1
    4    H04  1
    5    H38  1
    

  • Using the dplyr package:

    library(dplyr)
    
    df %>% 
      group_by(sector) %>% 
      summarize(count = n_distinct(ID)) %>% 
      ungroup()
    
      sector count
      <chr>  <int>
    1 B01        1
    2 B09        2
    3 B37        1
    4 H04        1
    5 H38        1
    

    If you want to add the count as a new column of your data frame, rather than collapsing it to one row per sector, use mutate instead of summarize.
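
As a minimal sketch of the mutate variant: the data frame below is a hand-typed reconstruction of only the ID and sector columns from the question's sample rows, and the names `df_with_count` and `count` are illustrative choices, not anything from the original post.

```r
library(dplyr)

# Assumed reconstruction of the question's sample (ID and sector only)
df <- data.frame(
  ID     = c(20100, 20101, 20102, 20103, 20103, 20104),
  sector = c("H38", "B01", "H04", "B09", "B37", "B09")
)

# mutate keeps every original row and attaches the per-sector
# number of distinct IDs as a new column
df_with_count <- df %>%
  group_by(sector) %>%
  mutate(count = n_distinct(ID)) %>%
  ungroup()

df_with_count$count
# 1 1 1 2 1 2
```

Both B09 rows get a count of 2 because two different IDs (20103 and 20104) lived in that sector; all six rows are preserved.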