R - Disaggregate coverage area data based on a ranking preference

I have 4G mobile coverage at the Local Authority level in the UK, as a percentage of geographical area covered (for approximately 200 areas). I want to disaggregate this data so I can work with roughly 9000 lower-level postcode sector.

The most appropriate way for me to do this, is to allocate 4G geographic coverage to the most densely populated areas first, as this would best represent how mobile operators would invest in the market. The least populated areas would end up with no coverage. I'm struggling with how I do this in R, however.

I have a data frame that looks like this for the postcode sector data (I've used hypothetical data here):

Name      pcd.sect  pop    area pop.dens  rank  
Cambridge   1      5546    0.6   8341      1     
Cambridge   2      7153    1.1   5970      2     
Cambridge   3      5621    2.3   5289      3     
Cambridge   4      10403   4.3   4361      4     
Cambridge   5      14796   4.2   3495      5     
...

I then took the aggregate local authority data and put it on each row (adding the three right columns):

Name      pcd.sect  pop    area pop.dens  rank  LA.4G  LA.area   LA.4G(km2)
Cambridge   1      5546    0.6   8341      1     58     140        82
Cambridge   2      7153    1.1   5970      2     58     140        82  
Cambridge   3      5621    2.3   5289      3     58     140        82
Cambridge   4      10403   4.3   4361      4     58     140        82
Cambridge   5      14796   4.2   3495      5     58     140        82  
...

I had to shorten the headings, so let me just explain them in more detail:

Name - Local Authority name
pcd.sector - postcode sector (so the lower level unit)
pop - the population in the postcode sector
area - surface area of the postcode sector in km2
pop.dens - is the population density of the postcode sector in persons per km2
rank - rank of the postcode sector based on population density within each local authority
LA.4G - % coverage of the local authority with 4G
LA.area - the sum of the area column for each local authority
LA.4G(km2) - the number of km2 with 4G coverage within each local authority

Taking Cambridge as a hypothetical example, there is 58% 4G coverage across the whole Local Authority. I then want to disaggregate this number to achieve 4G coverage for the individual postcode sectors.

Ideally the data would end up looking like this, with an extra column for the postcode sector coverage:

Name      pcd.sect  ...     pcd.sector.coverage (%)
Cambridge   1       ...         100
Cambridge   2       ...         100
Cambridge   3       ...         100
Cambridge   4       ...         34
Cambridge   5       ...         0
...        ...      ...         ...

How do I get R to allocate this 82km2 (58% geographical coverage) out to the postcode sectors in a new column, based on the area column, but then stopping once it hits the maximum coverage level of 82km2 (58% geographical coverage)?

Solution

this is how I interpret this question. Correct me if this is not what you meant. Suppose you have the following data.

dat <- data.frame(
  Name = "A", pcd.sector = 1:5,
  area = c(2, 3, 1, 5, 3), 
  areaSum = 14, LA.4G = 8
)
dat

#  Name pcd.sector area areaSum LA.4G
#1    A          1    2      14     8
#2    A          2    3      14     8
#3    A          3    1      14     8
#4    A          4    5      14     8
#5    A          5    3      14     8

You have five sectors, with various areas. Although the areas sum up to 14, there are only 8 covered by 4G. You want to allocate the areas from the sectors 1 through 5.

The following code does this job. I used cumsum function to compute the cumulative sum of areas from the top sector, which is capped by the 4G coverage limit. Allocated area can be computed by diff function, which takes the one-step difference of a vector. The sector 1 through 3 gets 100% coverage, which sum up to 6 areas, hence only 2 remains. Although sector 4 has 5 area, it can only enjoy 2, or 40%. This uses up the areas and nothing is left for the sector 5.

dat$area_allocated <- diff(c(0, pmin(cumsum(dat$area), dat$LA.4G)))
dat$area_coverage  <- dat$area_allocated / dat$area * 100
dat

#   Name pcd.sector area areaSum LA.4G area_allocated area_coverage
# 1    A          1    2      14     8              2           100
# 2    A          2    3      14     8              3           100
# 3    A          3    1      14     8              1           100
# 4    A          4    5      14     8              2            40
# 5    A          5    3      14     8              0             0

If you have a lot of areas, then you may want to use dplyr::group_by function.

dat <- rbind(
  data.frame(
    Name = "A", pcd.sector = 1:5,
    area = c(2, 3, 1, 5, 3), 
    areaSum = 14, LA.4G = 8
  ),
  data.frame(
    Name = "B", pcd.sector = 1:3,
    area = c(4, 3, 2), 
    areaSum = 9, LA.4G = 5
  )
)

library(dplyr)
dat <- dat %>% group_by(Name) %>% 
  mutate(area_allocated = diff(c(0, pmin(cumsum(area), LA.4G)))) %>%
  mutate(area_coverage = area_allocated / area * 100)
dat

#     Name pcd.sector  area areaSum LA.4G area_allocated area_coverage
#   <fctr>      <int> <dbl>   <dbl> <dbl>          <dbl>         <dbl>
# 1      A          1     2      14     8              2     100.00000
# 2      A          2     3      14     8              3     100.00000
# 3      A          3     1      14     8              1     100.00000
# 4      A          4     5      14     8              2      40.00000
# 5      A          5     3      14     8              0       0.00000
# 6      B          1     4       9     5              4     100.00000
# 7      B          2     3       9     5              1      33.33333
# 8      B          3     2       9     5              0       0.00000