Search code examples
rif-statementdplyrmutated

R - allocate a share of a number over different columns using an ifelse statement


I have the following data set:

observation <- c(1:10)
pop.d.rank  <- c(1:10)
cost.1  <- c(101:110)
cost.2  <- c(102:111)
cost.3  <- c(103:112)
all <- data.frame(observation,pop.d.rank,cost.1, cost.2, cost.3) 

And I want to allocate the following amount of money over three years:

annual.investment <- 500

I can do this for the first year with the following script:

library(dplyr)

all <- all %>%  
 mutate(capital_allocated.5G = diff(c(0, pmin(cumsum(cost), annual.investment)))) %>%
 mutate(capital_percentage.5G = capital_allocated.5G / cost * 100) %>%
 mutate(year = ifelse(capital_percentage.5G >= 50, "Year.1",0))

But when I try to do this for the second year, taking into account the previous year's investment, the code does not work. Here is my attempt at putting an ifelse statement in the mutate loop so that it does not overwrite the money allocated in the previous year:

all <- all %>%  
 mutate(capital_allocated.5G = ifelse(year == 0, diff(c(0, pmin(cumsum(cost), annual.investment))), 0) %>%
 mutate(capital_percentage.5G = capital_allocated.5G / cost * 100) %>%
 mutate(year = ifelse(capital_percentage.5G >= 50, "Year.2",0))

I want the data to look like the following, where the amount allocated goes first to any row that hasn't been 100% completed from the previous year.

capital_allocated.5G <- c(101, 102, 103, 104, 105, 106, 107, 108, 109, 55)
capital_percentage.5G <- c(100, 100, 100, 100, 100, 100, 100, 100, 100, 50)
year <- c("Year.1", "Year.1","Year.1", "Year.1","Year.1", "Year.2", "Year.2","Year.2", "Year.2","Year.2")
example.output <- data.frame(observation,pop.d.rank,cost,   capital_allocated.5G, capital_percentage.5G, year) 

Edit: cost.1 is the cost variable for year 1, cost.2 is the variable for year 2 and cost.3 is the cost variable for year 3

EDIT: Problem with previously accepted answer

I've realised that this ends up allocating in excess of 100 for the capital_percentage.5G variable. I have created a reproducible example. I think this relates to the fact that some costs decrease over time and some costs increase over time.

The logic behind this is that when an investment is made in one year, there is a specific cost of deployment for a 5G mobile network and that is what the cost columns relate to for that point in time. Once that investment has been made in one year, I want the function to provide a capital_percentage.5G 100% and then not allocate any more capital to it in future years.

How do I get it so that the percentage value hits a limit at 100 and more of the capital allocation isn't allocated to it at a later date?

observation <- c(1:10)
pop.d.rank  <- c(1:10)
cost.1  <- c(101:110)
cost.2  <- c(110:101)
cost.3  <- c(100:91)
all <- data.frame(observation,pop.d.rank,cost.1, cost.2, cost.3) 

capital_allocated.5G <- rep(0,10)   ## initialize to zero
capital_percentage.5G <- rep(0,10)  ## initialize to zero
year <- rep(NA,10)                  ## initialize to NA
all <- data.frame(observation,pop.d.rank,cost.1, cost.2, cost.3,   capital_allocated.5G,capital_percentage.5G,year) 

alloc.invest <- function(df, ann.invest, y) {
  df %>% mutate_(cost=paste0("cost.",y)) %>%
    mutate(capital_percentage.5G = capital_allocated.5G / cost * 100,
           year = ifelse(capital_percentage.5G < 50, NA, year),
           not.yet.alloc = ifelse(capital_percentage.5G < 100,cost-capital_allocated.5G,0),
           capital_allocated.5G = capital_allocated.5G +     ifelse(capital_percentage.5G < 100,diff(c(0, pmin(cumsum(not.yet.alloc), ann.invest))), 0),
       capital_percentage.5G = capital_allocated.5G / cost * 100,
       year = ifelse(is.na(year) & capital_percentage.5G >= 50, paste0("Year.",y), year)) %>%
select(-cost,-not.yet.alloc)
}

annual.investment <- 500
all <- alloc.invest(all,annual.investment,1)
print(all)
all <- alloc.invest(all,annual.investment,2)
print(all)
all <- alloc.invest(all,annual.investment,3)
print(all)

On year 3, in the final investment allocation here, the capital_percentage.5G suddenly shoot up to 110%.


Solution

  • Updated for year-on-year costs that may increase or decrease

    For different costs per year that may decrease per year as well as increase, we simply do not need to check if the capital_percentage.5G exceeded 100 percent when updating not.yet.alloc and capital_allocated.5G:

    library(dplyr)
    alloc.invest <- function(df, ann.invest, y) {
      df %>% mutate_(cost=paste0("cost.",y)) %>%
        mutate(capital_percentage.5G = capital_allocated.5G / cost * 100,
               year = ifelse(capital_percentage.5G < 50, NA, year),
               not.yet.alloc = cost-capital_allocated.5G,
               capital_allocated.5G = capital_allocated.5G + diff(c(0, pmin(cumsum(not.yet.alloc), ann.invest))),
               capital_percentage.5G = capital_allocated.5G / cost * 100,
               year = ifelse(is.na(year) & capital_percentage.5G >= 50, paste0("Year.",y), year)) %>%
        select(-cost,-not.yet.alloc)
    }
    

    With the new cost data:

    observation <- c(1:10)
    pop.d.rank  <- c(1:10)
    cost.1  <- c(101:110)
    cost.2  <- c(110:101)
    cost.3  <- c(100:91)
    

    Augment with initial value columns as before:

    capital_allocated.5G <- rep(0,10)   ## initialize to zero
    capital_percentage.5G <- rep(0,10)  ## initialize to zero
    year <- rep(NA,10)                  ## initialize to NA
    all <- data.frame(observation,pop.d.rank,cost.1, cost.2, cost.3, capital_allocated.5G,capital_percentage.5G,year) 
    

    Year 1:

    annual.investment <- 500
    all <- alloc.invest(all,annual.investment,1)
    print(all)
    ##   observation pop.d.rank cost.1 cost.2 cost.3 capital_allocated.5G capital_percentage.5G   year
    ##1            1          1    101    110    100                  101             100.00000 Year.1
    ##2            2          2    102    109     99                  102             100.00000 Year.1
    ##3            3          3    103    108     98                  103             100.00000 Year.1
    ##4            4          4    104    107     97                  104             100.00000 Year.1
    ##5            5          5    105    106     96                   90              85.71429 Year.1
    ##6            6          6    106    105     95                    0               0.00000   <NA>
    ##7            7          7    107    104     94                    0               0.00000   <NA>
    ##8            8          8    108    103     93                    0               0.00000   <NA>
    ##9            9          9    109    102     92                    0               0.00000   <NA>
    ##10          10         10    110    101     91                    0               0.00000   <NA>
    

    Year 2:

    all <- alloc.invest(all,annual.investment,2)
    print(all)
    ##   observation pop.d.rank cost.1 cost.2 cost.3 capital_allocated.5G capital_percentage.5G   year
    ##1            1          1    101    110    100                  110             100.00000 Year.1
    ##2            2          2    102    109     99                  109             100.00000 Year.1
    ##3            3          3    103    108     98                  108             100.00000 Year.1
    ##4            4          4    104    107     97                  107             100.00000 Year.1
    ##5            5          5    105    106     96                  106             100.00000 Year.1
    ##6            6          6    106    105     95                  105             100.00000 Year.2
    ##7            7          7    107    104     94                  104             100.00000 Year.2
    ##8            8          8    108    103     93                  103             100.00000 Year.2
    ##9            9          9    109    102     92                  102             100.00000 Year.2
    ##10          10         10    110    101     91                   46              45.54455   <NA>
    

    Year 3:

    all <- alloc.invest(all,annual.investment,3)
    print(all)
    ##   observation pop.d.rank cost.1 cost.2 cost.3 capital_allocated.5G capital_percentage.5G   year
    ##1            1          1    101    110    100                  100                   100 Year.1
    ##2            2          2    102    109     99                   99                   100 Year.1
    ##3            3          3    103    108     98                   98                   100 Year.1
    ##4            4          4    104    107     97                   97                   100 Year.1
    ##5            5          5    105    106     96                   96                   100 Year.1
    ##6            6          6    106    105     95                   95                   100 Year.2
    ##7            7          7    107    104     94                   94                   100 Year.2
    ##8            8          8    108    103     93                   93                   100 Year.2
    ##9            9          9    109    102     92                   92                   100 Year.2
    ##10          10         10    110    101     91                   91                   100 Year.3
    

    The original issue with your code is that ifelse just provide a switch on the output based on the condition and not the input cost used within the TRUE branch of the ifelse. Therefore, cumsum(cost) computes the cumsum over all cost and not only on the portion of the TRUE branch of the ifelse. To fix this, we can define the following function that can then be executed for each year in turn.

    library(dplyr)
    alloc.invest <- function(df, ann.invest, y) {
      df %>% mutate(not.yet.alloc = ifelse(capital_percentage.5G < 100,cost-capital_allocated.5G,0),
                    capital_allocated.5G = capital_allocated.5G + ifelse(capital_percentage.5G < 100,diff(c(0, pmin(cumsum(not.yet.alloc), ann.invest))), 0),
                    capital_percentage.5G = capital_allocated.5G / cost * 100,
                    year = ifelse(is.na(year) & capital_percentage.5G >= 50, paste0("Year.",y), year)) %>%
             select(-not.yet.alloc)
    }
    

    Note:

    1. Create a new temporary column not.yet.alloc from which we compute the resulting cumsum for the year's allocation.
    2. Don't need separate mutate statements.
    3. Need to also check is.na(year) before setting year. Otherwise, previous year already labelled will be overwritten.

    To use this function, we must first augment the input data with some initial values for capital_allocated.5G, capital_percentage.5G, and year:

    capital_allocated.5G <- rep(0,10)   ## initialize to zero
    capital_percentage.5G <- rep(0,10)  ## initialize to zero
    year <- rep(NA,10)                  ## initialize to NA
    all <- data.frame(observation,pop.d.rank,cost,capital_allocated.5G,capital_percentage.5G,year) 
    

    Then for Year 1:

    annual.investment <- 500
    all <- alloc.invest(all,annual.investment,1)
    print(all)
    ##   observation pop.d.rank cost capital_allocated.5G capital_percentage.5G   year
    ##1            1          1  101                  101             100.00000 Year.1
    ##2            2          2  102                  102             100.00000 Year.1
    ##3            3          3  103                  103             100.00000 Year.1
    ##4            4          4  104                  104             100.00000 Year.1
    ##5            5          5  105                   90              85.71429 Year.1
    ##6            6          6  106                    0               0.00000   <NA>
    ##7            7          7  107                    0               0.00000   <NA>
    ##8            8          8  108                    0               0.00000   <NA>
    ##9            9          9  109                    0               0.00000   <NA>
    ##10          10         10  110                    0               0.00000   <NA>
    

    and for Year 2:

    all <- alloc.invest(all,annual.investment,2)
    print(all)
    ##   observation pop.d.rank cost capital_allocated.5G capital_percentage.5G   year
    ##1            1          1  101                  101                   100 Year.1
    ##2            2          2  102                  102                   100 Year.1
    ##3            3          3  103                  103                   100 Year.1
    ##4            4          4  104                  104                   100 Year.1
    ##5            5          5  105                  105                   100 Year.1
    ##6            6          6  106                  106                   100 Year.2
    ##7            7          7  107                  107                   100 Year.2
    ##8            8          8  108                  108                   100 Year.2
    ##9            9          9  109                  109                   100 Year.2
    ##10          10         10  110                   55                    50 Year.2  
    

    Update to new requirement of changing costs per year

    If costs are different per year, then the function needs to readjust the capital_percentage.5G and possibly the year columns first:

    library(dplyr)
    alloc.invest <- function(df, ann.invest, y) {
      df %>% mutate_(cost=paste0("cost.",y)) %>%
             mutate(capital_percentage.5G = capital_allocated.5G / cost * 100,
                    year = ifelse(capital_percentage.5G < 50, NA, year),
                    not.yet.alloc = ifelse(capital_percentage.5G < 100,cost-capital_allocated.5G,0),
                    capital_allocated.5G = capital_allocated.5G + ifelse(capital_percentage.5G < 100,diff(c(0, pmin(cumsum(not.yet.alloc), ann.invest))), 0),
                    capital_percentage.5G = capital_allocated.5G / cost * 100,
                    year = ifelse(is.na(year) & capital_percentage.5G >= 50, paste0("Year.",y), year)) %>%
             select(-cost,-not.yet.alloc)
    }
    

    Note that creating another temporary column cost using mutate_ is only for convenience as the cost column needs to be dynamically selected based on the input y (otherwise, we need to use mutate_ for all computations, which will be somewhat messier).

    With the updated data similarly augmented with initial values for capital_allocated.5G, capital_percentage.5G, and year, Year 1:

    annual.investment <- 500
    all <- alloc.invest(all,annual.investment,1)
    print(all)
    ##   observation pop.d.rank cost.1 cost.2 cost.3 capital_allocated.5G capital_percentage.5G   year
    ##1            1          1    101    102    103                  101             100.00000 Year.1
    ##2            2          2    102    103    104                  102             100.00000 Year.1
    ##3            3          3    103    104    105                  103             100.00000 Year.1
    ##4            4          4    104    105    106                  104             100.00000 Year.1
    ##5            5          5    105    106    107                   90              85.71429 Year.1
    ##6            6          6    106    107    108                    0               0.00000   <NA>
    ##7            7          7    107    108    109                    0               0.00000   <NA>
    ##8            8          8    108    109    110                    0               0.00000   <NA>
    ##9            9          9    109    110    111                    0               0.00000   <NA>
    ##10          10         10    110    111    112                    0               0.00000   <NA>
    

    Year 2: Note that last asset has less than 50% allocated so its year is still NA.

    all <- alloc.invest(all,annual.investment,2)
    print(all)
    ##   observation pop.d.rank cost.1 cost.2 cost.3 capital_allocated.5G capital_percentage.5G   year
    ##1            1          1    101    102    103                  102             100.00000 Year.1
    ##2            2          2    102    103    104                  103             100.00000 Year.1
    ##3            3          3    103    104    105                  104             100.00000 Year.1
    ##4            4          4    104    105    106                  105             100.00000 Year.1
    ##5            5          5    105    106    107                  106             100.00000 Year.1
    ##6            6          6    106    107    108                  107             100.00000 Year.2
    ##7            7          7    107    108    109                  108             100.00000 Year.2
    ##8            8          8    108    109    110                  109             100.00000 Year.2
    ##9            9          9    109    110    111                  110             100.00000 Year.2
    ##10          10         10    110    111    112                   46              41.44144   <NA>
    

    Year 3:

    all <- alloc.invest(all,annual.investment,3)
    print(all)
    ##   observation pop.d.rank cost.1 cost.2 cost.3 capital_allocated.5G capital_percentage.5G   year
    ##1            1          1    101    102    103                  103                   100 Year.1
    ##2            2          2    102    103    104                  104                   100 Year.1
    ##3            3          3    103    104    105                  105                   100 Year.1
    ##4            4          4    104    105    106                  106                   100 Year.1
    ##5            5          5    105    106    107                  107                   100 Year.1
    ##6            6          6    106    107    108                  108                   100 Year.2
    ##7            7          7    107    108    109                  109                   100 Year.2
    ##8            8          8    108    109    110                  110                   100 Year.2
    ##9            9          9    109    110    111                  111                   100 Year.2
    ##10          10         10    110    111    112                  112                   100 Year.3