Search code examples
rlapplysapply

Weight parameter in array based on other parameters


I've got this table/array in CSV:

GroupID Channel Daysbeforelast
1           A   35
1           B   31
1           C   29
1           D   17
1           E   15
1           D   5
1           C   0
2           B   66
2           E   17
2           D   15
2           A   2
2           C   0
2           F   0
2           A   0
4           B   15
4           C   0

And trying to add one more column Weight so the table will look like this:

GroupID Channel Daysbeforelast      Weight  How it is calculated
1           A           35          0.00005         (1-x/2/2-x/2-x)/2
1           B           31          0.00005         (1-x/2/2-x/2-x)/2
1           C           29          0.0833          (x/2/2)/3
1           D           17          0.0833          (x/2/2)/3
1           E           15          0.0833          (x/2/2)/3
1           D           5           0.25            (x/2)
1           C           0           0.5             (x)
2           B           66          0.125           (1-x/2/2-x/2-x)
2           E           17          0.0625          (x/2/2)/2
2           D           15          0.0625          (x/2/2)/2
2           A           2           0.25            (x/2)
2           C           0           0.25            (x)/3
2           F           0           0.25            (x)/3
2           A           0           0.5             (x)/3
4           B           15          0.5             (1-x)
4           C           0           0.5             (x)

Below it is explained:

Each group can have one or more subgroups depending on the data:

If Daysbeforelast=0 then Subgroup1;

If 0<Daysbeforelast<=7 then Subgroup2;

If 7<Daysbeforelast<=14 then Subgroup3;

If 14<Daysbeforelast<=30 then Subgroup4;

Else Subgroup5.

First subgroup has weight = x (x=0.5 for example). This weight is distributed evenly across all rows in subgroup in the group. Let's say group Y has 3 rows in SubgroupZ. In this case each row will have weight equal (SubgroupZ weight)/3.

Subgroup2 has weight = x/2 and it is distributed across all rows in that subgroup the same way.

Each subgroup until the last one will receive weight equal to weight of previous subgroup divided by 2.

Last subgroup in the group (it is not always Subgroup 5) will receive weight = 1 - sum of all previous weights in the group.

This is to check that sum of Weights for each group is equal to 1.

Note that if we for example don't have any rows in Subgroup2 then Subgroup3 will receive weight x/2 (not x/2/2).

We have Subgroup1 for all groups but all or any other of subgroups can be missing.

The easiest way to do this perhaps using variables lapply but I got stack with it as I'm quite new to R.


Solution

  • Here is the solution: For each group, identify subgroups using cut and drop the absentee subgroups using droplevels. Allocate weights as (x/2^n)/freq. Then identify the minimum weights and adjust them such that sum of weights in a group add upto 1.

    dat <- read.table("clipboard", header = T)
    groupIDs <- unique(dat$GroupID)
    x = 0.5
    for (i in groupIDs)
    {
      rows = which(dat$GroupID == i)
      Subgroups <- cut(dat[rows,3], c(-Inf,0,7,14,30,Inf), labels = 1:5)
      a<-droplevels(Subgroups)
      wts <- sapply(a,function(y)(x/(2^(as.numeric(y)-1)))/(table(a)[which(levels(a) == y)]))
    
      ind <- wts==min(wts)
      wts[ind] <- (1-sum(wts[!ind]))/sum(ind)
      dat[rows,4] <- wts    
    }
    names(dat)[4] <- "Weight"