
Approximate proportions preserving the sum (1 = 100%) in R


I have a large table in which, for every category (id), I have calculated the number of counts per subcategory in column countsperc (subcategory names not shown), the total number of observations per category in column sumofcounts, and the proportion of each subcategory relative to that total (countsperc/sumofcounts) in column apppropor (approximate proportions), which must be rounded to 3 decimals.
The problem is that the rounded proportions of a category (their sum is shown as old_sum) must add up to exactly 1.000, but they often add up to 0.999 or 1.001 instead.
So I am looking for a method that adds 0.001 to, or subtracts it from, any element of column apppropor within a category, so that each id always sums to 1.000. For example, in row 1 the value could be 0.334 instead of 0.333.
EDIT: The goal is not to produce an exact sum of 1 for its own sake, which would have no utility, but to produce input for another program (chromEvol), which reads column apppropor as is and requires it to sum to 1.000 per id (see the error message below).
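For reference, a common technique for this kind of problem is largest remainder (Hamilton) rounding: express each proportion in units of 0.001, round all of them down, then hand one extra 0.001 to the entries with the largest remainders until the id sums to exactly 1.000. The R sketch below illustrates that technique; it is not the method used in the answer, `round_preserve_sum` is a hypothetical helper name, and the counts are copied from the table in this question.

```r
# Hypothetical helper (not from the original post): round the proportions
# x / sum(x) to `digits` decimals so that they add up to exactly 1,
# using the largest remainder (Hamilton) method.
round_preserve_sum <- function(x, digits = 3) {
  scale <- 10^digits
  units <- x / sum(x) * scale        # exact proportions, in units of 0.001
  down  <- floor(units)              # round every proportion down
  left  <- round(scale - sum(down))  # units still missing to reach 1.000
  # give one extra unit to each of the `left` largest remainders;
  # order() keeps ties in their original order, so earlier rows win ties
  extra <- order(-(units - down))[seq_len(left)]
  down[extra] <- down[extra] + 1
  down / scale
}

# counts from the question, one element per subcategory
id         <- rep(c("item1", "item2", "item3", "item4"), times = c(3, 3, 4, 4))
countsperc <- c(1, 1, 1,  1, 119, 1,  1, 1, 41, 1,  1, 3, 1, 24)

# apply the rounding separately within each id
apppropor <- ave(countsperc, id, FUN = round_preserve_sum)
print(data.frame(id, countsperc, apppropor))
print(tapply(apppropor, id, sum))  # every id should now sum to 1
```

Because every value is either rounded down or rounded up by one unit, each result stays within 0.001 of the exact proportion, and the extra 0.001 goes to the entries that lost the most in rounding rather than to a fixed position (for item1 this gives 0.334, 0.333, 0.333).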

text1<-"
id    countsperc sumofcounts   apppropor     
item1          1           3       0.333     
item1          1           3       0.333     
item1          1           3       0.333     
item2          1         121       0.008     
item2        119         121       0.983     
item2          1         121       0.008     
item3          1          44       0.023    
item3          1          44       0.023     
item3         41          44       0.932     
item3          1          44       0.023     
item4          1          29       0.034     
item4          3          29       0.103      
item4          1          29       0.034   
item4         24          29       0.828"
table1 <- read.table(text = text1, header = TRUE)
library(data.table)
# per-id sum of the rounded proportions (should be 1.000)
sums <- as.data.frame(setDT(table1)[, .(old_sum = sum(apppropor)), by = id])
table1 <- merge(table1, sums)
table1
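The values in apppropor appear to be round(countsperc / sumofcounts, 3). Under that assumption, recomputing them from the counts reproduces the column and shows which ids miss 1.000 after rounding. A small self-contained check, with the vectors copied from text1:

```r
# Assumption: apppropor = round(countsperc / sumofcounts, 3)
countsperc  <- c(1, 1, 1,  1, 119, 1,  1, 1, 41, 1,  1, 3, 1, 24)
sumofcounts <- rep(c(3, 121, 44, 29), times = c(3, 3, 4, 4))
id          <- rep(c("item1", "item2", "item3", "item4"), times = c(3, 3, 4, 4))

apppropor <- round(countsperc / sumofcounts, 3)
print(apppropor)

# per-id sum of the rounded values; the outer round() removes floating-point noise
old_sum <- round(tapply(apppropor, id, sum), 3)
print(old_sum)  # item1 0.999, item2 0.999, item3 1.001, item4 0.999
```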

chromEvol Version: 2.0. Last updated December 2013

The count probabilities for taxa Ad_mic not sum to 1.0 chromEvol: errorMsg.cpp:41: static void errorMsg::reportError(const string&, int): Assertion `0' failed. Aborted (core dumped)


Solution

  • I found a way: compute 1 - old_sum for each id and add that difference to the last proportion of the id.

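    # difference between the target sum (1) and the rounded sum of each id
    table1$dif <- 1 - table1$old_sum
    # sort by id so that all rows of an id are contiguous
    table1 <- table1[order(table1$id), ]
    # cumulative run lengths give the index of the last row of each id
    last <- cumsum(rle(as.vector(table1$id))$lengths)
    # put the whole correction on that row; round() drops floating-point noise
    table1$apppropor[last] <- round(table1$apppropor[last] + table1$dif[last], 3)
    # verify
    library(data.table)
    sums <- as.data.frame(setDT(table1)[, .(new_sum = sum(apppropor)), by = id])
    table1 <- merge(table1, sums)
    table1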
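A possible refinement, offered as a sketch rather than the answer's own code: putting the whole correction on the last row of an id could push a 0.000 entry below zero, so it may be safer to adjust the largest proportion of each id instead. The version below does that in base R with `ave()` and `which.max()`; `fix_largest` is a hypothetical helper name, and `table1` is rebuilt from the question's id and apppropor columns so the example runs on its own.

```r
# rounded proportions from the question (per-id sums: 0.999, 0.999, 1.001, 0.999)
table1 <- data.frame(
  id = rep(c("item1", "item2", "item3", "item4"), times = c(3, 3, 4, 4)),
  apppropor = c(0.333, 0.333, 0.333,  0.008, 0.983, 0.008,
                0.023, 0.023, 0.932, 0.023,  0.034, 0.103, 0.034, 0.828)
)

# hypothetical helper: add (1 - sum) to the largest value of one id,
# then round to 3 decimals so no floating-point noise remains
fix_largest <- function(p) {
  largest <- seq_along(p) == which.max(p)  # TRUE only for the (first) largest value
  round(p + (1 - sum(p)) * largest, 3)
}

table1$apppropor <- ave(table1$apppropor, table1$id, FUN = fix_largest)

# verify: every id should now sum to 1
print(round(tapply(table1$apppropor, table1$id, sum), 3))
```

Unlike the version above, this does not depend on row order, so the sort step is not needed; the trade-off is that the largest value of each id absorbs the correction (for example, 0.932 becomes 0.931 for item3).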