r count sample replicate

How to count the number of occurrences in a large dataset


I'm trying to count the number of occurrences of each of my "scenarios" (0 to 9) in a data frame covering 25 years. Basically, I have 10000 simulations of scenarios named 0 to 9, each scenario having its own probability of occurrence.

My data frame is too big to paste here, but this is how it is generated, followed by a preview:

# transpose() is not base R; presumably data.table::transpose()
library(data.table)

simulation=as.data.frame(replicate(10000,sample(c(0:9),size=25,replace=TRUE,prob=prob)))

simulation2=transpose(simulation)

Note: prob is a vector giving the probability of observing each scenario.

   v1 v2 v3 v4 v5 v6 ... v25
1   0  0  4  0  2  0      9
2   1  0  0  2  3  0      6
3   0  4  6  2  0  0      0
4
...
10000

This is what I have tried so far:

for (i in c(1:25)){
  for (j in c(0:9)){
    f=sum(simulation2[,i]==j)
    vect_f=c(vect_f,f)
  }
  vect_f=as.data.frame(vect_f)
}

If I omit the outer "for (i in c(1:25))" loop, this gives me the correct first column of the desired output. Now I am trying to repeat this for all 25 years, but when I add the second 'for' loop I do not get the desired output.

The output should look like this:

      (Year) 1  2  3  4  5  6   ... 25
(Scenario)
   0         649
   1         239
   ...
   9          11

Here 649 is the number of times scenario 0 is observed in the first year across my 10000 simulations.

Thanks for your help


Solution

  • We can apply table() to each column (year) with sapply():

    sapply(simulation2, table)
    
    #    V1   V2   V3   V4   V5 .....
    #0 1023 1050  994 1016 1022 .....
    #1 1050  968  950 1001  981 .....
    #2  997  969 1004  999  949 .....
    #3 1031  977 1001  993 1009 .....
    #4 1017 1054 1020 1003  985 .....
    #......
    

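    If some values are missing from a column, the per-column tables have different lengths and sapply() returns a list instead of a matrix. Converting each column to a factor with all levels 0:9 keeps a row for every scenario, with a count of 0 where a scenario never occurs: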

    sapply(simulation2, function(x) table(factor(x, levels = 0:9)))
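
To make the approach above easy to try, here is a minimal self-contained sketch. It rests on a few assumptions that are not in the original post: the prob vector is invented for illustration, the number of simulations is reduced to 1000, and base R's t() stands in for transpose(). It also includes a loop version that preallocates a 10 x 25 matrix, which may be closer to what the question was attempting; both should give the same counts.

```r
# Hypothetical inputs: the original post does not show `prob`,
# so an arbitrary probability vector is assumed here.
set.seed(42)
prob <- c(0.30, 0.20, 0.15, 0.10, 0.08, 0.07, 0.04, 0.03, 0.02, 0.01)
n_sim <- 1000    # fewer than the 10000 in the question, to keep it quick
n_years <- 25

# replicate() returns an n_years x n_sim matrix; t() puts simulations in
# rows and years in columns (assumed substitute for transpose()).
draws <- replicate(n_sim, sample(0:9, size = n_years, replace = TRUE, prob = prob))
sim <- as.data.frame(t(draws))
names(sim) <- paste0("V", seq_len(n_years))

# One table per column, with fixed levels so that a scenario that never
# occurs in a given year still gets a row (with count 0).
counts <- sapply(sim, function(x) table(factor(x, levels = 0:9)))

# Loop version for comparison: preallocate the result and fill one cell
# per (scenario, year) pair instead of growing a vector.
counts_loop <- matrix(0L, nrow = 10, ncol = n_years,
                      dimnames = list(as.character(0:9), names(sim)))
for (i in seq_len(n_years)) {
  for (j in 0:9) {
    counts_loop[j + 1, i] <- sum(sim[, i] == j)
  }
}

# Each column should sum to the number of simulations.
print(colSums(counts))
```

Rows of counts correspond to scenarios 0 to 9 and columns to years, matching the layout requested in the question. The sapply() version is shorter; the loop is shown only to illustrate where preallocation replaces the repeated c() and as.data.frame() calls.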