Search code examples
rintegralnumerical-integration

Calculating trapezoidal integration from origin to each time point in R


I have a data frame x:

head(x)
#  time         Qfr
#1    1 0.004751271
#2    2 0.005405618
#3    3 0.005785781
#4    4 0.006028213
#5    5 0.006179973
#6    6 0.006263814

I am trying to calculate the numerical integration from time = 0 up to each time point, i.e., the integral:

\integral_{u=0}^t Qfr du

My data set looks like

plot(x, type = "p", cex = 0.2)

enter image description here

So far I have only been able to calculate the integral in total, using the package pracma:

require(pracma)
trapz(x$time, x$Qfr)
# [1] 0.1536843

How do I code for integral from the origin to the time given in that row?

Any help greatly appreciated!


x <- 
structure(list(time = 1:100, Qfr = c(0.00475127142639315, 0.00540561802578535, 
0.00578578141896237, 0.00602821304872631, 0.00617997318815436, 
0.00626381438010966, 0.0062930341038365, 0.00627650284793016, 
0.00622076547748955, 0.00613104312485634, 0.00601175416200995, 
0.00586680072681021, 0.00569973138194467, 0.00551383427584607, 
0.00531218958660475, 0.00509769744944577, 0.00487309097312275, 
0.00464094029551979, 0.0044036514994002, 0.00416346290979426, 
0.00392244046575488, 0.00368247330791138, 0.00344527034180023, 
0.00321235826358148, 0.00298508133306843, 0.00276460302703881, 
0.00255190958997126, 0.0023478154110241, 0.00215297008955578, 
0.00196786700285879, 0.00179285315617775, 0.00162814007427384, 
0.00147381548391774, 0.00132985553610085, 0.00119613732394456, 
0.00107245146585054, 0.000958514542040229, 0.000853981195025623, 
0.000758455729566888, 0.000671503074231956, 0.000592658993812166, 
0.000521439468716574, 0.000457349183339767, 0.000399889089676022, 
0.000348563034666554, 0.00030288345957139, 0.000262376196826176, 
0.000226584404259194, 0.000195071688184451, 0.000167424475817082, 
0.000143253703811788, 0.000122195893694217, 0.000103913686771705, 
8.80959110345906e-05, 7.44572508696844e-05, 6.27375873853488e-05, 
5.27010730706422e-05, 4.4134999643845e-05, 3.68485125344791e-05, 
3.06712197100413e-05, 2.54517366998868e-05, 2.1056203851463e-05, 
1.73668062169167e-05, 1.42803211212964e-05, 1.17067134905708e-05, 
9.56779447711296e-06, 7.79595484853561e-06, 6.33298101979921e-06, 
5.1289585088234e-06, 4.14126496950611e-06, 3.33365277945052e-06, 
2.67541940141655e-06, 2.14066236056745e-06, 1.70761464370064e-06, 
1.35805559020556e-06, 1.07679186575234e-06, 8.51202848467075e-07, 
6.7084467545985e-07, 5.27107259938671e-07, 4.12918764032332e-07, 
3.22492271674051e-07, 2.51109725048683e-07, 1.94938546369442e-07, 
1.50876746740867e-07, 1.16422711484835e-07, 8.95662353428025e-08, 
6.86977528360132e-08, 5.25330624541746e-08, 4.00511738714265e-08, 
3.0443212344273e-08, 2.30705923615172e-08, 1.74309232147824e-08, 
1.31303328068787e-08, 9.86109389078818e-09, 7.38361045310242e-09, 
5.51197296231586e-09, 4.102421614833e-09, 3.044168569724e-09, 
2.25212541292965e-09, 1.66116273443706e-09)), .Names = c("time", 
"Qfr"), class = "data.frame", row.names = c(NA, -100L))

Solution

  • Since the other answer shows you how to use pracma::trapz to achieve your purpose, I can't do that way. I had planned to wrote an answer that way, but since I spent a great deal of time editing your question, @shayaa took the first place. Luckily, I have a much better idea.

    Trapezoidal numerical integral is nothing complicated. You already have time on a regular grid 1, 2, 3, ... 100 with bin size 1, as well as known function values Qfr on the grid. The numerical integral on each bin is just the area of the trapezoid. So, you could compute:

    ## integration on each bin cell
    cell <- with(x, (Qfr[1:99] + Qfr[2:100]) / 2)
    ## Note that precisely I should write
    ## cell <- with(x, (Qfr[1:99] + Qfr[2:100]) / 2 * diff(time))
    ## But as I said, you have equally spaced bin points with bin size 1
    ## `diff(time)` is always 1, hence left out
    ## You need to bear this in mind, once you work on more general cases.
    

    Then, the cumulative integral value you want is just:

    cumsum(c(0, cell))
    

    This method is supper fast! Suppose you have N data points, it has computational costs of O(N), yet it is fully vectorized. The other answer using sapply is not vectorized, and will cost you O(N^2) computation.