I have historical records of the growth (in terms of size) of our database for the past couple of years. I am trying to figure out the best way (or graph) to project the future growth of the database from these records. Of course, this won't account for new tables that get added and grow as well, but I am just looking for a way to estimate it. I am open to ideas in Python or R.
Here is the size of the database in TB by year:
3.895 - 2012
6.863 - 2013
8.997 - 2014
10.626 - 2015
## database size in TB (y) by year (x), from the question
d <- data.frame(x = 2012:2015,
                y = c(3.895, 6.863, 8.997, 10.626))
You can visualize the fit (and its projection): here I'm comparing an additive and a polynomial model. I'm not sure I believe the confidence intervals on the additive model, though:
library("ggplot2"); theme_set(theme_bw())
ggplot(d, aes(x, y)) + geom_point() +
    expand_limits(x = 2018) +
    geom_smooth(method = "lm", formula = y ~ poly(x, 2),
                fullrange = TRUE, fill = "blue") +
    geom_smooth(method = "gam", formula = y ~ s(x, k = 3), colour = "red",
                fullrange = TRUE, fill = "red")
I'm a little shocked the quadratic relationship is so close.
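Part of the reason may simply be that the yearly growth shrinks every year, which is the pattern a downward-curving quadratic captures. A rough check, sketched in Python since the question allows either language (the variable names are mine, not from the question):

```python
# Database size in TB by year, copied from the question.
years = [2012, 2013, 2014, 2015]
size_tb = [3.895, 6.863, 8.997, 10.626]

# First differences: how much the database grew each year.
growth = [round(b - a, 3) for a, b in zip(size_tb, size_tb[1:])]

# Second differences: how much that yearly growth changed.
# An exact quadratic would give identical values here.
slowdown = [round(b - a, 3) for a, b in zip(growth, growth[1:])]

print("yearly growth (TB):", growth)
print("change in growth (TB):", slowdown)
```

Both second differences are negative and of similar size, so a quadratic is at least plausible. Bear in mind that with four points and three coefficients there is only one residual degree of freedom, so a tight fit is weaker evidence than it looks.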
summary(m1 <- lm(y~poly(x,2),data=d))
## Residual standard error: 0.07357 on 1 degrees of freedom
## Multiple R-squared: 0.9998, Adjusted R-squared: 0.9994
## F-statistic: 2344 on 2 and 1 DF, p-value: 0.0146
Predict:
predict(m1,newdata=data.frame(x=2016:2018),interval="confidence")
## fit lwr upr
## 1 11.50325 8.901008 14.10549
## 2 11.72745 6.361774 17.09313
## 3 11.28215 2.192911 20.37139
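For anyone who prefers Python, here is a minimal sketch of the same quadratic least-squares fit using only the standard library. It should reproduce the `fit` column above, but not the confidence intervals. The shortcut of centring the years so that the linear term decouples assumes evenly spaced years, and all names here are my own:

```python
# Database size in TB by year, copied from the question.
years = [2012, 2013, 2014, 2015]
size_tb = [3.895, 6.863, 8.997, 10.626]

# Centre the years; for evenly spaced years sum(t) and sum(t**3) are zero,
# which splits the 3x3 normal equations into a 1x1 and a 2x2 system.
mid = sum(years) / len(years)
t = [yr - mid for yr in years]

n = len(t)
s2 = sum(v ** 2 for v in t)
s4 = sum(v ** 4 for v in t)
sy = sum(size_tb)
sty = sum(v * y for v, y in zip(t, size_tb))
st2y = sum(v * v * y for v, y in zip(t, size_tb))

# Model: y = a + b*t + c*t**2
b = sty / s2
det = n * s4 - s2 * s2
a = (sy * s4 - s2 * st2y) / det
c = (n * st2y - s2 * sy) / det

def predict(year):
    """Fitted size in TB for a given calendar year."""
    u = year - mid
    return a + b * u + c * u * u

for yr in (2016, 2017, 2018):
    print(yr, round(predict(yr), 5))

# Vertex of the fitted parabola: where the projected size stops growing.
peak_year = mid - b / (2 * c)
print("fitted curve peaks around", round(peak_year, 1))
```

The fitted parabola turns over between 2016 and 2017, which is why the 2018 projection is lower than the 2017 one. A database is unlikely to shrink on its own, so I would not trust this model more than a year or so out.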
Did you make up these numbers, or are they real data?
The forecast package would be better for more sophisticated methods.