I have a data set of connection forces based on axial force in N (http://pastebin.com/Huwg4vxv).
A previous analysis (by another party) fitted a Weibull distribution to it and then predicted that the chance of recording a force of 60 N or higher is around 1.2%.
I have to say that eyeballing the data, that doesn't seem likely to me, but I know nothing about this particular distribution.
So far I am able to fit the curve:
force <- read.csv(file = "forcestats.csv", header = TRUE)
library(MASS)
fitdistr(force$F, "weibull")
hist(force$F)
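I am trying to understand:
1. Is a Weibull distribution really the best fit for this data?
2. How do I get the probability of recording a force of 60 N or higher from the fitted distribution?
3. How do I put a confidence interval on that probability?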
Thanks for reading, Pete
To address your first item,
is a weibull distro really the best fit for this data ?
conceptually, this is more a question about statistical inference than about programming, so you most likely want to tackle it on Cross Validated rather than Stack Overflow. However, you can certainly ask how to investigate it programmatically, for example by comparing the estimated density of the observed data either to the theoretical density function or to the estimated density of random samples drawn from a Weibull distribution with your parameter estimates:
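library(MASS)
## Read the observed forces (column F)
Weibull <- read.csv(
  "F:/Studio/MiscData/force_in_newtons.txt",
  header = TRUE)
## Fit a Weibull distribution by maximum likelihood
params <- fitdistr(Weibull$F, "weibull")
## Extract the parameter estimates by name
Shape <- params$estimate[["shape"]]
Scale <- params$estimate[["scale"]]
## Density of 500 random draws from the fitted distribution
set.seed(123)
plot(
  density(rweibull(500, shape = Shape, scale = Scale)),
  col = "red", lwd = 2, lty = 3, main = "")
## Overlay the density of the observed data
lines(density(Weibull$F), col = "blue", lwd = 2, lty = 3)
## Legend line widths match the plotted curves
legend(
  "topright",
  legend = c("rweibull(n=500,...)", "observed data"),
  lty = c(3, 3), col = c("red", "blue"), lwd = c(2, 2),
  bty = "n")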
Of course, there are many other ways of assessing the fit of your model; this is just a quick sanity check.
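Two of those other checks, sketched under assumptions: a Q-Q plot against the fitted Weibull quantiles and a Kolmogorov-Smirnov test. Since I can't ship your file here, `forces` is simulated stand-in data (the shape and scale values are made up); swap in `Weibull$F` for the real thing. Treat the K-S p-value as a rough guide only, because the parameters were estimated from the same data being tested.

```r
library(MASS)

# Hypothetical stand-in for the observed forces; replace with Weibull$F.
set.seed(123)
forces <- rweibull(200, shape = 4, scale = 40)

# Maximum-likelihood Weibull fit, as in the code above.
fit <- fitdistr(forces, "weibull")
shape_hat <- fit$estimate[["shape"]]
scale_hat <- fit$estimate[["scale"]]

# Q-Q plot: sorted observations against fitted Weibull quantiles.
# Points close to the 45-degree line suggest a reasonable fit.
probs <- ppoints(length(forces))
theo_q <- qweibull(probs, shape = shape_hat, scale = scale_hat)
plot(theo_q, sort(forces),
     xlab = "Fitted Weibull quantiles",
     ylab = "Observed quantiles")
abline(0, 1, col = "red", lty = 2)

# Kolmogorov-Smirnov test against the fitted distribution.
# The p-value is only approximate because the parameters were estimated.
ks <- ks.test(forces, "pweibull", shape = shape_hat, scale = scale_hat)
print(ks)
```

With the real data, pay most attention to the upper-right corner of the Q-Q plot, since the 60 N question is entirely about the right tail.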
As for your second question, you can use the pweibull function with lower.tail=FALSE to get probabilities from the theoretical survival function (S(x) = 1 - F(x)):
## Pr(X >= 60)
> pweibull(
60,shape=Shape,scale=Scale,
lower.tail=FALSE)
[1] 0.01268268
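If you want to convince yourself of what that call computes, here is a small sketch comparing it with the closed-form Weibull survival function, S(x) = exp(-(x/scale)^shape), and with the raw proportion of observations at or above the threshold. The `shape` and `scale` values and the simulated `forces` vector are placeholders I made up for illustration; use your fitted `Shape` and `Scale` and `Weibull$F` instead.

```r
# Hypothetical parameter values for illustration only.
shape <- 4
scale <- 40
x <- 60

# Survival probability from the built-in function ...
p_builtin <- pweibull(x, shape = shape, scale = scale, lower.tail = FALSE)

# ... and from the closed-form Weibull survival function.
p_closed <- exp(-(x / scale)^shape)

# The two should agree up to floating-point error.
print(all.equal(p_builtin, p_closed))

# Empirical counterpart: the share of observations at or above x.
# 'forces' is simulated stand-in data; replace with Weibull$F.
set.seed(1)
forces <- rweibull(200, shape = shape, scale = scale)
p_empirical <- mean(forces >= x)
print(p_empirical)
```

Comparing the model-based probability with the empirical proportion is a direct way to test the "eyeballing" impression, keeping in mind that a small sample may contain no observations at all that far into the tail.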
As for your final item, I believe that calculating confidence intervals on probabilities (as well as certain other statistical quantities) for an estimated distribution requires using the delta method; I could be recalling incorrectly though, so you may want to double check this. If this is the case and you aren't familiar with the delta method, then unfortunately you will probably have to do a fair amount of reading on the subject, because the calculation involved is generally non-trivial and the Wikipedia article doesn't give a very in-depth treatment. Or, you could ask about this on Cross Validated as well.
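For what it's worth, here is a rough sketch of how a first-order delta-method interval for Pr(X >= 60) could look in R. It is my own illustration, not a vetted procedure: it assumes the parameter covariance matrix returned by `vcov()` on the `fitdistr` object is a good approximation, uses the analytic gradient of S(x) = exp(-(x/scale)^shape), and runs on simulated stand-in data (replace `forces` with `Weibull$F`).

```r
library(MASS)

# Hypothetical stand-in for the observed forces; replace with Weibull$F.
set.seed(123)
forces <- rweibull(200, shape = 4, scale = 40)

fit <- fitdistr(forces, "weibull")
k   <- fit$estimate[["shape"]]
lam <- fit$estimate[["scale"]]
V   <- vcov(fit)  # 2 x 2 covariance matrix, ordered (shape, scale)

# Point estimate of the tail probability S(x) = exp(-(x / scale)^shape).
x <- 60
z <- (x / lam)^k
p_hat <- exp(-z)

# Analytic gradient of S(x) with respect to (shape, scale).
grad <- c(
  -p_hat * z * log(x / lam),  # dS / dshape
   p_hat * z * k / lam        # dS / dscale
)

# First-order delta method: Var(p_hat) ~ grad' V grad.
se_p <- sqrt(drop(t(grad) %*% V %*% grad))

# Approximate 95% Wald interval.
ci <- p_hat + c(-1, 1) * qnorm(0.975) * se_p
print(c(estimate = p_hat, lower = ci[1], upper = ci[2]))
```

One caveat: for probabilities this close to zero the symmetric interval can dip below 0, in which case applying the same method on a transformed scale (for example log(-log(p))) and back-transforming is a common workaround.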