I have some data: in the code below, norm = the estimated prices and norm2 = the actual price. The actual price is what somebody is offering to sell an item for. I ran a Monte Carlo simulation to come up with, say, 100 estimated prices. I then plot these in the following histogram.
The orange line represents the actual (offering) price, the middle dashed-dotted line represents the mean of the 100 estimated prices from a model and the two lines either side are the 1 standard deviation lines.
Assuming the estimates are roughly normal, there is about a 68% chance that the product will be sold for a price between the two 1-standard-deviation lines. So, I can say that in this particular case there is a greater than 68% probability that buying this product will turn a profit (since the orange line is below and outside the lower 1-standard-deviation line).
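As a quick check of that reasoning, here is a minimal sketch that assumes the estimated prices are exactly normal (an assumption, not something the simulation guarantees). It shows that the mass above the lower 1-standard-deviation line is about 84%, so the 68% figure is a conservative lower bound when the offering price sits below that line. The variable names are illustrative only.

```r
# Standard normal probabilities relative to the 1-SD lines
# (assumes the estimates follow a normal distribution)
within_one_sd <- pnorm(1) - pnorm(-1)           # between mean - sd and mean + sd
above_lower   <- pnorm(-1, lower.tail = FALSE)  # everything above mean - sd

round(within_one_sd, 3)
#> [1] 0.683
round(above_lower, 3)
#> [1] 0.841
```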
I want to calculate the overall probability that this product will be profitable (i.e. everything greater than the orange line).
R code:
library(ggplot2)
library(tidyverse)
normalDist = data.frame(
norm = rnorm(1000, mean = 1, sd = 2),
norm2 = rnorm(1000, mean = 0.9, sd = 2)
)
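# Mean and SD of the simulated (estimated) prices shown in the histogram
mean = mean(normalDist$norm)
sd = sd(normalDist$norm)
# One draw standing in for the actual offering price (the orange line)
norm2 = normalDist$norm2 %>% sample(1)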
normalDist %>%
ggplot() +
geom_histogram(aes(x = norm), bins = 20, fill = "skyblue", color = "black") +
geom_vline(xintercept = c(norm2, mean, mean - sd, mean + sd),
linetype = c("solid", "dotdash", "longdash", "longdash"),
size = c(2, 1, 1, 1),
color = c("darkorange", "darkgreen", "darkred", "darkred")) +
theme_bw()
@GregorThomas (in the comments) is obviously right if the distribution is known to be normal. However, if the distribution is known to be normal (or to have any particular functional form), what's the point of doing a simulation? If your simulation is more general and generates a distribution whose form is unknown, just calculate the average of the expression normalDist$norm > norm2, i.e. the share of simulated prices that exceed the offering price. Here's an example with the analytical probability and the estimated probability from the simulation:
library(dplyr)
set.seed(1234)
normalDist = data.frame(
norm = rnorm(1000, mean = 1, sd = 2),
norm2 = rnorm(1000, mean = 0.9, sd = 2)
)
mean = mean(normalDist$norm)
sd = sd(normalDist$norm)
norm2 = normalDist$norm2 %>% sample(1)
## Calculated from the normal CDF
pnorm(norm2, mean, sd, lower.tail=FALSE)
#> [1] 0.8734893
## Estimated from the Simulation
mean(normalDist$norm > norm2)
#> [1] 0.888
Created on 2023-09-27 with reprex v2.0.2
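Since the simulation-based figure is itself an estimate, it can help to see how much it might vary. The following is a rough sketch, not part of the reprex above: `estimates` and `offer` are hypothetical stand-ins for the simulated prices and the offering price, and the interval uses a simple binomial standard error that assumes the simulated draws are independent.

```r
set.seed(1234)
estimates <- rnorm(1000, mean = 1, sd = 2)  # stand-in for simulated prices (assumed)
offer <- 0.5                                # hypothetical offering price

# Empirical probability that the estimated price exceeds the offer
p_hat <- mean(estimates > offer)

# The same quantity via the empirical CDF: 1 - P(estimate <= offer)
p_ecdf <- 1 - ecdf(estimates)(offer)

# Approximate 95% interval from the binomial standard error
se <- sqrt(p_hat * (1 - p_hat) / length(estimates))
ci <- p_hat + c(-1, 1) * qnorm(0.975) * se

p_hat
ci
```

With more simulated draws the interval narrows, which gives a rough idea of how many Monte Carlo runs are needed for the precision you want.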