
Data frames using conditional probabilities to extract a certain range of values


I would like some help answering the following question:

Dr Barchan makes 600 independent recordings of Eric’s coordinates (X, Y, Z), selects the cases where X ∈ (0.45, 0.55), and draws a histogram of the Y values for these cases.

By construction, these values of Y follow the conditional distribution of Y given X ∈ (0.45,0.55). Use your function sample3d to mimic this process and draw the resulting histogram. How many samples of Y are displayed in this histogram?

We can argue that the conditional distribution of Y given X ∈ (0.45, 0.55) approximates the conditional distribution of Y given X = 0.5 — and this approximation is improved if we make the interval of X values smaller.

Repeat the above simulations selecting cases where X ∈ (0.5 − δ, 0.5 + δ), using a suitably chosen δ and a large enough sample size to give a reliable picture of the conditional distribution of Y given X = 0.5.

I know that for the first paragraph we want to generate values of x, y, z with sample3d(600) and then restrict x to the range 0.45 to 0.55. Is there a way to code this (maybe with an if statement) that keeps the rows where x is in this range and discards the rest of the 600 generated? Also, does anyone have any hints for the conditional probability part in the third paragraph?

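sample3d = function(n)
{
  df = data.frame()

  # Rejection sampling: keep drawing until n points have been accepted
  while(n > 0)
  {
    # Candidate point, uniform in the cube [-1, 1]^3
    X = runif(1, -1, 1)
    Y = runif(1, -1, 1)
    Z = runif(1, -1, 1)
    a = X^2 + Y^2 + Z^2   # squared distance from the origin

    # Accept only candidates inside the unit ball
    if(a < 1)
    {
      b = sqrt(a)   # distance from the origin

      # Project the accepted point onto the unit sphere
      point = data.frame(X = X/b, Y = Y/b, Z = Z/b)
      df = rbind(point, df)
      n = n - 1
    }
  }
  df
}
sample3d(600)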

Any help would be appreciated, thank you.


Solution

  • Your function produces a data frame. The part of the question that asks you to find the values in a data frame that fall in a given range can be solved by filtering the data frame. Notice that you're looking for an open interval (the endpoints aren't included), so use strict inequalities.

    df <- sample3d(600)
    df[df$X > 0.45 & df$X < 0.55,]
    

    Pay attention to the trailing comma: it tells R to subset rows and keep all columns.

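    You can use a dplyr solution as well. The helper between() tests a closed interval (it uses >= and <=), whereas the question specifies an open one, so write the strict inequalities explicitly. For continuous values like these the endpoints are hit with probability zero, so the two give the same rows in practice.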

    filter(df, X > 0.45 & X < 0.55)
    

    For the remainder of your assignment, see what you can figure out, and if you run into a specific problem, Stack Overflow can help you.
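As a rough sketch of how the filtering step might feed into the histogram and the smaller-δ experiment, the code below may be a useful starting point. It makes several assumptions: `sample3d` here is a vectorised stand-in for the function in the question (assumed to return points uniform on the unit sphere, as the posted code appears to do), `y_given_x` is a hypothetical helper, and the choices δ = 0.01 and a sample size of 200000 are illustrative rather than "the" right answer.

```r
set.seed(1)  # arbitrary seed, only so the run is reproducible

# Stand-in for the asker's sample3d(): n points uniform on the unit sphere,
# obtained by rejection sampling in the cube and then normalising.
sample3d <- function(n) {
  out <- matrix(numeric(0), ncol = 3)
  while (nrow(out) < n) {
    p <- matrix(runif(3 * n, -1, 1), ncol = 3)   # candidates in [-1, 1]^3
    r2 <- rowSums(p^2)                           # squared distance from origin
    keep <- r2 > 0 & r2 < 1                      # inside the unit ball
    p <- p / sqrt(r2)                            # project rows onto the sphere
    out <- rbind(out, p[keep, , drop = FALSE])
  }
  out <- out[seq_len(n), , drop = FALSE]
  data.frame(X = out[, 1], Y = out[, 2], Z = out[, 3])
}

# Hypothetical helper: the Y values whose X lies in the open interval
# (centre - delta, centre + delta).
y_given_x <- function(df, delta, centre = 0.5) {
  df$Y[df$X > centre - delta & df$X < centre + delta]
}

# First part: 600 recordings, X in (0.45, 0.55)
df <- sample3d(600)
y_wide <- y_given_x(df, delta = 0.05)
length(y_wide)   # number of Y values shown in the histogram
hist(y_wide, main = "Y given X in (0.45, 0.55)", xlab = "Y")

# Second part: narrower interval, much larger sample (illustrative choices)
big <- sample3d(200000)
y_narrow <- y_given_x(big, delta = 0.01)
length(y_narrow)
hist(y_narrow, breaks = 40, main = "Y given X in (0.49, 0.51)", xlab = "Y")
```

If the points really are uniform on the sphere, X is uniform on (-1, 1), so roughly 5% of the 600 recordings (about 30 on average) should survive the first filter. That is too few for a smooth histogram, which is why the second part needs a much larger sample once δ shrinks.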