
R notation for declaring vectors - Axioms of Probability


We have this homework assignment.

Problem 1: Suppose we are interested in the buying habits of shoppers at a particular grocery store with regards to whether they purchase apples, milk, and/or bread. Now suppose 30% of all shoppers at this particular grocery store buy apples, 45% buy milk, and 40% buy a loaf of bread. Let 𝐴 be the event that a randomly selected shopper buys apples, 𝐵 be the event that the same randomly selected shopper buys milk, and 𝐶 the event that they buy bread. Suppose we also know (from data collected) the following information:

The probability that the shopper buys apples and milk is 0.20. The probability that the shopper buys milk and bread is 0.25. The probability that the shopper buys apples and bread is 0.12. The probability that the shopper buys all three items is 0.07. Use this information to answer the following questions.

a) For our purposes, we will use a numeric representation for each event. For example, (010) would be an event in the sample space where a zero in the first place represents no apples were bought while a 1 means they were. Similarly, the second place is the presence of milk and the third place of bread. The example given (010) represents the purchase of milk but not apples or bread.

Insert into vector S the events that belong to the sample space. Then insert the events from the sample space that would correspond to 𝐴 occurring into vector A. Repeat this with vector B for 𝐵 and vector C for 𝐶.

According to set notation, the sample space should look like this: 𝑆={…}. However, due to data storage syntax in R, we will be storing these events in vectors. For example, some arbitrary event 𝑊 would be stored as follows: 𝑊=𝑐(010). c() is a command we can use to construct a vector where commas separate each entry. Complete the code below. Do not worry about the order in which events are placed in the vector.

My answer: (Obviously wrong)

    S = c(111)
    A = c(100)
    B = c(010)
    C = c(001)

To clarify my answer (As specified above):

S (Sample space)
A (Apples)
B (Milk)
C (Bread)

Is anyone able to assist me with how we are meant to declare these variables in R?

I am new to this language and cannot seem to figure out the correct notation for this question.

Thanks.


Solution

  • Admittedly the question and instructions are incredibly unclear, but here's a shot:

    There are 3 events (A, B, and C), and any combination of them can happen at the same time. The sample space is the set of all possible combinations of events happening or not happening. Each element of the sample space is written as a 3-digit value ### where each # is either 1 or 0, which gives 2 × 2 × 2 = 8 elements in total.

    To get the sample space, you'd want something like this:

    # Create sample space vector
    S <- c(111, 110, 101, 011, 100, 001, 010, 000)
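
One thing worth checking when running the line above: R reads 011 as the number 11, so the leading zeros are not stored. The sketch below shows that behaviour and two possible workarounds, assuming character strings are acceptable for the assignment. The names S_num, S_chr, grid, and S_gen are illustrative and not part of the original question.

```r
# Numeric literals drop leading zeros: 011 is stored as 11, 001 as 1, 000 as 0.
S_num <- c(111, 110, 101, 011, 100, 001, 010, 000)
print(S_num)

# Quoting each outcome keeps all three digits (assumes strings are allowed).
S_chr <- c("111", "110", "101", "011", "100", "001", "010", "000")
print(S_chr)

# Alternative: build every 0/1 combination for the three items programmatically.
grid <- expand.grid(apples = 0:1, milk = 0:1, bread = 0:1)
S_gen <- paste0(grid$apples, grid$milk, grid$bread)
print(sort(S_gen))
```

The numeric version still has 8 distinct elements, so the indexing shown in this answer works either way; the string version only changes how the outcomes are displayed.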
    

    Once you have the sample space, you're supposed to pull out the elements of the sample space that belong to each event (subsets of the sample space, technically) and assign them to other vectors. It looks like they want you to build each event vector from the existing vector S rather than typing out a new vector from scratch. Accordingly, that would look something like this:

    # Create sample space vector
    S <- c(111, 110, 101, 011, 100, 001, 010, 000)
    
    # Build each event vector by keeping only the elements of S where that event happens.
    # The numbers inside c() are positions (indices) in S, not the outcomes themselves.
    A <- S[c(1,2,3,5)]
    B <- S[c(1,2,4,7)]
    C <- S[c(1,3,4,6)]
    
    # This is equivalent to e.g. 
    A <- c(111, 110, 101, 100)
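
If you would rather not count positions by hand, the same three vectors can probably be derived from the digits themselves. This is only a sketch, assuming S is stored as numbers exactly as above; the helper names apples, milk, and bread are made up for illustration.

```r
S <- c(111, 110, 101, 011, 100, 001, 010, 000)

# Integer division (%/%) and remainder (%%) isolate each digit of an outcome.
apples <- S %/% 100          # hundreds digit: 1 if apples were bought
milk   <- (S %/% 10) %% 10   # tens digit: 1 if milk was bought
bread  <- S %% 10            # units digit: 1 if bread was bought

# A logical condition inside [] keeps only the elements where it is TRUE.
A <- S[apples == 1]
B <- S[milk == 1]
C <- S[bread == 1]

print(A)
print(B)
print(C)
```

This selects the same elements as the hard-coded indices, and it keeps working if the order of the outcomes in S changes.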
    

    Here's what's happening:

    • Everything to the right of <- constructs a vector. You construct vectors by putting values (numeric values like 039498 or string/character values like "foo") between the parentheses in c(). The commas indicate that the values are distinct elements of the vector. Think of a vector with K elements as a train with K train cars where each car holds a value/person. It's all one train/vector (the length of the train/vector is K), but each element is a single train car that might hold any number of digits/persons. We can find out how many cars/elements are in our vector using length() per below:
    S <- c(111, 110, 101, 011, 100, 001, 010, 000)
    length(S)
    
    • The <- is telling R to take the vector/train you just created and store it in an object S. Any time you want to look at or do something with (e.g., add or multiply) the elements of the vector/cars, you can work with the S object to do so. For example, we can now pull out the first element of S and print it to see that it's the same as the first element we included within c().

    • S[] extracts the elements of S at the indices you specify. If we're going to extract multiple elements of the vector, the indices themselves need to be supplied as a vector as well. This is why you see c() inside the brackets in that code.

    # Print the first element of the vector S
    S <- c(111, 110, 101, 011, 100, 001, 010, 000)
    S[1]
    
    # Let's add 8 to it now!
    S[1] + 8
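
To tie the indexing points together, here is a small sketch of a few ways to use [] on the same vector. The variable names first_two, without_first, and bumped are illustrative only.

```r
S <- c(111, 110, 101, 011, 100, 001, 010, 000)

first_two <- S[c(1, 2)]      # elements at positions 1 and 2
without_first <- S[-1]       # a negative index drops that position
bumped <- S[1] + 8           # computes a new value; S itself is not modified

print(first_two)
print(length(without_first))
print(bumped)
print(S[1])                  # unchanged, because the result was never assigned back to S
```

Note that S[1] + 8 on its own only prints a result. To actually change the vector you would have to assign the result back, e.g. S[1] <- S[1] + 8.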