
Cross table in R to find the relationship of two variables


I am trying to build a cross table for two items in my data frame, but they are not conveniently laid out as two columns of their own. Rather, they are values inside the columns that have to be filtered out before I can build the cross table.

e.g.

column titles: Gender, Favourite Fruit
column 1: F,M,M,M,F,M,F,M,M,F
column 2: apple, pear, pear, grapes, apple, banana, peach, apple, pear, grapes

I would like to make a cross-table for female and apple, to see if there is a relationship. How should I go about doing this?

Thank you! Emmy


Solution

  • There are lots of ways to do this, but the workhorse is the table() function.

    Here is some fake data:

    set.seed(123)
    df <- data.frame(gender = sample(c("M", "F"), 1000, replace = TRUE),
                     fruit = sample(c("apple", "grapes", "banana", "pear"), 1000, replace = TRUE))
    

    The table() function is a great way to create cross tabulations. For example:

    table(df)
          fruit
    gender apple banana grapes pear
         F   134    122    128  109
         M   114    131    127  135
    

    You can do a lot with this function. To get something like what you want, you can create named logical vectors right in the arguments of the function.

    table(Female = df$gender == "F", Apple = df$fruit == "apple")
           Apple
    Female  FALSE TRUE
      FALSE   393  114
      TRUE    359  134
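As a rough sketch, the same idea can be applied to the ten rows given in the question. The names `small`, `tab`, and `row_props` are made up here for illustration, and the `prop.table()` step is an optional extra rather than part of the answer above: it turns the counts into row proportions, which may make it easier to compare the share of apple-lovers among females and non-females.

```r
# The ten observations from the question, typed in by hand
small <- data.frame(
  gender = c("F", "M", "M", "M", "F", "M", "F", "M", "M", "F"),
  fruit  = c("apple", "pear", "pear", "grapes", "apple",
             "banana", "peach", "apple", "pear", "grapes")
)

# Cross-tabulate two logical vectors: is the row female, is the fruit apple
tab <- table(Female = small$gender == "F", Apple = small$fruit == "apple")
print(tab)

# Optional: proportions within each row (each row sums to 1)
row_props <- prop.table(tab, margin = 1)
print(row_props)
```

With only ten rows the counts are far too small to conclude anything about a relationship; this only shows the mechanics.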