
Why are there redundant rows and columns in my contingency table?


I am new to R and am currently learning about contingency tables. I want to create a contingency table from the "loans_full_schema" data (from openintro) using the "application_type" and "homeownership" variables. Below is my code.

library(oibiostat)

data("loans_full_schema")

tab <- table(loans_full_schema$application_type, loans_full_schema$homeownership)
tab

My output is: [image: my outcome]

However, I want to get this output instead: [image: wanted outcome]

So my question is: why are there an "ANY" column and a blank row in my output?


Solution

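  • That is because both factors have unused (empty) levels: levels that are defined in the factor but have no observations. table() counts every level of a factor, so each unused level shows up as an all-zero row or column.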

    levels(loans_full_schema$homeownership)
    #[1] ""         "ANY"      "MORTGAGE" "OWN"      "RENT"    
    
    levels(loans_full_schema$application_type)
    #[1] ""           "individual" "joint"     
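
    The same effect can be reproduced on a small made-up factor. This is only a sketch on toy data (the vector home below is hypothetical, not the loans data): table() reports every declared level, including ones that never occur.

    ```r
    # Toy data (hypothetical): the level set includes "" and "ANY",
    # but no observation actually takes either value.
    home <- factor(c("RENT", "OWN", "RENT", "MORTGAGE"),
                   levels = c("", "ANY", "MORTGAGE", "OWN", "RENT"))

    tab <- table(home)
    tab   # the "" and "ANY" cells are printed with a count of 0

    # Levels that are declared but never observed
    unused <- setdiff(levels(home), as.character(home))
    unused
    ```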
    

    You can drop them with droplevels, which removes the unused levels from every factor column of the data frame.

    loans_full_schema <- droplevels(loans_full_schema) 
    
    table(loans_full_schema$application_type, loans_full_schema$homeownership)
                
    #             MORTGAGE  OWN RENT
    #  individual     3839 1170 3496
    #  joint           950  183  362
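
    If you would rather not modify the whole data frame, droplevels() can also be applied to individual factors inside the table() call. A minimal sketch on made-up vectors (app and home are hypothetical stand-ins for the two columns):

    ```r
    # Toy factors (hypothetical) that carry unused levels, like the loans columns
    app  <- factor(c("individual", "joint", "individual", "individual"),
                   levels = c("", "individual", "joint"))
    home <- factor(c("RENT", "OWN", "MORTGAGE", "RENT"),
                   levels = c("", "ANY", "MORTGAGE", "OWN", "RENT"))

    # Without dropping: 3 x 5 table, including all-zero rows and columns
    dim(table(app, home))

    # Dropping unused levels per factor: 2 x 3 table
    tab2 <- table(droplevels(app), droplevels(home))
    tab2
    ```

    This leaves the original factors untouched, which may matter if other code relies on the full level set.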
    

    You may use addmargins to add the totals.

    addmargins(table(loans_full_schema$application_type, loans_full_schema$homeownership))
    
    #             MORTGAGE   OWN  RENT   Sum
    #  individual     3839  1170  3496  8505
    #  joint           950   183   362  1495
    #  Sum            4789  1353  3858 10000
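
    addmargins also takes a margin argument if you only want one set of totals. The sketch below does not need the package: it retypes the counts printed above into a matrix (the names counts, tab and with_totals are illustrative only).

    ```r
    # Counts copied from the table shown above
    counts <- matrix(c(3839, 1170, 3496,
                        950,  183,  362),
                     nrow = 2, byrow = TRUE,
                     dimnames = list(c("individual", "joint"),
                                     c("MORTGAGE", "OWN", "RENT")))
    tab <- as.table(counts)

    # Default: totals for both rows and columns
    with_totals <- addmargins(tab)
    with_totals

    # margin = 1 appends only a "Sum" row holding the column totals
    addmargins(tab, margin = 1)
    ```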