r statistics regression correlation contingency

statistics contingency R


I've got two vectors which are TRUE or FALSE. Basically data on households and whether they own a car and whether they have a gold watch. (Note, "car" and "gold watch" are not the actual categories, but they're effective substitutes for this question).

I want to find out the relationships between car ownership and watch ownership and could use some advice for both the stats and the R in terms of which functions to use.

The idea is to be able to say: "If someone has a car, we can say with 95% confidence that there is a 25% chance they have a gold watch"

I've been messing with CrossTable and assocstats and have basically got myself totally confused over what I think is a standard stats question.

Any quick insights into which tests/functions should be used? I've got a correlation of .265, but want to quantify the confidence.

I've looked around a bunch, including at "How do I get a contingency table?" and "Contingency table on logistic regression in R with missing fitted values".

Thanks!!


Solution

  • You'd be looking to do a logit or probit regression. Look up the usage of glm (short for generalized linear models). Within this class of models, you need to specify the family as binomial with a logit or probit link. Type ?glm and ?family to read the descriptions of these functions. Missing data is handled through the na.action parameter; the usual default, na.omit, drops incomplete rows, whereas na.pass hands the NAs straight to the fitting routine, which fails on them. The confidence interval for a coefficient is the estimate ± critical value × standard error of the coefficient (the critical value is about 1.96 for 95% confidence), on the scale of the link function.
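
A minimal sketch of the approach described above, not a definitive analysis. The data frame `households`, its columns `car` and `watch`, the sample size and the ownership probabilities are all invented here for illustration; substitute your own two TRUE/FALSE vectors.

```r
# Simulated data: every number below is an assumption made for illustration.
set.seed(42)
n     <- 500
car   <- runif(n) < 0.6
# Watch ownership is made more likely for car owners (assumed probabilities).
watch <- runif(n) < ifelse(car, 0.25, 0.10)
households <- data.frame(car = car, watch = watch)

# Logistic regression: log-odds of owning a watch as a function of car ownership.
fit <- glm(watch ~ car, family = binomial(link = "logit"), data = households)

# Coefficient table: estimate, standard error, z value, p-value.
coefs <- summary(fit)$coefficients

# Wald 95% interval on the log-odds scale: estimate +/- critical value * SE.
z  <- qnorm(0.975)
ci <- cbind(lower = coefs[, "Estimate"] - z * coefs[, "Std. Error"],
            upper = coefs[, "Estimate"] + z * coefs[, "Std. Error"])

# Exponentiate so the carTRUE row reads as an odds ratio with its interval.
exp(ci)

# Predicted probability of a watch for a car owner. The interval is built
# on the link scale and mapped back to a probability with the inverse logit.
p <- predict(fit, newdata = data.frame(car = TRUE),
             type = "link", se.fit = TRUE)
prob_ci <- plogis(c(estimate = p$fit[[1]],
                    lower    = p$fit[[1]] - z * p$se.fit[[1]],
                    upper    = p$fit[[1]] + z * p$se.fit[[1]]))
prob_ci
```

With a single TRUE/FALSE predictor, the `estimate` entry of `prob_ci` should match the plain proportion `mean(watch[car])`, and `lower`/`upper` give the kind of statement the question is after: an estimated chance of watch ownership among car owners, with a 95% interval around it. `confint.default(fit)` should reproduce the same Wald intervals as `ci` without the manual arithmetic.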