Tags: r, glm, contingency

How to convert contingency tables (counts) to individual rows for a GLM


My data looks like this:

[image: screenshot of the original count table]

You can download it here: https://drive.google.com/file/d/1pgO51NXtjpVSz-VxQEDNFFuQXVc4jVkt/view?usp=sharing

What I want is to expand this data into one row per individual.

For example, this

[image: excerpt of the count table]

will be transformed into this:

[image: the resulting individual rows]

Another example:

[image: excerpt of the count table]

will turn into this:

[image: the resulting individual rows]

So, if n is the sum of all the counts in the original data.frame, i.e., the total number of individuals, the final output will be a data.frame with 6 columns and n rows.

I want to do this in R, but I don't have any idea how. Once I have this, I want to fit a generalized linear model with family binomial and link = probit.

This page explains some of what I tried to do:

https://www.datanalytics.com/libro_r/la-funcion-melt-y-datos-en-formato-largo.html


Solution

  • Okay, I have an answer, but I was wondering whether there is a more general way to do this. Here it goes:

    library(readxl)
    library(dplyr)
    
    # Original data ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    byssinosis <- read_xls(path = "byssinosis.xls", range = "B4:K27", col_names = FALSE)
    names(byssinosis) <- c("Employment","Smoking","Sex","Race",
                           "W1y","W1n","W2y","W2n","W3y","W3n")
    # View(byssinosis)
    
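    # Expanding the data to individuals ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    # First, stack the six count columns into a single column
    # (id.vars is spelled out so melt does not have to guess the id columns).
    datos <- reshape2::melt(byssinosis,
                            id.vars = c("Employment", "Smoking", "Sex", "Race"))
    # Split that column into the two desired variables.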
    datos <- datos %>%
      mutate(Workplace = ifelse(variable %in% c("W1y", "W1n"),1,
                                ifelse(variable %in% c("W2y", "W2n"),2,3)),
             Byssinosis = ifelse(variable %in% c("W1y", "W2y", "W3y"),"yes","no"))
    # Repeat each row as many times as its count in value.
    individuos <- rep(seq_len(nrow(datos)), datos$value)
    datos <- datos[individuos,]
    # Keep only the desired columns.
    datos <- datos %>% select(-c(variable,value))
    # View(datos)
    
    # Check: rebuild the count table from the individual rows ~~~~~~~~~~~~~~~~~
    tabla <-
      table(datos) %>%
      as.data.frame() %>%  
      arrange(Employment, desc(Smoking), desc(Sex), desc(Race), Workplace, desc(Byssinosis))
    # View(tabla)
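
On the generalization question and the probit step, here is a minimal base-R sketch rather than a definitive recipe. It assumes the count columns are named like W1y/W1n (workplace number, then y or n), so workplace and outcome can be read from the column names instead of being hard-coded. The helper counts_to_long and all the numbers in the toy table are made up for illustration and do not come from the original spreadsheet.

```r
# Toy count table with made-up numbers, laid out like the data above
# (only two id columns and two workplaces, to keep it short).
counts <- data.frame(
  Employment = c("<10", ">=10"),
  Smoking    = c("yes", "no"),
  W1y = c(3, 1), W1n = c(2, 4),
  W2y = c(1, 2), W2n = c(5, 6)
)

# Hypothetical helper: stack the count columns into long format, reading
# workplace and outcome from names of the form W<number><y|n>.
counts_to_long <- function(df, count_cols) {
  id_cols <- setdiff(names(df), count_cols)
  pieces <- lapply(count_cols, function(col) {
    data.frame(
      df[id_cols],
      Workplace  = sub("^W([0-9]+)[yn]$", "\\1", col),
      Byssinosis = if (grepl("y$", col)) "yes" else "no",
      value      = df[[col]]
    )
  })
  do.call(rbind, pieces)
}

long <- counts_to_long(counts, c("W1y", "W1n", "W2y", "W2n"))

# One row per individual: repeat each row `value` times, then drop the count.
individuals <- long[rep(seq_len(nrow(long)), long$value), names(long) != "value"]
rownames(individuals) <- NULL

# Probit model on the expanded data (0/1 response).
individuals$y <- as.integer(individuals$Byssinosis == "yes")
fit_individuals <- glm(y ~ Workplace + Smoking,
                       family = binomial(link = "probit"), data = individuals)

# Same model on the long count data, using the counts as weights.
long$y <- as.integer(long$Byssinosis == "yes")
fit_weighted <- glm(y ~ Workplace + Smoking,
                    family = binomial(link = "probit"),
                    data = long, weights = value)

# The two sets of coefficients should agree up to numerical tolerance.
print(cbind(individuals = coef(fit_individuals), weighted = coef(fit_weighted)))
```

The weighted fit uses the same likelihood as the expanded one, so if the individual rows are only needed as input to glm, the expansion step can probably be skipped and the counts passed as weights instead.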