I have this matrix:
quimio = matrix(c(51,33,16,58,29,13,48,42,30,26,38,16),
nrow = 4, ncol = 3)
colnames(quimio) = c("Pouca", "Média", "Alta")
rownames(quimio) = c("Tipo I", "Tipo II", "Tipo III", "Tipo IV")
Which looks like this:
         Pouca Média Alta
Tipo I      51    29   30
Tipo II     33    13   26
Tipo III    16    48   38
Tipo IV     58    42   16
I want to turn it into a tibble such that these row and column names are all dummy variables.
I first wanted to make a bar chart, so I wrote this:
library(tidyverse)
tipo = c("Tipo I", "Tipo II", "Tipo III", "Tipo IV")
tipos = rep(tipo, 3)
quimiotb = as_tibble(quimio)  # as.tibble() is deprecated in favour of as_tibble()
quimiotb = gather(quimiotb)
quimiotb$tipo = tipos
quimiotb = rename(quimiotb, reacao = key)
quimiotb$reacao = factor(quimiotb$reacao)
quimiotb$tipo = factor(quimiotb$tipo)
This is what I get:
# A tibble: 12 x 3
   reacao value tipo
   <fct>  <dbl> <fct>
 1 Pouca     51 Tipo I
 2 Pouca     33 Tipo II
 3 Pouca     16 Tipo III
 4 Pouca     58 Tipo IV
 5 Média     29 Tipo I
 6 Média     13 Tipo II
 7 Média     48 Tipo III
 8 Média     42 Tipo IV
 9 Alta      30 Tipo I
10 Alta      26 Tipo II
11 Alta      38 Tipo III
12 Alta      16 Tipo IV
And while this is quite OK to use for a bar chart with ggplot2, I can't run any model on it: that would require tipo to be spread into 4 columns and reacao into 3. Right now this tibble's first line reads like "51 patients with Tipo I cancer had pouca reacao". I've thought about using spread() but can't find the proper combination of arguments. Any help would be appreciated.
tl;dr: I need to tidy quimiotb and don't know how.
EDIT: Expected output should be something like this:
# A tibble: Y x 7
  Pouca Media Alta  Tipo I Tipo II Tipo III Tipo IV
  <fct> <fct> <fct> <fct>  <fct>   <fct>    <fct>
1 0     1     0     0      1       0        0
2 1     0     0     1      0       0        0
The modelling routines will create a model matrix for you internally without you having to specify it, so this should be sufficient:
as.data.frame.table(quimio)
model.matrix can create a model matrix from that, but you don't need it, as seen in the code below.
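If you do want to look at the dummy coding explicitly, here is a minimal sketch. It rebuilds quimio from the question so that it runs on its own, and it assumes the default treatment contrasts, so the first level of each factor is absorbed into the intercept rather than getting its own column. The name mm is just an illustrative choice.

```r
# Rebuild the matrix from the question so the example is self-contained
quimio <- matrix(c(51, 33, 16, 58, 29, 13, 48, 42, 30, 26, 38, 16),
                 nrow = 4, ncol = 3,
                 dimnames = list(c("Tipo I", "Tipo II", "Tipo III", "Tipo IV"),
                                 c("Pouca", "Média", "Alta")))

# Long form: one row per cell, with factor columns Var1, Var2 and count Freq
DF <- as.data.frame.table(quimio)

# The 0/1 design matrix that lm() would build internally for this formula
mm <- model.matrix(~ Var1 + Var2, DF)
colnames(mm)
dim(mm)  # 12 rows (cells), 6 columns (intercept + 3 for Var1 + 2 for Var2)
```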
Now you can do things like:
DF <- as.data.frame.table(quimio)
fm0 <- lm(Freq ~ Var1, DF) # or maybe you want Var2?
fm1 <- lm(Freq ~ Var1 + Var2, DF)
anova(fm0, fm1) # compare
or look at the t tests of the coefficients of Var2 in the output of summary(fm1) to see if they are significantly different from zero.
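As a rough sketch of how to pull just those rows out of the summary, the coefficient table can be extracted with coef(summary(...)) and filtered by name. The data are rebuilt here so the block runs on its own; ct and var2_rows are illustrative names.

```r
quimio <- matrix(c(51, 33, 16, 58, 29, 13, 48, 42, 30, 26, 38, 16),
                 nrow = 4, ncol = 3,
                 dimnames = list(c("Tipo I", "Tipo II", "Tipo III", "Tipo IV"),
                                 c("Pouca", "Média", "Alta")))
DF <- as.data.frame.table(quimio)
fm1 <- lm(Freq ~ Var1 + Var2, DF)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
ct <- coef(summary(fm1))

# Keep only the rows belonging to Var2 (the reaction levels)
var2_rows <- ct[grep("^Var2", rownames(ct)), , drop = FALSE]
var2_rows
```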
Or maybe you want to do a chi-squared test on the original data:
chisq.test(quimio)
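For reference, a small sketch of inspecting the pieces of that test result, again with the matrix rebuilt so the block is self-contained. The object name res is an illustrative choice; the components used are standard fields of the object returned by chisq.test.

```r
quimio <- matrix(c(51, 33, 16, 58, 29, 13, 48, 42, 30, 26, 38, 16),
                 nrow = 4, ncol = 3,
                 dimnames = list(c("Tipo I", "Tipo II", "Tipo III", "Tipo IV"),
                                 c("Pouca", "Média", "Alta")))

res <- chisq.test(quimio)

res$expected   # expected counts under independence of tipo and reacao
res$statistic  # the chi-squared statistic
res$parameter  # degrees of freedom: (4 - 1) * (3 - 1)
res$p.value
```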
Anyway, there are many modelling functions in R, and you now have the data in the form you need to explore them.
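If you really do want something shaped like the expected output in the question, with one row per patient and seven 0/1 columns, the following is one possible sketch rather than a required step. It assumes each count in quimio is a number of patients, and the names long and dummies are made up for illustration. Note that model.matrix returns a numeric matrix, not factor columns.

```r
quimio <- matrix(c(51, 33, 16, 58, 29, 13, 48, 42, 30, 26, 38, 16),
                 nrow = 4, ncol = 3,
                 dimnames = list(c("Tipo I", "Tipo II", "Tipo III", "Tipo IV"),
                                 c("Pouca", "Média", "Alta")))
DF <- as.data.frame.table(quimio)

# One row per patient: repeat each (Var1, Var2) combination Freq times
long <- DF[rep(seq_len(nrow(DF)), DF$Freq), c("Var1", "Var2")]
rownames(long) <- NULL

# Full indicator columns for both factors; "- 1" drops the intercept so that
# no level of the factor is left out as a reference level
dummies <- cbind(model.matrix(~ Var2 - 1, long),
                 model.matrix(~ Var1 - 1, long))

dim(dummies)  # 400 7
head(dummies, 3)
```

Most modelling functions will still be happier with the two factor columns in long than with the explicit dummies, since all seven columns together are linearly dependent.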