Long form using colnames


Suppose I have the following data:

A <- c(4,4,4,4,4)
B <- c(1,2,3,4,4)
C <- c(1,2,4,4,4)
D <- c(3,2,4,1,4)
E <- c(4,4,4,4,5)

data <- data.frame(A,B,C,D,E)
data <- t(data)
colnames(data) <- c("num1","freq1","freq2","freq3","totfreq")

> data
  num1 freq1 freq2 freq3 totfreq
A    4     4     4     4       4
B    1     2     3     4       4
C    1     2     4     4       4
D    3     2     4     1       4
E    4     4     4     4       5

I am trying to plot a grouped bar chart with my variables A:E on the x-axis and the values of freq1, freq2 and freq3 for each letter on the y-axis. I also need to keep the ability to plot A:E against the values in totfreq, again with A:E on the x-axis.

I know I need to convert to long form, but I'm having trouble because of how my data is set up. I need A, B, C, D, E stacked into one column, another column that stacks freq1, freq2, freq3 and totfreq, and a last column with the values. Any advice on how to accomplish this?

I'd prefer to plot in plotly, but ggplot would work too.


Solution

  • First off, you have a matrix (t() returns one), but you probably want a data frame. Converting it straight to a tibble would drop the row names, which is where your letters are stored, so

    as.data.frame(data) %>% rownames_to_column("id")


    will get you a data frame with a column, id, holding the letters.
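    If you'd rather end up with a tibble after all, tibble's as_tibble() can keep the row names for you through its rownames argument, available in reasonably recent versions of tibble (check yours). A minimal sketch that rebuilds the matrix from the question so it runs on its own; data_tbl is just an illustrative name:

    ```r
    library(tibble)

    # Rebuild the transposed matrix from the question
    data <- t(data.frame(A = c(4, 4, 4, 4, 4),
                         B = c(1, 2, 3, 4, 4),
                         C = c(1, 2, 4, 4, 4),
                         D = c(3, 2, 4, 1, 4),
                         E = c(4, 4, 4, 4, 5)))
    colnames(data) <- c("num1", "freq1", "freq2", "freq3", "totfreq")

    # rownames = "id" moves the row names (A-E) into a column called id
    # instead of dropping them
    data_tbl <- as_tibble(data, rownames = "id")
    data_tbl
    ```

    Either route gives the same id column; this one just skips the intermediate data frame.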

    You want to put this data into long format by gathering all the freq columns. I'm then adding a column that records the type of each observation. This isn't necessary, but since you say you want to filter easily for one of two types (either the groups freq1 to freq3, or totfreq), it's a handy setup that I often use.

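    library(tidyverse)
    
    A <- c(4,4,4,4,4)
    B <- c(1,2,3,4,4)
    C <- c(1,2,4,4,4)
    D <- c(3,2,4,1,4)
    E <- c(4,4,4,4,5)
    
    data <- data.frame(A,B,C,D,E)
    # t() returns a matrix with A-E as row names
    data <- t(data)
    colnames(data) <- c("num1","freq1","freq2","freq3","totfreq")
    
    data_long <- as.data.frame(data) %>%
      # move the row names (A-E) into a regular column called id
      rownames_to_column("id") %>%
      # stack freq1:totfreq into var/value pairs; id and num1 are repeated for each
      gather(key = var, value = value, freq1:totfreq) %>%
      # label each row so the two kinds of frequency are easy to filter later
      mutate(type = ifelse(var == "totfreq", "total", "by_group"))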
    
    head(data_long)
    #>   id num1   var value     type
    #> 1  A    4 freq1     4 by_group
    #> 2  B    1 freq1     2 by_group
    #> 3  C    1 freq1     2 by_group
    #> 4  D    3 freq1     2 by_group
    #> 5  E    4 freq1     4 by_group
    #> 6  A    4 freq2     4 by_group
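    Since this answer was written, tidyr 1.0.0 superseded gather() with pivot_longer(). gather() still works, but here is a sketch of the same reshape with pivot_longer(), assuming tidyr 1.0.0 or later (data_long2 is just an illustrative name). One difference to expect: pivot_longer() keeps each id's rows together (A freq1, A freq2, A freq3, A totfreq, B freq1, ...), so the row order won't match the head() output above.

    ```r
    library(dplyr)
    library(tibble)
    library(tidyr)

    # Rebuild the transposed matrix from the question
    data <- t(data.frame(A = c(4, 4, 4, 4, 4),
                         B = c(1, 2, 3, 4, 4),
                         C = c(1, 2, 4, 4, 4),
                         D = c(3, 2, 4, 1, 4),
                         E = c(4, 4, 4, 4, 5)))
    colnames(data) <- c("num1", "freq1", "freq2", "freq3", "totfreq")

    data_long2 <- as.data.frame(data) %>%
      rownames_to_column("id") %>%
      # names_to / values_to play the roles of gather()'s key / value
      pivot_longer(cols = freq1:totfreq, names_to = "var", values_to = "value") %>%
      mutate(type = ifelse(var == "totfreq", "total", "by_group"))

    data_long2
    ```

    The result has the same columns and the same 20 rows as data_long, so the filtering below works unchanged.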
    

    With the type column, it's easy to filter by type for plotting: you can pipe a filtered data frame into something like ggplot, or use the column for faceting or for mapping onto an aesthetic.

    # for grouped bar chart
    data_long %>% filter(type == "by_group")
    #>    id num1   var value     type
    #> 1   A    4 freq1     4 by_group
    #> 2   B    1 freq1     2 by_group
    #> 3   C    1 freq1     2 by_group
    #> 4   D    3 freq1     2 by_group
    #> 5   E    4 freq1     4 by_group
    #> 6   A    4 freq2     4 by_group
    #> 7   B    1 freq2     3 by_group
    #> 8   C    1 freq2     4 by_group
    #> 9   D    3 freq2     4 by_group
    #> 10  E    4 freq2     4 by_group
    #> 11  A    4 freq3     4 by_group
    #> 12  B    1 freq3     4 by_group
    #> 13  C    1 freq3     4 by_group
    #> 14  D    3 freq3     1 by_group
    #> 15  E    4 freq3     4 by_group
    
    # for total freqs
    data_long %>% filter(type == "total")
    #>   id num1     var value  type
    #> 1  A    4 totfreq     4 total
    #> 2  B    1 totfreq     4 total
    #> 3  C    1 totfreq     4 total
    #> 4  D    3 totfreq     4 total
    #> 5  E    4 totfreq     5 total
    

    Created on 2018-05-17 by the reprex package (v0.2.0).
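    To get from the long data to the plots in the question, here is a ggplot2 sketch. It rebuilds data_long exactly as above so it runs standalone; p_grouped, p_total and p_faceted are illustrative names. position = "dodge" is what puts the freq1 to freq3 bars side by side for each letter, and the last plot shows the faceting option instead of filtering.

    ```r
    library(dplyr)
    library(ggplot2)
    library(tibble)
    library(tidyr)

    # Rebuild data_long as in the answer
    data <- t(data.frame(A = c(4, 4, 4, 4, 4),
                         B = c(1, 2, 3, 4, 4),
                         C = c(1, 2, 4, 4, 4),
                         D = c(3, 2, 4, 1, 4),
                         E = c(4, 4, 4, 4, 5)))
    colnames(data) <- c("num1", "freq1", "freq2", "freq3", "totfreq")

    data_long <- as.data.frame(data) %>%
      rownames_to_column("id") %>%
      gather(key = var, value = value, freq1:totfreq) %>%
      mutate(type = ifelse(var == "totfreq", "total", "by_group"))

    # Grouped bar chart: freq1-freq3 side by side for each letter
    p_grouped <- data_long %>%
      filter(type == "by_group") %>%
      ggplot(aes(x = id, y = value, fill = var)) +
      geom_col(position = "dodge")

    # Same x-axis, totfreq only
    p_total <- data_long %>%
      filter(type == "total") %>%
      ggplot(aes(x = id, y = value)) +
      geom_col()

    # Alternative: keep all rows and facet on the type column
    p_faceted <- ggplot(data_long, aes(x = id, y = value, fill = var)) +
      geom_col(position = "dodge") +
      facet_wrap(~ type)
    ```

    Print any of the three to draw it. Since you'd prefer plotly, passing one of these plots to plotly::ggplotly() should give you an interactive version; building the chart directly with plotly::plot_ly() and layout(barmode = "group") is the other route.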