Tags: r, loops, subset, categories, binning

How do I bin a variable across a number of observations for each specimen?


New R user. I have measured the color (hue) for a bunch of corporate logos. The number of observations for each logo can be different. My data is formatted like this:

Industry <- c("Fossil", "Fossil", "Fossil", "Fossil", "Fossil", "Renewable", "Renewable", "Renewable")
Logo <- c("Petrox", "Petrox", "Petrox", "Petrox", "Petrox", "Windo", "Windo", "Windo")
Hue <- c(36, 37, 43, 185, 190, 356, 310, 25)
df <- data.frame(Industry, Logo, Hue)

I've been trying to bin the df$Hue variable for each logo in my sample, using cut().

# set up cut-off values 
breaks <- c(0,45,90,135,180,225,270,315,360)

# specify interval/bin labels
labels <- c("[0-45)","[45-90)", "[90-135)", "[135-180)", "[180-225)", "[225-270)","[270-315)", "[315-360)")

I want to arrive at a data frame with one row per logo and one column per bin, where each cell counts how many of that logo's observations fall within the interval, like this:

Industry  Logo   [0-45) [45-90) [90-135) [135-180) [180-225) [225-270) [270-315) [315-360)
Fossil    Petrox 3      0       0        0         2         0         0         0
Renewable Windo  1      0       0        0         0         0         1         1

I've searched for solutions, but so far without finding a useful answer. Is there a simple way to combine subset() or split() with the cut() function? I'm sure it's a very simple thing I need.


Solution

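  • You can use cut() to divide the hues into bins, count() the observations per logo and bin, use complete() to add the bins that have no observations with a count of zero, and get the data in wide format using pivot_wider().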

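    library(dplyr)
    library(tidyr)

    df %>%
      # bin each hue; right = FALSE gives [a, b) intervals, matching the labels
      # (cut() closes intervals on the right by default)
      count(Industry, Logo, Hue = cut(Hue, breaks, labels, right = FALSE)) %>%
      # add the empty bins for every Industry/Logo pair, with a count of 0
      complete(nesting(Industry, Logo), Hue = labels, fill = list(n = 0)) %>%
      # order rows by bin so the wide columns come out in bin order
      arrange(match(Hue, labels)) %>%
      pivot_wider(names_from = Hue, values_from = n)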
    
    #   Industry  Logo   `[0-45)` `[45-90)` `[90-135)` `[135-180)` `[180-225)` `[225-270)` `[270-315)` `[315-360)`
    #  <chr>     <chr>     <dbl>     <dbl>      <dbl>       <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
    #1 Fossil    Petrox        3         0          0           0           2           0           0           0
    #2 Renewable Windo         1         0          0           0           0           0           1           1
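
  • If you would rather avoid extra packages, the same counts can probably be obtained in base R, since table() keeps every factor level (including empty bins) as a column. This is only a minimal sketch, not the approach above: the names Bin, counts, ids and result are made up for illustration, and right = FALSE is an assumption chosen so the intervals match the [a-b) labels.

```r
# Sample data from the question
Industry <- c("Fossil", "Fossil", "Fossil", "Fossil", "Fossil",
              "Renewable", "Renewable", "Renewable")
Logo <- c("Petrox", "Petrox", "Petrox", "Petrox", "Petrox",
          "Windo", "Windo", "Windo")
Hue <- c(36, 37, 43, 185, 190, 356, 310, 25)
df <- data.frame(Industry, Logo, Hue)

# Same cut-offs and labels as in the question, built programmatically
breaks <- seq(0, 360, by = 45)
labels <- paste0("[", head(breaks, -1), "-", tail(breaks, -1), ")")

# Bin the hues; right = FALSE makes each interval closed on the left, [a, b)
df$Bin <- cut(df$Hue, breaks = breaks, labels = labels, right = FALSE)

# Cross-tabulate logo by bin; unused factor levels still appear as zero columns
counts <- as.data.frame.matrix(table(df$Logo, df$Bin))

# Attach the Industry/Logo identifiers, matching rows by logo name
ids <- unique(df[c("Industry", "Logo")])
result <- cbind(ids, counts[as.character(ids$Logo), ])
rownames(result) <- NULL

print(result)
```

    Note that with right = FALSE a hue of exactly 360 would fall outside the last bin and be counted nowhere, so this assumes hues are recorded in the range 0 to just under 360.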