New R user. I have measured the color (hue) for a bunch of corporate logos. The number of observations for each logo can be different. My data is formatted like this:
```r
Industry <- c("Fossil", "Fossil", "Fossil", "Fossil", "Fossil", "Renewable", "Renewable", "Renewable")
Logo <- c("Petrox", "Petrox", "Petrox", "Petrox", "Petrox", "Windo", "Windo", "Windo")
Hue <- c(36, 37, 43, 185, 190, 356, 310, 25)
df <- data.frame(Industry, Logo, Hue)
```
I've been trying to bin the `df$Hue` variable for each logo in my sample, using `cut()`.
```r
# set up cut-off values
breaks <- c(0,45,90,135,180,225,270,315,360)
# specify interval/bin labels
labels <- c("[0-45)","[45-90)", "[90-135)", "[135-180)", "[180-225)", "[225-270)","[270-315)", "[315-360)")
```
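One detail worth checking with these breaks: by default `cut()` closes intervals on the right (`right = TRUE`), so the bins are really (0,45], (45,90], and so on, even though the labels say `[0-45)`. A hue of exactly 0 then becomes `NA`, and a hue of exactly 45 lands in the first bin. The sketch below uses a few made-up boundary values (`x` is illustrative only, not part of my data) to compare the default with `right = FALSE`, which gives left-closed bins that match the labels:

```r
breaks <- c(0, 45, 90, 135, 180, 225, 270, 315, 360)
labels <- c("[0-45)", "[45-90)", "[90-135)", "[135-180)",
            "[180-225)", "[225-270)", "[270-315)", "[315-360)")

# Illustrative values sitting on or near the bin edges
x <- c(0, 44.9, 45, 315, 359)

# Default right = TRUE: intervals are (a, b], so 0 falls outside and 45 goes to the first bin
default_bins <- cut(x, breaks, labels)

# right = FALSE: intervals are [a, b), consistent with the labels
left_closed_bins <- cut(x, breaks, labels, right = FALSE)

print(data.frame(x, default_bins, left_closed_bins))
```

With `right = FALSE`, a hue of exactly 360 would fall outside the last bin and become `NA`; adding `include.lowest = TRUE` closes that final interval as [315, 360].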
I want to end up with a data frame with one row per logo and one column per bin, where each cell counts how many of that logo's observations fall into that interval, like this:
Industry | Logo | [0-45) | [45-90) | [90-135) | [135-180) | [180-225) | [225-270) | [270-315) | [315-360) |
---|---|---|---|---|---|---|---|---|---|
Fossil | Petrox | 3 | 0 | 0 | 0 | 2 | 0 | 0 | 0 |
Renewable | Windo | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
I've searched for solutions but haven't found a useful answer so far. Is there a simple way to combine `subset()` or `split()` with `cut()`? I'm sure it's something simple.
You can use `cut()` to divide the data into bins, `complete()` the sequence so that every logo gets all eight bins (empty ones with a count of 0), and reshape the result to wide format using `pivot_wider()`.
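```r
library(dplyr)
library(tidyr)

df %>%
  # right = FALSE gives [a, b) intervals, matching the labels
  count(Industry, Logo, Hue = cut(Hue, breaks, labels, right = FALSE)) %>%
  # add missing bins per Industry/Logo pair; nesting() keeps each logo with its own industry
  complete(nesting(Industry, Logo), Hue = labels, fill = list(n = 0)) %>%
  # put rows in bin order so the new columns come out in interval order
  arrange(match(Hue, labels)) %>%
  pivot_wider(names_from = Hue, values_from = n)

# Industry Logo `[0-45)` `[45-90)` `[90-135)` `[135-180)` `[180-225)` `[225-270)` `[270-315)` `[315-360)`
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Fossil Petrox 3 0 0 0 2 0 0 0
#2 Renewable Windo 1 0 0 0 0 0 1 1
```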
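If you'd rather stay in base R, you don't need `subset()` or `split()` here: `cut()` returns a factor that keeps all of its levels, so `table()` already produces a column for every bin, including empty ones. A minimal sketch, assuming each logo belongs to exactly one industry (the helper names `tab`, `counts`, `logos` and `result` are just for illustration):

```r
Industry <- c("Fossil", "Fossil", "Fossil", "Fossil", "Fossil", "Renewable", "Renewable", "Renewable")
Logo <- c("Petrox", "Petrox", "Petrox", "Petrox", "Petrox", "Windo", "Windo", "Windo")
Hue <- c(36, 37, 43, 185, 190, 356, 310, 25)
df <- data.frame(Industry, Logo, Hue)

breaks <- c(0, 45, 90, 135, 180, 225, 270, 315, 360)
labels <- c("[0-45)", "[45-90)", "[90-135)", "[135-180)",
            "[180-225)", "[225-270)", "[270-315)", "[315-360)")

# Bin the hues; the factor keeps all eight levels even when a bin is empty
df$Bin <- cut(df$Hue, breaks, labels, right = FALSE)

# Cross-tabulate logos (rows) against bins (columns) and keep the wide layout
tab <- table(df$Logo, df$Bin)
counts <- as.data.frame.matrix(tab)
logos <- rownames(counts)
rownames(counts) <- NULL

# Look up each logo's industry (assumes one industry per logo)
result <- data.frame(
  Industry = df$Industry[match(logos, df$Logo)],
  Logo = logos,
  counts,
  check.names = FALSE  # keep bin labels like "[0-45)" as column names
)
result
```

`check.names = FALSE` matters because the bin labels are not syntactic R names; without it, `data.frame()` would rewrite them.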