r, ggplot2, dplyr, lubridate, satellite

Dplyr Summarise on Multiple Conditions


I am working on some data visualisations of satellite populations. For each month and altitude band, I would like to display how many satellites were launched into each orbit.

I have a list of satellites (snippet below, using fake data). I am trying to create a bubble plot with the launch date (grouped by month) on the x axis, the altitude (grouped by 100 km) on the y axis, and the size of each bubble showing the count (as in the second table).

COSPAR_ID        LAUNCH_DATE  ALTITUDE
IRIDIUM 180      2019-01-01   1150
IRIDIUM 176      2019-02-01   1250
RISESAT          2019-04-06   1150
RAPIS-1          2019-03-01   1375
MICRODRAGON      2019-05-01   400
NEXUS (FO-99)    2019-04-01   459
ALE-1            2019-05-01   1000
IRIDIUM 167      2019-04-01   900
IRIDIUM GSAT-31  2019-0-01    666
IRIDIUM 188      2019-06-01   1000
IRIDIUM 111      2019-02-01   1250
IRIDIUM 123      2019-01-01   1150

LAUNCH_DATE  ALTITUDE   COUNT
Jan-19       0-500      10
Jan-19       500-1000   100
Jan-19       1000-1500  150
Feb-19       0-500      20
Feb-19       500-1000   90
Feb-19       1000-1500  150

So far, I am getting quite lost. I am using dplyr to summarise by month first, and then to start counting altitudes.

df <- df %>% 
  group_by(month = lubridate::floor_date(LAUNCH_DATE, 'month')) %>%
  summarize(sum = sum(count), 
            sumA = n(ALTITUDE < 100))

Would my next step be to group the altitudes first and then summarise by date? I am hitting a brick wall and am not sure where to go next. Can anyone point me in the right direction?

I am happy to add the original dataset of satellites; it is just quite a large file.


Solution

  • You can use cut() to make your bins, defining your breaks and labels beforehand:

    library(dplyr)
    library(lubridate)
    
    # Breaks every 500 km, up to the first multiple of 500 at or above the highest altitude
    alti_breaks <- seq(0, by = 500, length.out = ceiling(max(df$ALTITUDE) / 500) + 1)
    alti_labs <- paste(head(alti_breaks, -1), tail(alti_breaks, -1), sep = "-")
    
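    # Count satellites per launch month and altitude bin
    # (assumes LAUNCH_DATE is already a Date, e.g. via as.Date())
    df <- df %>%
      count(
        LAUNCH_DATE = floor_date(LAUNCH_DATE, 'month'),    # first day of the launch month
        ALTITUDE = cut(ALTITUDE, alti_breaks, alti_labs),  # bins are (lower, upper]
        name = "COUNT"
      )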
    
    df
    
    # A tibble: 9 × 3
      LAUNCH_DATE ALTITUDE  COUNT
      <date>      <fct>     <int>
    1 2019-01-01  1000-1500     2
    2 2019-02-01  1000-1500     2
    3 2019-03-01  1000-1500     1
    4 2019-04-01  0-500         1
    5 2019-04-01  500-1000      1
    6 2019-04-01  1000-1500     1
    7 2019-05-01  0-500         1
    8 2019-05-01  500-1000      2
    9 2019-06-01  500-1000      1
    

    And the bubble plot:

    library(ggplot2)
    
    ggplot(df, aes(LAUNCH_DATE, ALTITUDE)) +
      geom_point(aes(size = COUNT), color = blues9[[6]], show.legend = FALSE) +
      theme_minimal() +
      theme(panel.grid.minor.x = element_blank())
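
One detail of the cut() call is worth checking against your own data. By default cut() uses right-closed intervals, so a satellite at exactly 1000 km is counted in 500-1000, and an altitude of exactly 0 would fall outside the first bin and become NA. A minimal base-R sketch of that behaviour, using made-up altitudes rather than the real dataset:

```r
# Breaks and labels built the same way as in the answer, for a maximum of 1500 km
alti_breaks <- seq(0, 1500, by = 500)   # 0, 500, 1000, 1500
alti_labs <- paste(head(alti_breaks, -1), tail(alti_breaks, -1), sep = "-")

# Illustrative altitudes only (an assumption, not taken from the real data)
altitude <- c(0, 400, 500, 666, 1000, 1375)

# Default arguments (right = TRUE, include.lowest = FALSE):
# the bins are (0,500], (500,1000], (1000,1500]
default_bins <- cut(altitude, breaks = alti_breaks, labels = alti_labs)

# include.lowest = TRUE additionally closes the first bin on the left: [0,500]
closed_bins <- cut(altitude, breaks = alti_breaks, labels = alti_labs,
                   include.lowest = TRUE)

# Side-by-side comparison of the two binnings
data.frame(altitude, default_bins, closed_bins)
```

If you would rather have a value on a break go into the upper bin (so that 1000 lands in 1000-1500), pass right = FALSE to cut() instead. Which convention is right depends on how you want the labels to be read.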