Tags: r, visualization, data-manipulation

R: Guessing the "Format" of a Dataset?


I am working with the R programming language.

I am trying to follow this tutorial: https://rpubs.com/chidungkt/505486, but the dataset it requires no longer appears to be available. I am therefore trying to "guess" the format of the dataset and simulate a fake dataset in a similar format, so that I can continue with the tutorial.

I spent some time analyzing the structure of the code to infer the format of the dataset. This is what I came up with:

Age = c("0-10", "0-10", "11-20", "11-20", "21-30", "21-30", "31-40", "31-40", "41-50", "41-50", "51-60", "51-60")

Gender = c("M", "F", "M", "F", "M", "F", "M", "F", "M", "F", "M", "F")

Value = as.integer(rnorm(12, 100,10))

vn_2018_pop = data.frame(Age, Gender, Value)

     Age Gender Value
1   0-10      M   125
2   0-10      F   103
3  11-20      M    84
4  11-20      F   105
5  21-30      M    96
6  21-30      F    88
7  31-40      M    88
8  31-40      F   120
9  41-50      M   106
10 41-50      F   118
11 51-60      M   105
12 51-60      F   112

Based on this dataset, I tried to run the R code from the tutorial:

# Load some packages for scraping data and data manipulation: 
library(rvest)
library(magrittr)
library(tidyverse)
library(extrafont)

my_colors <- c("#2E74C0", "#CB454A")
my_font <- "Roboto Condensed"

vn_2018_pop %>% ggplot(aes(Age, Value, fill = Gender)) + 
  geom_col(position = "stack") + 
  coord_flip() + 
  scale_y_continuous(breaks = seq(-5000000, 5000000, 1000000), 
                     limits = c(-5000000, 5000000), 
                     labels = c(paste0(seq(5, 0, -1), "M"), paste0(1:5, "M"))) + 
  theme_minimal() + 
  scale_fill_manual(values = my_colors, name = "", labels = c("Female", "Male")) + 
  guides(fill = guide_legend(reverse = TRUE)) + 
  theme(panel.grid.major.x = element_line(linetype = "dotted", size = 0.2, color = "grey40")) + 
  theme(panel.grid.major.y = element_blank()) + 
  theme(panel.grid.minor.y = element_blank()) + 
  theme(panel.grid.minor.x = element_blank()) + 
  theme(legend.position = "top") + 
  theme(plot.title = element_text(family = my_font, size = 28)) + 
  theme(plot.subtitle = element_text(family = my_font, size = 13, color = "gray40")) + 
  theme(plot.caption = element_text(family = my_font, size = 12, colour = "grey40", face = "italic")) + 
  theme(plot.margin = unit(c(1.2, 1.2, 1.2, 1.2), "cm")) + 
  theme(axis.text = element_text(size = 13, family = my_font)) + 
  theme(legend.text = element_text(size = 12, face = "bold", color = "grey30", family = my_font)) + 
  labs(x = NULL, y = NULL, 
       title = "Population Pyramids of Vietnam in 2018",
       subtitle = "A population pyramid illustrates the age-sex structure of a country's population and may provide insights about\npolitical and social stability, as well as economic development. Countries with young populations need to\ninvest more in schools, while countries with older populations need to invest more in the health sector.",
       caption = "Data Source: https://www.census.gov")

The code runs, but it produces an empty plot:

[Image: the empty plot]

Can someone please show me what I am doing wrong and what I can do to fix this problem?

Thanks!


Solution

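  • The bars are drawn, but they are too small to see: your values are around 100, while the y-axis runs from -5,000,000 to 5,000,000. Fix the scale of your data by multiplying the values by, e.g., 4e4, and make the values for males negative so that their bars extend to the left of zero: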

    library(tidyverse)
    library(extrafont)
    
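    # Note: set.seed() only makes the data reproducible if it is run
    # before the rnorm() call that creates Value.
    set.seed(123)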
    
    vn_2018_pop$Value <- 4e4 * vn_2018_pop$Value
    vn_2018_pop$Value[vn_2018_pop$Gender == "M"] <- -vn_2018_pop$Value[vn_2018_pop$Gender == "M"]
    
    my_colors <- c("#2E74C0", "#CB454A")
    my_font <- "Roboto Condensed"
    
    vn_2018_pop %>% ggplot(aes(Age, Value, fill = Gender)) +
      geom_col(position = "stack") +
      coord_flip() +
      scale_y_continuous(
        breaks = seq(-5000000, 5000000, 1000000),
        limits = c(-5000000, 5000000),
        labels = c(paste0(seq(5, 0, -1), "M"), paste0(1:5, "M"))
      ) +
      theme_minimal() +
      scale_fill_manual(values = my_colors, name = "", labels = c("Female", "Male")) +
      guides(fill = guide_legend(reverse = TRUE)) +
      theme(panel.grid.major.x = element_line(linetype = "dotted", size = 0.2, color = "grey40")) +
      theme(panel.grid.major.y = element_blank()) +
      theme(panel.grid.minor.y = element_blank()) +
      theme(panel.grid.minor.x = element_blank()) +
      theme(legend.position = "top") +
      theme(plot.title = element_text(family = my_font, size = 28)) +
      theme(plot.subtitle = element_text(family = my_font, size = 13, color = "gray40")) +
      theme(plot.caption = element_text(family = my_font, size = 12, colour = "grey40", face = "italic")) +
      theme(plot.margin = unit(c(1.2, 1.2, 1.2, 1.2), "cm")) +
      theme(axis.text = element_text(size = 13, family = my_font)) +
      theme(legend.text = element_text(size = 12, face = "bold", color = "grey30", family = my_font)) +
      labs(
        x = NULL, y = NULL,
        title = "Population Pyramids of Vietnam in 2018",
        subtitle = "A population pyramid illustrates the age-sex structure of a country's population and may provide insights about\npolitical and social stability, as well as economic development. Countries with young populations need to\ninvest more in schools, while countries with older populations need to invest more in the health sector.",
        caption = "Data Source: https://www.census.gov"
      )
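
To see why the original plot looked empty, it can help to compare the magnitude of `Value` with the axis limits hard-coded in the tutorial's `scale_y_continuous()` call. The following is a minimal base-R sketch, assuming the simulated data from the question; the seed is arbitrary, and `axis_limits`, `share_of_axis` and `rescaled` are names introduced here for illustration only.

```r
# Simulated data in the format guessed in the question (arbitrary seed).
set.seed(123)
Age <- rep(c("0-10", "11-20", "21-30", "31-40", "41-50", "51-60"), each = 2)
Gender <- rep(c("M", "F"), times = 6)
Value <- as.integer(rnorm(12, 100, 10))
vn_2018_pop <- data.frame(Age, Gender, Value)

# Axis limits hard-coded in the tutorial's scale_y_continuous() call.
axis_limits <- c(-5000000, 5000000)

# Longest bar as a share of the full axis width.
# The result is a tiny fraction, so the bars are far too thin to see.
share_of_axis <- max(abs(vn_2018_pop$Value)) / diff(axis_limits)
print(share_of_axis)

# After multiplying by 4e4, the longest bar covers a visible part of the axis.
rescaled <- 4e4 * vn_2018_pop$Value
print(max(abs(rescaled)) / diff(axis_limits))
```

If the first number printed is close to zero, the data and the axis are on different scales, and either the data or the `breaks`/`limits` need to change.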
    

    [Image: the population pyramid produced from the rescaled data]
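
Since the data are simulated anyway, an alternative is to generate the values directly at population scale, with males negative, and to check that nothing falls outside the axis limits. By default, ggplot2's continuous scales turn values outside `limits` into `NA`, and the corresponding bars are dropped with a warning. This is a sketch under stated assumptions: the mean of 4e6 and standard deviation of 3e5 are arbitrary choices, not values taken from the tutorial's dataset, and `age_groups`, `is_male`, `axis_limits` and `out_of_range` are illustrative names.

```r
set.seed(123)  # set the seed before rnorm() so the simulation is reproducible

age_groups <- c("0-10", "11-20", "21-30", "31-40", "41-50", "51-60")
vn_2018_pop <- data.frame(
  Age    = rep(age_groups, each = 2),
  Gender = rep(c("M", "F"), times = length(age_groups)),
  # Arbitrary population-scale values (assumption, not real census data).
  Value  = round(rnorm(2 * length(age_groups), mean = 4e6, sd = 3e5))
)

# Males go to the left of zero, females to the right.
is_male <- vn_2018_pop$Gender == "M"
vn_2018_pop$Value[is_male] <- -vn_2018_pop$Value[is_male]

# Count values outside the limits passed to scale_y_continuous();
# any such rows would be dropped from the plot.
axis_limits <- c(-5000000, 5000000)
out_of_range <- vn_2018_pop$Value < axis_limits[1] |
  vn_2018_pop$Value > axis_limits[2]
print(sum(out_of_range))
```

The resulting `vn_2018_pop` can be passed to the plotting code above unchanged, without the two rescaling lines.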