I am using the url link to download this dataset:
https://files.hawaii.gov/dbedt/census/census_2020/data/redistricting/PLtable1_2020-county.xlsx
So in R I am coding it as:
url_dbedt_dicennial <- "https://files.hawaii.gov/dbedt/census/census_2020/data/redistricting/PLtable1_2020-county.xlsx"
# download the xls to a temporary file
temp <- tempfile(fileext = ".xlsx")
download.file(url = url_dbedt_dicennial, destfile = temp, mode = "wb")
# data from dbedt dicennial (look at each step to understand)
data_in_dbedt_dicennial <- temp %>%
readxl::read_excel(
range = cellranger::as.cell_limits("A6:H15"),) %>%
t() %>%
The generated output is the following:
What I am struggling right now after transpose is to how relabel the columns as "time", "HI", "HON", "HAW", "KAU", "MAU" and then to eliminate V1, V3, V8, and V9. I know I can eliminate columns manually one-by-one but there is a clever way of doing it? County should be relabeled as time.
Eventually I want to use the mutate function for the time variable, that is,
mutate(time)
and convert the data into time series with
tsbox::ts_long()
State of Hawaii should be labeled as "HI", Hawaii County as "HAW", City and County of Honolulu as "HON", Kauai County as "KAU", and Maui County 1/ as "MAU"
So this turned out to be a little more complicated than I first thought, in part because of t()
, which is really designed to work with matrices. Fortunately, I was able to find some guidance elsewhere on SO, where I found transpose_df()
. Though this works, I imagine this could be cleaned up a bit.
data_in_dbedt_dicennial <- temp %>%
readxl::read_excel(
range = cellranger::as.cell_limits("A6:H15"),) %>%
na.omit()
transpose_df <- function(df) {
t_df <- data.table::transpose(df)
colnames(t_df) <- rownames(df)
rownames(t_df) <- colnames(df)
t_df <- t_df %>%
tibble::rownames_to_column(.data = .) %>%
tibble::as_tibble(.)
return(t_df)
}
data_in_dbedt_dicennial <- transpose_df(data_in_dbedt_dicennial) %>%
.[-1,] %>%
rename(
Year = rowname, HI = `1`, HAW = `2`,
HON = `3`, KAU = `4`, MAU = `5`
) %>%
mutate(across(everything(), as.integer))
Output:
# A tibble: 7 × 6 Year HI HAW HON KAU MAU 1 1960 632772 61332 500409 28176 42855 2 1970 769913 63468 630528 29761 46156 3 1980 964691 92053 762565 39082 70991 4 1990 1108229 120317 836231 51177 100504 5 2000 1211537 148677 876156 58463 128241 6 2010 1360301 185079 953207 67091 154924 7 2020 1455271 200629 1016508 73298 164836