First, see this url for a small example of the type of data I am formatting. You'll note I have highlighted two areas in the worksheet that I am trying to select. The selection range needs to be dynamic as projects move in and out of the datasets. For the first part, I think this code will suffice:
library(tidyverse)
library(readxl)
filename <- "MyDataset.xlsx"
#obtain first section of my excel spreadsheet
project_codes <- read_excel(
path = filename,
sheet = "Jan18",
range = "A10:B1000",
col_names = c("proj_num", "name")
) %>%
drop_na() %>%
filter(grepl("-", project_codes$proj_num))
The second section is where I am getting tripped up... I want to ensure that I have selected the exact same subset of rows as my 'project_codes' in the other highlighted area of the spreadsheet.
I have many worksheets formatted in this exact same way (and naming convention is consistent - Jan18, Feb18, Mar18), so bonus points if someone can assist me in iterating through the worksheets after solving part 1.
library(tidyverse)
library(readxl)
filename <- "MyDataset.xlsx"
excel_sheets(filename) %>%
set_names() %>%
map_df(~ read_excel(
path = filename,
sheet = .,
range = "A6:N1000"
) %>%
filter(str_detect(`#`, "-")) %>%
select(`#`, `PROJECT NAME`, `Status`, `Cumulative Billings`, `Billing Adjustment`, `Contract Value`), .id = "Month")
# A tibble: 8 x 7
Month `#` `PROJECT NAME` Status `Cumulative Billings` `Billing Adjustment` `Contract Value`
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Jan18 1-11-0010 Project 1 715 6723 2138 1977
2 Jan18 1-11-0011 Project 2 1717 8330 8283 2588
3 Jan18 1-11-0012 Project 3 3332 2908 4938 7734
4 Jan18 1-11-0013 Project 4 2589 8714 6034 1476
5 Jan18 1-11-0014 Project 5 588 4969 2161 3334
6 Jan18 1-11-0015 Project 6 820 7688 4243 4293
7 Jan18 1-11-0020 Project 20 7287 333 9100 3078
8 Jan18 1-11-0030 Project 30 1564 487 7249 5508
map_df()
will iterate through each sheet (whose names are extracted using excel_sheets()
) and create one data frame with a column indicating which sheet the data comes from.
I had to rely on this answer to apply the name of the sheet as its own column.