pretty new to R and i've come across a weird CSV my boss needs me to simplify for use.
I have a .csv that has multiple sets of data. Usually fine but each set of data is separated with a empty line and then the headers for the next set of data (the headers are always the same). Only problem is that there is no identifying column to let me just remove these rows. So for each set of data I need to remove the empty Row and Headers whilst giving each set a unique value in a new column. I hope this makes sense.
Honestly I've not got a clue handle this situation and I can't find any questions I could adapt. At least not at my current knowledge.
Any help will be greatly appreciated and might help me convince my boss to drop his mandraulic ways...
Edited because I missed the desired "set" column..
If your data is as you say it is from the picture you might be able to just... It worked for me on a small sample I built in excel made up to be a bit like yours. I'm assuming "Current" and "Desired" are not actually in the file and the first row starts with the header
library(dplyr)
yourdata <- read.csv("yourcsv.csv")
yourdata %>%
mutate(Set = cumsum(!duplicated(Location == "Location"))) %>%
filter(Location != "Location", Location != "")
#> Location Feature Height Volume Set
#> 1 U104 3 104 99 1
#> 2 U104 5 123 456 1
#> 3 U104 6 45 999 2
#> 4 U105 6 45 999 2
Based on data
structure(list(Location = c("U104", "U104", "", "Location", "U104",
"U105"), Feature = c("3", "5", "", "Feature", "6", "6"), Height = c("104",
"123", "", "Height", "45", "45"), Volume = c("99", "456", "",
"Volume", "999", "999")), class = "data.frame", row.names = c(NA,
-6L))