I have a data frame which looks like this
df
Country Year col3. col4. col5
USA2018 10 50 13 NA
UK 2018 4 12 6 NA
China 2018 15 4 1
Malta NA 2018 25 8
I would like to split the strings of "Country" column by the pattern "2018" for the rows where 2018 is merged into the first column and shifted to the Year column for the rows where the Year is NA and have this output:
df
Country Year col3. col4. col5
USA 2018 10 50 13
UK 2018 4 12 6
China 2018 15 4 1
Malta 2018 25 8 NA
Any suggestion?
Edit: This data is result of PDF scraping.Link to PDF, and the code below:
# install.packages("pdftools")
# install.packages("readr")
library(pdftools)
library(readr)
epi <- pdf_text("malaria_epi.pdf")
epi_df <- epi %>%
read_lines() %>%
grep('^\\s{2}\\w', ., value = TRUE) %>%
paste(collapse = '\n') %>% read_fwf(fwf_empty(.))
Here is a solution. It was a bit tricky, but I think it cover your case. This solution could be problematic if you have NA in the middle of a row, but I didn't manage to find a better way yet.
df <- read.table(header=TRUE,
text="
Country Year col3. col4. col5
USA2018 10 50 13 NA
UK2018 4 12 6 NA
China 2018 15 4 1
Malta NA 2018 25 8")
tmpN <- names(df) # save the colnames
df = cbind(df[,1],df) # duplicate the first column
df[,c(1,2)] <- lapply(df[,c(1,2)], as.character)
df[,1] = sub('[[:digit:]]+','',df[,1]) # remove date in first column
df[,2] = sub('[[:alpha:]]+','',df[,2]) # remove city in second column
df[df==''] <- NA # replace empty cells with NA
# push all NA to the right side
df2 = as.data.frame(t(apply(df,1, function(x) { return(c(x[!is.na(x)],x[is.na(x)]) )} )))
df2 <- df2[,!(colSums(is.na(df2))==nrow(df2))] # remove column full of NA
colnames(df2) <- tmpN # replace colnames