I am using Energy Performance Certificate Data to identify the heating fuel types of buildings in an area however, they are split into over 60 different subsets of 9 main fuel types. I would like add another column for fuel type so that they can be grouped by the 9 main fuel types.
An example of the relevant columns of the data is:
BuildingID <- c(1,2,3,4,5,6,7,8,9,10)
MainHeatDesc <- c("Boiler and radiators, mains gas", "Boiler and radiators, oil", "Room heaters, electric", "Room heaters, LPG", "Air source heat pump, underfloor heating, electric", "Air source heat pump, fan coil units, electric", "Ground source heat pump, mains gas", "Electric storage heaters", "Room heaters, wood logs", "Boilers and radiators, wood chips")
data <- data.frame(BuildingID, MainHeatDesc)
This is a small example taking some of the subsets of the original data. In this example I would like to create another column for main fuel type grouping them as: Mains gas, Oil, Electric, LPG and wood.
The end result should look like this:
# BuildingID MainHeatDesc MainFuelType
# 1 Boiler and radiators, mains gas Mains gas
# 2 Boiler and radiators, oil Oil
# 3 Room heaters, electric Electric
# 4 Room heaters, LPG LPG
# 5 Air source heat pump, underfloor heating, electric Electric
# 6 Air source heat pump, fan coil units, electric Electric
# 7 Ground source heat pump, mains gas Mains Gas
# 8 Electric storage heaters Electric
# 9 Room heaters, wood logs Wood
# 10 Boilers and radiators, wood chips Wood
If anyone is able to help me out it would be greatly appreciated. Let me know if you have any queries or need any more information.
Thanks!
Similar logic as @tmfmnk but in base R using sub
.
types <- c('Mains Gas', 'Oil', 'Electric', 'LPG', 'Wood')
data$MainFuelType <- sub(paste0(".*(?i)(", paste0("\\b", types, "\\b",
collapse = "|"), ").*"), "\\1", data$MainHeatDesc)
data
# BuildingID MainHeatDesc MainFuelType
#1 1 Boiler and radiators, mains gas mains gas
#2 2 Boiler and radiators, oil oil
#3 3 Room heaters, electric electric
#4 4 Room heaters, LPG LPG
#5 5 Air source heat pump, underfloor heating, electric electric
#6 6 Air source heat pump, fan coil units, electric electric
#7 7 Ground source heat pump, mains gas mains gas
#8 8 Electric storage heaters Electric
#9 9 Room heaters, wood logs wood
#10 10 Boilers and radiators, wood chips wood
where the regex which is generated dynamically looks like :
paste0(".*(?i)(", paste0("\\b", types, "\\b", collapse = "|"), ").*")
#[1] ".*(?i)(\\bMains Gas\\b|\\bOil\\b|\\bElectric\\b|\\bLPG\\b|\\bWood\\b).*"
where (?i)
is for case-insensitivity.