Tags: r, tidyverse, unit-conversion

Convert character variable of height (5ft6in) into numeric variable in inches?


I would like to convert a height variable from character type to numeric. For context, I need the numeric values to calculate body mass index.

In the example data frame below, I would like to convert Height_1 into Height_2, where Height_2 is the height in inches:

# Height_1    Height_2
# 5ft6in      66 
# XftXin      XXXX
# XftXin      XXXX 
# XftXin      XXXX
# XftXin      XXXX

I have tried a few things using the "tidyverse" and "measurements" packages but have not been able to create a variable like Height_2 above. For example:

library(dplyr)
library(tidyr)

df %>%
  separate(Height_1,c('feet', 'inches'), sep = 'ft', convert = TRUE, remove = FALSE) %>%
  mutate(Height_2 = 12*feet + inches)

I think this fails because the code above doesn't account for the "in" at the end of the values.


Solution

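  • You can use a regular expression to extract the feet and inches from Height_1 and then calculate the total. The pattern '(\\d+)ft(\\d+)in' captures the digits before "ft" and the digits before "in", and convert = TRUE turns the captured strings into integers.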

    library(dplyr)
    library(tidyr)
    
    df %>%
      extract(Height_1, c('feet', 'inches'), '(\\d+)ft(\\d+)in', convert = TRUE, remove = FALSE) %>%
      transmute(Height_1, 
                Height_2 = 12*feet + inches)
    
    #  Height_1 Height_2
    #1   5ft6in       66
    #2   4ft9in       57
    #3  5ft12in       72
    #4   4ft9in       57
    #5   6ft2in       74
    

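    In base R, strcapture() returns the captured feet and inches as a data frame, and transform() adds Height_2 to it: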

    transform(strcapture('(\\d+)ft(\\d+)in', df$Height_1, 
               proto = list(feet = numeric(), inches = numeric())), 
              Height_2 = 12*feet + inches)
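
    The base R call above returns only feet, inches and Height_2. If Height_1 should stay alongside the result, one option is to assign the computed column back to the data frame. This is a minimal sketch, assuming every value matches the XftXin pattern; the intermediate name parts is only illustrative:

    ```r
    # Example data, same values as in the data section of this answer
    df <- data.frame(Height_1 = c("5ft6in", "4ft9in", "5ft12in", "4ft9in", "6ft2in"),
                     stringsAsFactors = FALSE)

    # Capture feet and inches as numeric columns
    parts <- strcapture("(\\d+)ft(\\d+)in", df$Height_1,
                        proto = list(feet = numeric(), inches = numeric()))

    # Add the total height in inches to the original data frame
    df$Height_2 <- 12 * parts$feet + parts$inches
    ```

    After this, df holds both Height_1 and Height_2, without the helper feet and inches columns.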
    

    Data:

    df <- structure(
      list(Height_1 = c("5ft6in", "4ft9in", "5ft12in", "4ft9in", "6ft2in")),
      row.names = c(NA, -5L),
      class = "data.frame"
    )
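
    Real data may contain missing or malformed entries, in which case the same regex idea can be wrapped in a small function. The sketch below is only an illustration: height_to_inches is a hypothetical name, and returning NA for values that do not match the pattern is an assumption about the desired behaviour, not part of the answer above.

    ```r
    # Hypothetical helper: convert strings such as "5ft6in" to inches.
    # Values that do not match the pattern (including NA) become NA.
    height_to_inches <- function(x) {
      pattern <- "^(\\d+)ft(\\d+)in$"
      out <- rep(NA_real_, length(x))
      ok <- grepl(pattern, x)                      # FALSE for NA and non-matches
      feet <- as.numeric(sub(pattern, "\\1", x[ok]))
      inches <- as.numeric(sub(pattern, "\\2", x[ok]))
      out[ok] <- 12 * feet + inches
      out
    }

    height_to_inches(c("5ft6in", "170cm", NA))
    # 66 NA NA
    ```

    Anchoring the pattern with ^ and $ means strings with extra text around the height are treated as non-matching rather than silently parsed.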