Tags: r, regex, tidyverse

Use regex to transform height column into total inches column


I am trying to "use regular expressions to use the height column to create a total_inches column that has the total height in inches and is recorded as a numeric variable."

I came up with a solution, but I don't think it fits the task (the language is R):

library(tidyverse)

bball_data$total_inches <- bball_data$height %>%
  str_extract(regex("(^\\d+) (-) (\\d+$)", comments = TRUE))

bball_data <- bball_data %>%
  separate(total_inches, c("feet", "inches"), "-", convert = TRUE) %>%
  mutate(total_inches = 12 * feet + inches)

The first statement is essentially useless apart from applying the regex. It's been about 3 hours; what am I missing?
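A minimal sketch of why the first statement changes nothing, assuming the stringr package: with `comments = TRUE` the regex engine ignores whitespace in the pattern, so `(^\\d+) (-) (\\d+$)` behaves like `(^\\d+)(-)(\\d+$)`, and `str_extract()` returns the whole match (i.e. the original string) rather than the individual groups. `str_match()` exposes the capture groups as matrix columns, which can then be converted with `as.numeric()`. The `heights` vector holds sample values from the question's `height` column.

```r
library(stringr)

# sample values from the question's height column
heights <- c("6-10", "6-9", "7-2")

# comments = TRUE ignores whitespace in the pattern, so each value matches
# as a whole and str_extract() simply returns the input unchanged
whole <- str_extract(heights, regex("(^\\d+) (-) (\\d+$)", comments = TRUE))

# str_match() returns a character matrix: column 1 is the full match,
# columns 2 and 3 are the two capture groups (feet, inches)
parts <- str_match(heights, "^(\\d+)-(\\d+)$")
total_inches <- as.numeric(parts[, 2]) * 12 + as.numeric(parts[, 3])
print(total_inches)  # 82 81 86
```

The capture groups are what carry the feet and inches separately; extracting the whole match only reproduces the input.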

Update:

bball_data$total_inches <- str_replace(bball_data$height, regex("(^\\d+)(-)(\\d+$)", comments = TRUE), "\\1+\\3")

This gets me pretty close to the end result, but as you might guess, I can't add \1 to \3. I tried as.numeric() and as.integer(), but neither works. Is there another way to add these replacement values? The dput() output is:

structure(list(
  name = c("Alaa Abdelnaby", "Zaid Abdul-Aziz", "Kareem Abdul-Jabbar",
           "Mahmoud Abdul-Rauf", "Tariq Abdul-Wahad", "Shareef Abdur-Rahim"),
  year_start = c(1991L, 1969L, 1970L, 1991L, 1998L, 1997L),
  year_end = c(1995L, 1978L, 1989L, 2001L, 2003L, 2008L),
  position = c("F-C", "C-F", "C", "G", "F", "F"),
  height = c("6-10", "6-9", "7-2", "6-1", "6-6", "6-9"),
  weight = c(240L, 235L, 225L, 162L, 223L, 225L),
  birth_date = c("June 24, 1968", "April 7, 1946", "April 16, 1947",
                 "March 9, 1969", "November 3, 1974", "December 11, 1976"),
  college = c("Duke University", "Iowa State University",
              "University of California, Los Angeles", "Louisiana State University",
              "San Jose State University", "University of California")
), row.names = c(NA, 6L), class = "data.frame")
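On the update: the replacement string in `str_replace()` is plain text, so `"\\1+\\3"` produces strings such as `"6+10"`, and `as.numeric()` cannot evaluate that expression (it returns `NA` with a coercion warning). One way to stay inside a tidyverse pipeline, sketched here assuming dplyr and tidyr are installed, is `tidyr::extract()`, which turns each capture group into its own column; `convert = TRUE` makes those columns numeric so they can be added. The small `players` data frame is a hypothetical stand-in for `bball_data`.

```r
library(dplyr)
library(tidyr)

# hypothetical stand-in for bball_data, using heights from the question
players <- data.frame(height = c("6-10", "6-9", "7-2"), stringsAsFactors = FALSE)

# a backreference replacement only builds text, which as.numeric() can't evaluate
bad <- suppressWarnings(as.numeric("6+10"))  # NA

players <- players %>%
  # each capture group becomes a column; remove = FALSE keeps height,
  # convert = TRUE turns "6" and "10" into numbers
  tidyr::extract(height, into = c("feet", "inches"),
                 regex = "^(\\d+)-(\\d+)$", remove = FALSE, convert = TRUE) %>%
  mutate(total_inches = 12 * feet + inches)

print(players)
```

Recent tidyr releases mark `extract()` as superseded by `separate_wider_regex()`, but `extract()` still works and keeps the regex front and center, which is what the task asks for.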


Solution

  • My apologies for the poorly formatted question; I'll do better next time. Here is the answer I came up with after about 5.5 hours:

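    library(stringr)
    # split each height such as "6-10" into a two-column matrix: feet, inches
    height_parts <- str_extract_all(bball_data$height, "\\d+", simplify = TRUE)
    
    # convert both columns to numbers and combine them into total inches
    bball_data$total_inches <- as.numeric(height_parts[, 1]) * 12 + as.numeric(height_parts[, 2])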
    

    This code extracts the numbers from each height into a two-column character matrix (feet, inches). The next statement converts both columns with as.numeric(), computes feet * 12 + inches, and stores the result in a new total_inches column of bball_data.
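A quick look at the solution's intermediate matrix, using the six heights from the question's `dput()` output (stringr assumed). The approach relies on every height containing exactly two numbers, so it can be worth confirming the format with `str_detect()` first; that guard is an addition of this sketch, not part of the original answer.

```r
library(stringr)

# the six heights from the question's dput() output
heights <- c("6-10", "6-9", "7-2", "6-1", "6-6", "6-9")

# guard: every value should look like feet-inches before splitting
stopifnot(all(str_detect(heights, "^\\d+-\\d+$")))

# simplify = TRUE gives a character matrix: one row per height,
# one column per number found (here: feet, inches)
m <- str_extract_all(heights, "\\d+", simplify = TRUE)
print(dim(m))  # 6 2

total_inches <- as.numeric(m[, 1]) * 12 + as.numeric(m[, 2])
print(total_inches)  # 82 81 86 73 78 81
```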