Tags: r, vector, data-science, data-processing

How do I extract the number from a string vector and convert it to an integer vector?


I have a string vector in R as follows:

[1] Type 1 Type 2 Type 4 Type 3 Type 4 Type 6 Type 2 Type 5 
[9] Type 2 Type 3 Type 7 

Also note:

str(data)
# Factor w/ 7 levels "Type 1","Type 2",..: 1 2 1 3 4 1 2 4 2 3 ...

I want to convert this to an integer vector so that I can perform cluster analysis (compute a cluster performance index). At the moment I am getting the following error: argument 'part' must be an integer vector

What would be the most effective solution?


Solution

  • The str output shows that you have a factor, not a character vector. It also shows that the level labels are Type 1, Type 2 and so on. A factor stores the first level internally as 1, the second as 2 and so on. So, using the data shown reproducibly in the Note at the end, converting it to an integer vector only requires as.integer:

    as.integer(data)
    ## [1] 1 2 1 3 4 1 2 4 2 3
    

    If the level labels do not line up with the internal codes, for example if the third level is labelled Type 93 rather than Type 3, then as.integer(data) returns the codes, not the numbers in the labels. In that case we can remove the non-digit characters (gsub implicitly converts the factor to character) and convert what remains to an integer vector.

    as.integer(gsub("\\D", "", data))
    ## [1] 1 2 1 3 4 1 2 4 2 3
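
    To make the difference between the two approaches concrete, here is a minimal sketch using a made-up factor, data2, whose third level is labelled Type 93. The name data2 and its values are assumptions for illustration only; they are not from the question.

    ```r
    # Hypothetical factor (not the question's data): the third level is "Type 93"
    data2 <- factor(c("Type 1", "Type 2", "Type 1", "Type 93"),
                    levels = c("Type 1", "Type 2", "Type 93"))

    # Internal codes: the position of each value's level
    codes <- as.integer(data2)
    codes
    ## [1] 1 2 1 3

    # Digits taken from the labels themselves
    from_labels <- as.integer(gsub("\\D", "", data2))
    from_labels
    ## [1]  1  2  1 93
    ```

    Which of the two is wanted depends on whether the numbers in the labels carry meaning or only distinguish groups.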
    

    Note

    data <- structure(c(1L, 2L, 1L, 3L, 4L, 1L, 2L, 4L, 2L, 3L), .Label = c("Type 1", 
    "Type 2", "Type 3", "Type 4"), class = "factor")
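
    As a quick check before passing the result on, the sketch below rebuilds the same values with factor() and confirms that the converted vector really has integer type. The variable name part is borrowed from the error message, and the assumption that the clustering function tests for integer type is inferred from that message, not confirmed.

    ```r
    # Same values as the structure() call above, built with factor()
    lev <- c("Type 1", "Type 2", "Type 3", "Type 4")
    data <- factor(lev[c(1, 2, 1, 3, 4, 1, 2, 4, 2, 3)], levels = lev)

    part <- as.integer(data)
    is.integer(part)
    ## [1] TRUE

    # as.numeric gives the same values but stored as double, not integer
    is.integer(as.numeric(data))
    ## [1] FALSE
    ```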