
Extract the Chr number from the column


I have a data frame with a column containing chromosome names (chromosomes 1 to 22). I would like to create another column that contains only the chromosome number.


Solution

  • You can do this with the stringr package and a regular expression, but you need to know every form the values can take. If an underscore always separates the part you want from the extra information, you can use str_split() with "_" as the pattern argument and keep the first piece. Then strip the "chr" prefix to get the number itself.

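    library(stringr)
    # Example data: chromosome names, some with an "_<contig>_alt" suffix
    df <- data.frame(chromosome = c("chr6_GL000253v2_alt", "chr6_GL000254v2_alt",
                                    "chr6_GL000255v2_alt", "chr6_GL000256v2_alt", "chr4", "chr11",
                                    "chr8", "chr12", "chr2", "chr12", "chr4", "chr6", "chr15", "chr4",
                                    "chr2"))

    # Split on "_" and keep the first piece, e.g. "chr6_GL000253v2_alt" -> "chr6"
    df$chromosome_fixed <- str_split(df$chromosome, "_", simplify = TRUE)[, 1]

    # Drop the leading "chr" and convert to a number, e.g. "chr6" -> 6
    df$chr_number <- as.integer(str_remove(df$chromosome_fixed, "^chr"))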
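If you prefer the regular-expression route mentioned above, a minimal sketch in base R (so stringr is not required) captures the digits directly instead of splitting. It assumes every value starts with "chr" followed by the chromosome number; the column name chr_number and the shortened sample data are illustrative choices, not part of the original question.

```r
# Shortened sample data (assumption: every value starts with "chr"
# followed by the chromosome number, 1 to 22)
df <- data.frame(chromosome = c("chr6_GL000253v2_alt", "chr6_GL000254v2_alt",
                                "chr4", "chr11", "chr8", "chr12", "chr2", "chr15"))

# Capture the digits right after the leading "chr" and discard the rest
# (e.g. "_GL000253v2_alt"); "\\1" refers to the captured group
chr_digits <- sub("^chr([0-9]+).*$", "\\1", df$chromosome)

# Convert to integer so the column sorts numerically (2 before 11)
# rather than alphabetically ("11" before "2")
df$chr_number <- as.integer(chr_digits)

print(df)
```

Values that do not match the pattern, such as "chrX" or "chrM", are returned unchanged by sub() and then become NA (with a coercion warning) in as.integer(), which makes them easy to spot.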