r string-matching

Adding a dash to a string


I think I have a simple question, but I can't figure it out. I have something like this:

df <- data.frame(identifier = c("9562231945200505501901190109-5405303
", "190109-8731478", "1901098260031", " 
.9..43675190109-3690341", "-1103214010200000190109-8841419", "-190109-5232506-.08001234-111",
                                "190109-2018362-","51770217835901218103304190109-9339765
"), true_values = c("190109-5405303","190109-8731478","190109-8260031","190109-3690341","190109-8841419",
                    "190109-5232506","190109-2018362","190109-9339765"))

I used the following function and it almost worked, but I do not know how to avoid the last dash.

I tried str_replace and something else, but it did not work.


Solution

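  • You can try substr with paste after removing the unwanted parts with gsub: strip everything that is not a digit, keep the last 13 digits, and insert the dash after the sixth one.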

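    # Drop a "-." and everything after it (e.g. "-.08001234-111")
    tt <- gsub("-\\..*", "", df$identifier)
    # Keep digits only (this also removes dashes, dots, spaces and newlines)
    tt <- gsub("[^0-9]", "", tt)
    # Keep the last 13 digits: 6 before the dash, 7 after it
    tt <- substring(tt, nchar(tt)-12)
    # Re-insert the dash after the sixth digit
    paste0(substr(tt, 1, 6), "-", substring(tt, 7))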
    #[1] "190109-5405303" "190109-8731478" "190109-8260031" "190109-3690341"
    #[5] "190109-8841419" "190109-5232506" "190109-2018362" "190109-9339765"
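
The same steps can be wrapped in a small helper and checked against known values. This is only a sketch: the name `add_dash` is hypothetical, and it assumes every identifier ends in a 6-digit block followed by a 7-digit block once the junk is stripped. The three test strings are taken from the question's data.

```r
# Hypothetical helper wrapping the gsub/substring/paste0 steps from the answer.
# Assumption: after cleaning, the last 13 digits are the 6 + 7 digit value.
add_dash <- function(x) {
  x <- gsub("-\\..*", "", x)                       # drop "-." and what follows
  digits <- gsub("[^0-9]", "", x)                  # keep digits only
  last13 <- substring(digits, nchar(digits) - 12)  # last 13 digits
  paste0(substr(last13, 1, 6), "-", substring(last13, 7))
}

ids <- c("1901098260031", "-190109-5232506-.08001234-111", "190109-2018362-")
expected <- c("190109-8260031", "190109-5232506", "190109-2018362")

result <- add_dash(ids)
print(result)

# TRUE when every cleaned identifier matches its expected value
print(identical(result, expected))
```

On the full data frame, the same kind of check would be `identical(add_dash(df$identifier), df$true_values)`.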