Tags: r, substring, read.csv, leading-zero

R: Remove leading 0s from a factor string


I have imported multiple Excel files into R using the read.csv() function.

In the smaller files, the leading 0s in the uniqueID column have been kept, e.g. 085405, 021X1B, 0051012

However, in the larger files, the leading 0s have been dropped from the uniqueIDs that contain only numbers, e.g. 85405, 021X1B, 51012

I would like to drop the leading 0s from all uniqueIDs so that I am able to merge the files.

I have tried using the following code:

Test$UniqueID2 <- substr(Dataset$UniqueID, regexpr("[^0]", Dataset$UniqueID), nchar(Dataset$UniqueID))

This generated the following error:

Error in nchar(Dataset$UniqueID) : 
  'nchar()' requires a character vector

A solution which will allow me to drop leading 0's in R would be much appreciated.


Solution

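  • We can use sub for this. The pattern matches one or more zeros (0+) at the start (^) of the string, followed by zero or more digits ([0-9]*) up to the end ($) of the string. The digits are captured as a group, and the whole match is replaced by the backreference (\\1) to that group. IDs that contain a non-digit character do not match, so they are left unchanged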

    sub("^0+([0-9]*)$", "\\1", str1)
    #[1] "85405"  "021X1B" "51012"
    

    If we want to remove the leading zeros from all the IDs, including the alphanumeric ones

    sub("^0+", "", str1)
    #[1] "85405" "21X1B" "51012"
    

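    Or we can use the as.numeric approach. IDs that are not purely numeric become NA (with an "NAs introduced by coercion" warning) and are then restored from the original strings

    # numeric IDs lose their leading zeros; non-numeric IDs become NA
    v1 <- as.numeric(str1)
    # put the original strings back where the conversion failed (v1 becomes character)
    v1[is.na(v1)] <- str1[is.na(v1)]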
    

    data

    str1 <- c("085405", "021X1B", "0051012")
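    Applied to the situation in the question, the first sub() call could be wrapped in a small helper and run on the ID column of each data frame before merging. This is only a sketch: the data frames small and large, their value columns, and the helper name strip_zeros are hypothetical stand-ins for the imported files, and as.character() is added to make the factor-to-character conversion explicit.

```r
# Hypothetical data frames standing in for a "small" and a "large" import.
# stringsAsFactors = TRUE mimics the factor column described in the question.
small <- data.frame(UniqueID = c("085405", "021X1B", "0051012"),
                    value_small = 1:3,
                    stringsAsFactors = TRUE)
large <- data.frame(UniqueID = c("85405", "021X1B", "51012"),
                    value_large = c(10, 20, 30),
                    stringsAsFactors = TRUE)

# Hypothetical helper: strip leading zeros only from all-digit IDs.
# as.character() converts the factor to character before matching.
strip_zeros <- function(x) sub("^0+([0-9]*)$", "\\1", as.character(x))

small$UniqueID <- strip_zeros(small$UniqueID)
large$UniqueID <- strip_zeros(large$UniqueID)

# The ID columns now use the same representation, so the merge lines up.
merged <- merge(small, large, by = "UniqueID")
```

    One caveat with this pattern: an ID made up entirely of zeros (for example "000") would be reduced to an empty string, so it may need special handling if such IDs can occur.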