r, regex, stringr, stringi

Conditionally add a whitespace between a number and a special character in R


I'm trying to use stringr or base R calls to conditionally add a whitespace in a large vector wherever a numeric value is immediately followed by a special character (in this case a $ sign) with no space in between. str_pad doesn't appear to allow for a reference vector.

For example, for:

$6.88$7.34

I'd like to add a whitespace after the last number and before the next dollar sign:

$6.88 $7.34

Thanks!


Solution

  • This will work if you are working with a character vector:

    mystring <- '$6.88$7.34 $8.34$4.31'
    
    # Insert a space before every $ that directly follows a digit (the look-behind needs perl = TRUE)
    gsub("(?<=\\d)\\$", " $", mystring, perl = TRUE)
    
    [1] "$6.88 $7.34 $8.34 $4.31"
    

    This also handles cases where a space is already present: a $ that follows a space is not preceded by a digit, so it is left untouched.
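
    As a rough sketch of how this behaves on a vector with several elements, the snippet below wraps the same pattern in a small helper. The name add_space_before_dollar and the sample values are made up for illustration. It also shows a capture-group variant that should give the same result with the default regex engine, without a look-behind.

    ```r
    # Hypothetical helper: same pattern as above, applied to a whole character vector.
    add_space_before_dollar <- function(x) {
      gsub("(?<=\\d)\\$", " $", x, perl = TRUE)
    }

    # Made-up sample data: fused prices, already-spaced prices, and a string with no prices.
    prices <- c("$6.88$7.34", "$6.88 $7.34", "$1.00$2.00$3.00", "no prices")

    fixed_prices <- add_space_before_dollar(prices)

    # Alternative without a look-behind: capture the digit and put it back with \\1.
    fixed_prices_capture <- gsub("(\\d)\\$", "\\1 $", prices)

    print(fixed_prices)
    ```

    Because gsub is vectorized, no loop is needed, and elements that already have the space (or contain no $ at all) come back unchanged.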

    Regarding the question asked in the comments:

    mystring2 <- 'Regular_Distribution_Type† Income Only" "Distribution_Rate 5.34%" "Distribution_Amount $0.0295" "Distribution_Frequency Monthly'
    
    # Replace a space with _ only when it sits directly between two letters
    gsub("(?<=[[:alpha:]])\\s(?=[[:alpha:]]+)", "_", mystring2, perl = TRUE)
    
    [1] "Regular_Distribution_Type<U+2020> Income_Only\" \"Distribution_Rate 5.34%\" \"Distribution_Amount $0.0295\" \"Distribution_Frequency_Monthly"
    

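    Note that the \ in the output appears because R escapes the double quotes nested inside the string when it prints it; the backslashes are not part of the data and make no difference. Likewise, <U+2020> is how R prints the special character † when the console's encoding cannot display it.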
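
    To check that the backslashes exist only in the printed form, one can compare print() with writeLines() on a string containing double quotes. This is a small sketch; the sample string is made up for illustration.

    ```r
    # Hypothetical sample string containing nested double quotes.
    quoted <- 'Income_Only" "Distribution_Rate'

    # print() shows the string as an R literal, so inner quotes are escaped with a backslash.
    printed_form <- capture.output(print(quoted))

    # writeLines() writes the raw characters, with no escaping.
    raw_form <- capture.output(writeLines(quoted))

    print(printed_form)
    print(raw_form)
    ```

    The raw form is identical to the original string, which suggests the backslashes are purely a display convention.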

    Explanation of regex:

    (?<=[[:alpha:]]) This first part is a positive look-behind, created by ?<=. It looks behind whatever we are trying to match to make sure that what we define in the look-behind is there, without including it in the match. In this case we are looking for [[:alpha:]], which matches an alphabetic character.

    We then check for a whitespace character with \s. In an R string the backslash itself has to be escaped, so it is written \\s. This is the only part of the pattern that is actually matched.

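    Finally we use (?=[[:alpha:]]+), a positive look-ahead defined by ?=, which checks that our match is followed by a letter, just as the look-behind checks what precedes it. The + is not strictly needed here, since a single letter is enough for the look-ahead to succeed.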

    The logic is to find a space that sits between two letters and match only that space, which gsub then replaces with _.
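
    The pieces described above can be tried out one at a time. The sketch below uses made-up sample strings: it locates the matched space, checks that dropping the + from the look-ahead gives the same result, and compares against a capture-group version to show why look-arounds are convenient here.

    ```r
    # Hypothetical sample text (not taken from the question).
    samples <- c('Income Only" "Distribution_Rate 5.34%', "a b c")

    with_plus    <- "(?<=[[:alpha:]])\\s(?=[[:alpha:]]+)"
    without_plus <- "(?<=[[:alpha:]])\\s(?=[[:alpha:]])"

    # Position of the only space in the first string that has a letter on both sides.
    space_pos <- as.integer(gregexpr(without_plus, samples[1], perl = TRUE)[[1]])

    # Look-arounds only inspect the neighbors, so just the space is replaced.
    lookaround_result <- gsub(without_plus, "_", samples, perl = TRUE)

    # The + inside the look-ahead should not change the outcome.
    same_with_plus <- identical(lookaround_result, gsub(with_plus, "_", samples, perl = TRUE))

    # Capture groups consume the neighboring letters, so overlapping cases are missed.
    capture_result <- gsub("([[:alpha:]])\\s([[:alpha:]])", "\\1_\\2", samples)

    print(lookaround_result)
    print(capture_result)
    ```

    On "a b c" the capture-group version consumes the b as part of the first match, so the second space is skipped, while the look-around version replaces both spaces.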
