
How can I trim leading and trailing white space?


I am having some trouble with leading and trailing white space in a data.frame.

For example, I look at a specific row in a data.frame based on a certain condition:

> myDummy[myDummy$country == c("Austria"),c(1,2,3:7,19)]
[1] codeHelper     country        dummyLI        dummyLMI       dummyUMI
[6] dummyHInonOECD dummyHIOECD    dummyOECD
<0 rows> (or 0-length row.names)

I wondered why I didn't get the expected output, since the country Austria obviously exists in my data.frame. After looking through my code history and trying to figure out what went wrong, I tried:

> myDummy[myDummy$country == c("Austria "),c(1,2,3:7,19)]
   codeHelper  country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
18        AUT Austria        0        0        0              0           1
   dummyOECD
18         1

All I changed in the command was to add a trailing space after Austria.

Further annoying problems obviously arise, for example when I want to merge two data frames on the country column. One data.frame uses "Austria " while the other has "Austria", so the matching doesn't work.

  1. Is there a nice way to 'show' the white space on my screen so that I am aware of the problem?
  2. And can I remove the leading and trailing white space in R?

So far I have used a simple Perl script to remove the white space, but it would be nice if I could do it inside R.


Solution

  • Probably the best way is to handle the trailing white space when you read your data file. If you use read.csv or read.table you can set the parameter strip.white=TRUE.
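As a rough sketch of the read-time approach, the snippet below reads the same made-up CSV text twice, once with the defaults and once with strip.white=TRUE. The CSV content and the names csv_text, raw and cleaned are assumptions for illustration only. According to the read.table documentation, strip.white is used only when sep is specified (read.csv sets it to a comma) and only affects unquoted character fields.

```r
# Hypothetical CSV content with padding around the country names
csv_text <- "codeHelper,country\nAUT, Austria \nBEL,Belgium  "

# Default behaviour: surrounding white space in character fields is kept
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# strip.white = TRUE removes leading and trailing white space while reading
cleaned <- read.csv(text = csv_text, stringsAsFactors = FALSE,
                    strip.white = TRUE)

raw$country      # padded values
cleaned$country  # trimmed values
```

With the trimmed column, a comparison such as cleaned$country == "Austria" should match without needing the extra space.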

    If you want to clean strings afterwards you could use one of these functions:

    # Returns string without leading white space
    trim.leading <- function (x)  sub("^\\s+", "", x)
    
    # Returns string without trailing white space
    trim.trailing <- function (x) sub("\\s+$", "", x)
    
    # Returns string without leading or trailing white space
    trim <- function (x) gsub("^\\s+|\\s+$", "", x)
    

    To use one of these functions on myDummy$country:

     myDummy$country <- trim(myDummy$country)
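If several columns might be affected, one option is to apply trim to every character column at once. The sketch below uses a small made-up data frame (df, standing in for myDummy) and a helper vector is_chr, both of which are assumptions for illustration. Note that gsub returns a character vector, so a factor column passed through trim comes back as character. Base R from version 3.2.0 on also provides trimws(), which should give the same result here.

```r
# Returns string without leading or trailing white space
trim <- function (x) gsub("^\\s+|\\s+$", "", x)

# Hypothetical data frame standing in for myDummy
df <- data.frame(codeHelper = c("AUT", "BEL"),
                 country    = c("Austria ", " Belgium"),
                 dummyOECD  = c(1, 1),
                 stringsAsFactors = FALSE)

# Find the character columns and trim only those
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], trim)

# trimws() (base R >= 3.2.0) produces the same values for this input
same_as_trimws <- identical(df$country, trimws(c("Austria ", " Belgium")))

# The original lookup now works without the trailing space
austria <- df[df$country == "Austria", ]
```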
    

    To 'show' the white space you could use:

     paste(myDummy$country)
    

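    which prints the strings surrounded by quotation marks ("), making white space easier to spot. This helps when the column is a factor: paste converts it to a character vector, and character vectors are printed with quotes.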
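Another way to make the padding visible, and to check its effect on merge, is sketched below. The data frames a and b and their columns x and y are made up for illustration, and trimws() assumes R 3.2.0 or later (the trim function above could be used instead).

```r
# Two hypothetical data frames; only a has a padded country name
a <- data.frame(country = c("Austria ", "Belgium"), x = 1:2,
                stringsAsFactors = FALSE)
b <- data.frame(country = c("Austria", "Belgium"), y = 3:4,
                stringsAsFactors = FALSE)

# Wrap each value in brackets so stray white space stands out
visible <- sprintf("[%s]", a$country)
print(visible)

# Flag the values that change when trimmed
padded <- a$country != trimws(a$country)
a$country[padded]

# Before trimming, the padded key does not match its counterpart
before <- merge(a, b, by = "country")

# After trimming, both countries match
a$country <- trimws(a$country)
after <- merge(a, b, by = "country")

nrow(before)
nrow(after)
```

Checking the row count before and after trimming is a quick way to confirm that white space was the reason the keys did not match.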