I am having some trouble with leading and trailing white space in a data.frame.
For example, I look at a specific row
in a data.frame
based on a certain condition:
> myDummy[myDummy$country == c("Austria"),c(1,2,3:7,19)]
[1] codeHelper country dummyLI dummyLMI dummyUMI
[6] dummyHInonOECD dummyHIOECD dummyOECD
<0 rows> (or 0-length row.names)
I was wondering why I didn't get the expected output since the country Austria obviously existed in my data.frame
. After looking through my code history and trying to figure out what went wrong I tried:
> myDummy[myDummy$country == c("Austria "),c(1,2,3:7,19)]
codeHelper country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
18 AUT Austria 0 0 0 0 1
dummyOECD
18 1
All I have changed in the command is an additional white space after Austria.
Further annoying problems obviously arise. For example, when I like to merge two frames based on the country column. One data.frame
uses "Austria "
while the other frame has "Austria"
. The matching doesn't work.
So far I used to write a simple Perl script which removes the whites pace, but it would be nice if I can somehow do it inside R.
Probably the best way is to handle the trailing white spaces when you read your data file. If you use read.csv
or read.table
you can set the parameterstrip.white=TRUE
.
If you want to clean strings afterwards you could use one of these functions:
# Returns string without leading white space
trim.leading <- function (x) sub("^\\s+", "", x)
# Returns string without trailing white space
trim.trailing <- function (x) sub("\\s+$", "", x)
# Returns string without leading or trailing white space
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
To use one of these functions on myDummy$country
:
myDummy$country <- trim(myDummy$country)
To 'show' the white space you could use:
paste(myDummy$country)
which will show you the strings surrounded by quotation marks (") making white spaces easier to spot.