I am trying to split the range in the first column into two separate columns, one for the lower bound and one for the upper bound.
I have tried the following code:
testDF$IncomeLowerRange <- strsplit(gsub("[^-]+-", "", testDF$IncomeRange), ";")[1]
testDF$IncomeUpperRange <- strsplit(gsub("[^-]+-", "", testDF$IncomeRange), ";")[2]
I do not think there is a one-liner for this: in some cases you need to extract part of the value, and in others you need to replace it fully or partially.
I suggest a simple step-by-step approach:
> test <- c("$25,000-49,000","Not displayed", "$100,000+")
> df <- data.frame(test)
> df$col1 <- sub("^\\$?([0-9,]+).*", "\\1", df$test)
> df$col1[df$col1=="Not displayed"] <- "NA"
> df$col2 <- sub("^[^-]+-(.*)", "\\1", df$test)
> df$col2[df$col2=="Not displayed"] <- "NA"
> df$col2[df$col2=="$100,000+"] <- "250,000"
>
> df
            test    col1    col2
1 $25,000-49,000  25,000  49,000
2  Not displayed      NA      NA
3      $100,000+ 100,000 250,000
where col1 will be your IncomeLowerRange and col2 will be your IncomeUpperRange.
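If numeric bounds and real missing values are more useful than the character columns above, the same logic could be wrapped in a helper. This is only a sketch: the function name split_income_range and its open_upper argument are hypothetical, and the default cap of 250000 simply mirrors the placeholder used for "$100,000+" in the transcript.

```r
# Hypothetical helper: split income range strings into numeric bounds.
# `open_upper` is an assumed cap for open-ended ranges such as "$100,000+".
split_income_range <- function(x, open_upper = 250000) {
  x <- as.character(x)
  has_digits <- grepl("[0-9]", x)

  # Strip "$", "," and any other non-digit before converting.
  to_num <- function(s) as.numeric(gsub("[^0-9]", "", s))

  # Start with NA so values like "Not displayed" stay missing.
  lower <- rep(NA_real_, length(x))
  upper <- rep(NA_real_, length(x))

  # Lower bound: first run of digits and commas after an optional "$".
  lower[has_digits] <- to_num(sub("^\\$?([0-9,]+).*", "\\1", x[has_digits]))

  # Upper bound: everything after the first "-", when there is one.
  is_range <- has_digits & grepl("-", x, fixed = TRUE)
  upper[is_range] <- to_num(sub("^[^-]+-(.*)", "\\1", x[is_range]))

  # Open-ended ranges end with "+" and get the assumed cap.
  is_open <- has_digits & grepl("\\+$", x)
  upper[is_open] <- open_upper

  data.frame(IncomeLowerRange = lower, IncomeUpperRange = upper)
}

testDF <- data.frame(IncomeRange = c("$25,000-49,000", "Not displayed", "$100,000+"))
testDF <- cbind(testDF, split_income_range(testDF$IncomeRange))
```

Using NA_real_ instead of the string "NA" means functions such as is.na() and mean(..., na.rm = TRUE) treat those rows as missing.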
The ^\\$?([0-9,]+).* regex is used to extract the first number from a range, and it also puts 100,000 from $100,000+ into the first column: ^ matches the start of the string, \$? matches zero or one $ symbol, ([0-9,]+) captures digits and commas into Group 1, which is kept, and .* matches everything after the number, which is removed.
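To see what each pattern does on its own, here is a small sketch that applies both sub() calls to the sample vector without the follow-up replacements. The variable names x, lower and upper are only for illustration.

```r
x <- c("$25,000-49,000", "Not displayed", "$100,000+")

# Pattern 1: keep only Group 1, the first run of digits and commas.
# "Not displayed" does not start with a digit, so the pattern does not
# match and sub() returns that element unchanged.
lower <- sub("^\\$?([0-9,]+).*", "\\1", x)

# Pattern 2: drop everything up to and including the first "-".
# Elements without a "-" do not match and come back unchanged.
upper <- sub("^[^-]+-(.*)", "\\1", x)

print(lower)  # "25,000" "Not displayed" "100,000"
print(upper)  # "49,000" "Not displayed" "$100,000+"
```

Because sub() leaves non-matching strings untouched, the "Not displayed" and "$100,000+" values still have to be handled with the explicit assignments shown in the transcript.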