I'm following the instructions in "Dummy variables from a string variable" to convert a column of strings (words separated by spaces) into dummy variables (0/1, indicating whether a word is not used or used in the string in that row) using concat.split.expanded, but I get many instances of the warning below:
In lapply(listOfValues, as.integer) : NAs introduced by coercion
preceded by one of
Error in seq.default(min(vec), max(vec)) : 'from' cannot be NA, NaN or infinite
I'm pretty sure there aren't any NAs in the column to be converted, let alone that many. Not sure how to go about fixing this. Thanks!
The command I've been running that produces the problem:
concat.split.expanded(dataset, "stringvarname", sep = " ", mode = "binary", drop = FALSE)
It produces the problem with or without fill=
You need to specify that you are splitting concatenated strings ("var2" in the sample data below) and not numeric values concatenated as strings ("var3" in the sample data below).
Here's an example that reproduces your error and shows the working solution:
df = data.frame(var1 = 1:2, var2 = c("a b c", "a c d"), var3 = c("1 2 3", "1 2 5"))
library(splitstackshape)
cSplit_e(df, "var3", sep = " ")
#   var1  var2  var3 var3_1 var3_2 var3_3 var3_4 var3_5
# 1    1 a b c 1 2 3      1      1      1     NA     NA
# 2    2 a c d 1 2 5      1      1     NA     NA      1
## Will give you an error
cSplit_e(df, "var2", sep = " ")
# Error in seq.default(min(vec), max(vec)) :
#   'from' cannot be NA, NaN or infinite
# In addition: Warning messages:
# 1: In lapply(listOfValues, as.integer) : NAs introduced by coercion
# 2: In lapply(listOfValues, as.integer) : NAs introduced by coercion
cSplit_e(df, "var2", sep = " ", type = "character")
#   var1  var2  var3 var2_a var2_b var2_c var2_d
# 1    1 a b c 1 2 3      1      1      1     NA
# 2    2 a c d 1 2 5      1     NA      1      1
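Applied to the call from the question, the same fix might look like the sketch below. It assumes concat.split.expanded accepts the same type argument as cSplit_e, and that fill = 0 turns the NA cells into 0 in binary mode; the small dataset is made up for illustration, and the call is guarded so the snippet still runs when the package is not installed.

```r
# Hypothetical stand-in for the asker's data; only the column name comes from the question.
dataset <- data.frame(id = 1:2,
                      stringvarname = c("a b c", "a c d"),
                      stringsAsFactors = FALSE)

result <- NULL
if (requireNamespace("splitstackshape", quietly = TRUE)) {
  # type = "character" tells the function the pieces are words, not integers.
  # fill = 0 is an assumption about how to get 0 instead of NA for absent words.
  result <- splitstackshape::cSplit_e(dataset, "stringvarname", sep = " ",
                                      type = "character", mode = "binary",
                                      drop = FALSE, fill = 0)
  print(result)
}
```

If that assumption holds for your installed version, replacing cSplit_e with concat.split.expanded in the call above should behave the same way.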
Why? cSplit_e uses seq, and seq is for numeric input.
> seq("a", "c")
Error in seq.default("a", "c") : 'from' cannot be NA, NaN or infinite
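To make the failure chain concrete, here is a minimal base R sketch. The first part mirrors what the warning and error suggest is happening (words coerced to integer become NA, and seq cannot start from NA). The second part is a hypothetical helper, words_to_dummies, which is not part of splitstackshape and only illustrates the idea behind the character mode: build one column per distinct word instead of a numeric range.

```r
# Part 1: coercing words to integers yields NA, which seq() rejects.
vec <- suppressWarnings(as.integer(c("a", "b", "c")))
all_na <- all(is.na(vec))
seq_result <- tryCatch(seq(min(vec), max(vec)), error = function(e) e)
seq_failed <- inherits(seq_result, "error")

# Part 2: hypothetical helper (not from splitstackshape) that builds
# 0/1 dummies from the distinct words. Assumes at least one word overall.
words_to_dummies <- function(x, sep = " ") {
  tokens <- strsplit(as.character(x), sep, fixed = TRUE)
  lv <- sort(unique(unlist(tokens)))
  # For each row, flag which of the distinct words appear in it.
  flags <- lapply(tokens, function(tk) as.integer(lv %in% tk))
  matrix(unlist(flags), nrow = length(tokens), ncol = length(lv),
         byrow = TRUE, dimnames = list(NULL, lv))
}

dummies <- words_to_dummies(c("a b c", "a c d"))
print(dummies)
```

Unlike the default output shown earlier, this sketch fills absent words with 0 rather than NA.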