There seems to be something wrong with the data.frame I get from the cSplit function.
I can't extract only the columns without NAs using the code below:
data_places <- data_table[ , colSums(is.na(data_table)) == 0 ]
The output is a named logical vector (str() reports it as Named logi) rather than a data.frame containing only the columns that have no NAs.
The issue seems to come from the data.frame output of the cSplit function of the splitstackshape package. The same issue also appears with the data.table package.
When I create a new data.frame from the columns of the cSplit output, the code above works fine.
Any ideas what's wrong with cSplit's data.frame output?
Here's a sample of my code:
library(splitstackshape)
data <- data.frame(V1=c("Place1-Place1-Place1-Place1-Place3-Place5",
"Place1-Place4-Place2-Place3-Place3-Place5-Place5",
"Place6-Place6",
"Place1-Place2-Place3-Place4"))
data_table <- cSplit(data, "V1", sep="-", direction = "wide")
data_places <- data_table[ , colSums(is.na(data_table)) == 0 ]
data_places
str(data_places)
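We need to use with=FALSE because the output of cSplit is a data.table object, not a plain data.frame. In a data.table, j is evaluated as an expression by default, so colSums(is.na(data_table)) == 0 simply returns that logical vector instead of selecting columns.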
data_table[ , colSums(is.na(data_table)) == 0 , with=FALSE]
# V1_1 V1_2
#1: Place1 Place1
#2: Place1 Place4
#3: Place6 Place6
#4: Place1 Place2
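To make the difference concrete, here is a minimal side-by-side sketch. The toy data (df, dt, columns V1_1 to V1_3) is hypothetical and only mimics the shape of cSplit's wide output, where shorter rows are padded with NA; it builds the data.table directly with the data.table package instead of calling cSplit.

```r
library(data.table)

# Hypothetical toy data shaped like cSplit's wide output: NA pads the shorter row
df <- data.frame(V1_1 = c("Place1", "Place6"),
                 V1_2 = c("Place4", "Place6"),
                 V1_3 = c("Place2", NA),
                 stringsAsFactors = FALSE)
dt <- as.data.table(df)

# Named logical flags: TRUE for columns with no NAs
keep <- colSums(is.na(df)) == 0

# data.frame: a logical j selects columns
df_sub <- df[, keep]

# data.table (default with = TRUE): j is evaluated as an expression,
# so the result is just the named logical vector
dt_vec <- dt[, colSums(is.na(dt)) == 0]

# data.table with with = FALSE: j is treated as a column selector again
dt_sub <- dt[, colSums(is.na(dt)) == 0, with = FALSE]

str(dt_vec)
dt_sub
```

The same expression therefore behaves differently depending on whether the object is a plain data.frame or a data.table, which is why the code worked after copying the columns into a new data.frame.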
The relevant part of ?data.table is the description of the with argument:
with - By default with=TRUE and j is evaluated within the frame of x; column names can be used as variables. When with=FALSE j is a character vector of column names or a numeric vector of column positions to select, and the value returned is always a data.table. with=FALSE is often useful in data.table to select columns dynamically.
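Since with=FALSE is mainly about selecting columns dynamically, a common pattern is to compute the selector outside of j first. The sketch below uses a hypothetical toy data.table and shows three equivalent selections; the ..keep_names form assumes a reasonably recent data.table version that supports the `..` prefix for looking up a variable in the calling scope.

```r
library(data.table)

# Hypothetical toy data.table with one NA-padded column
dt <- data.table(V1_1 = c("Place1", "Place6"),
                 V1_2 = c("Place4", "Place6"),
                 V1_3 = c("Place2", NA))

# Compute the selector outside of j
keep <- colSums(is.na(dt)) == 0      # logical flag per column
keep_names <- names(dt)[keep]        # the same selection as column names

sel_logical <- dt[, keep, with = FALSE]        # logical vector of column flags
sel_names   <- dt[, keep_names, with = FALSE]  # character vector of column names
sel_dotdot  <- dt[, ..keep_names]              # `..` prefix (newer data.table)
```

All three return a data.table with only the V1_1 and V1_2 columns.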
Another option is to use Filter, which keeps only the columns for which the predicate returns TRUE:
Filter(function(x) all(!is.na(x)), data_table)
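Filter works here because a data.table is also a list of columns, so the predicate is applied column by column. A small sketch on hypothetical toy data, also showing the same test written with base R's anyNA(), which the R documentation describes as a possibly faster way of doing any(is.na(x)):

```r
library(data.table)

# Hypothetical toy data.table with one NA-padded column
dt <- data.table(V1_1 = c("Place1", "Place6"),
                 V1_2 = c("Place4", "Place6"),
                 V1_3 = c("Place2", NA))

# Keep columns where every value is non-missing
no_na <- Filter(function(x) all(!is.na(x)), dt)

# Same test expressed with anyNA()
no_na2 <- Filter(function(x) !anyNA(x), dt)

names(no_na)
```

Both calls keep V1_1 and V1_2 and drop V1_3.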