Tags: r, dataframe, data.table, splitstackshape, csplit

Can't remove columns from a dataframe, output turns into a logical vector


There seems to be something wrong with the data.frame I get from the cSplit function.

I can't extract the columns that contain no NAs using the code below:

data_places <- data_table[ , colSums(is.na(data_table)) == 0 ]

The output is a named logical vector rather than a data.frame containing only the columns that have no NAs.

The issue seems to come from the data.frame output of the cSplit function of the splitstackshape package. It also appears with objects created by the data.table package.

When I build a new data.frame from the columns of cSplit's output, the code above works fine on it.

Any ideas what's wrong with cSplit's data.frame output?

Here's a sample of my code:

library(splitstackshape)
data <- data.frame(V1=c("Place1-Place1-Place1-Place1-Place3-Place5",
          "Place1-Place4-Place2-Place3-Place3-Place5-Place5",
          "Place6-Place6",
          "Place1-Place2-Place3-Place4"))

data_table <- cSplit(data, "V1", sep="-", direction = "wide")
data_places <- data_table[ , colSums(is.na(data_table)) == 0 ]
data_places
str(data_places)

Solution

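  • The output of cSplit is a data.table object, not a plain data.frame, so we need to use with=FALSE. By default, data.table evaluates j as an expression and returns its value, which here is the named logical vector itself.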

    data_table[ , colSums(is.na(data_table)) == 0 , with=FALSE]
    #      V1_1   V1_2
    #1: Place1 Place1
    #2: Place1 Place4
    #3: Place6 Place6
    #4: Place1 Place2
    

    The ?data.table help page describes the argument as follows:

    with - By default with=TRUE and j is evaluated within the frame of x; column names can be used as variables. When with=FALSE j is a character vector of column names or a numeric vector of column positions to select, and the value returned is always a data.table. with=FALSE is often useful in data.table to select columns dynamically.
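
A minimal sketch of the difference, using a small hand-built data.table in place of the cSplit output. The names dt, flags and keep are illustrative and not from the question, and the `..keep` form assumes a reasonably recent data.table release.

```r
library(data.table)

# Hypothetical stand-in for the wide cSplit() result: V1_3 contains an NA
dt <- data.table(
  V1_1 = c("Place1", "Place1", "Place6", "Place1"),
  V1_2 = c("Place1", "Place4", "Place6", "Place2"),
  V1_3 = c("Place1", "Place2", NA, "Place3")
)

# Default (with = TRUE): j is evaluated as an expression and returned as-is,
# so the result is the named logical vector, not a subset of dt
flags <- dt[, colSums(is.na(dt)) == 0]

# Turn the flags into column names, then select those columns
keep <- names(dt)[colSums(is.na(dt)) == 0]
by_with   <- dt[, keep, with = FALSE]    # j is treated as column names
by_prefix <- dt[, ..keep]                # ".." looks keep up outside dt
by_sd     <- dt[, .SD, .SDcols = keep]   # .SD restricted to the chosen columns
```

All three selections should return a data.table holding only V1_1 and V1_2. Computing the names once in keep also makes the selection easy to inspect before applying it.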


    Another option would be to use Filter, which keeps only the columns for which the function returns TRUE:

    Filter(function(x) all(!is.na(x)), data_table)
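
The question mentions that the original subsetting works once the result is rebuilt as a data.frame. A sketch of that route, assuming a small hand-built table in place of the cSplit output (dt, df and complete_cols are illustrative names):

```r
library(data.table)

# Hypothetical stand-in for the wide cSplit() result
dt <- data.table(
  V1_1 = c("Place1", "Place6"),
  V1_2 = c("Place4", "Place6"),
  V1_3 = c("Place2", NA)
)

# The object carries both classes; "data.table" comes first,
# so data.table's `[` method handles dt[, ...]
class(dt)

# Converting to a plain data.frame restores the base-R meaning of df[, logical]
df <- as.data.frame(dt)
complete_cols <- df[, colSums(is.na(df)) == 0, drop = FALSE]
```

Passing drop = FALSE keeps the result a data.frame even when only one column has no NAs.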