
Subsetting column of data frame by rows where the column is an empty list


I have an sf object (a data frame) in R with a list column whose elements are character vectors. For some rows that element is empty. I want to subset to only those rows where the column in question is empty.

Here's a screenshot of the data structure:

[Screenshot: the sf object in question]

I have tried to subset it multiple ways but all have failed, returning a data.frame with 0 rows:

a <- character(0)
subset(dt, identical(a, neighborhood))
Simple feature collection with 0 features and 2 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS84(DD)
[1] POP20        geometry     neighborhood
<0 rows> (or 0-length row.names)

subset(dt, is_empty(neighborhood))
Simple feature collection with 0 features and 2 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS84(DD)
[1] POP20        geometry     neighborhood
<0 rows> (or 0-length row.names)

subset(dt, length(neighborhood[[1]])==0)
Simple feature collection with 0 features and 2 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS84(DD)
[1] POP20        geometry     neighborhood
<0 rows> (or 0-length row.names)

subset(dt, is.na(neighborhood))
Simple feature collection with 0 features and 2 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS84(DD)
[1] POP20        geometry     neighborhood
<0 rows> (or 0-length row.names)

I'm at a loss for subsetting this.


Solution

  • First, a small reproducible example:

    test <- data.frame(
        "POP20" = 1:4,
        "neighb" = I(list("O'Hare", c("Garfield", "Cleaning"), character(0),  "O'hare"))
    )
    

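    The thing to remember is that length() returns the length of the object as a whole, so for a list column it gives a single number (the number of rows) rather than one value per row. subset() needs a logical vector with one value per row, which is why a condition that evaluates to a single FALSE returns 0 rows. To get the length of each list element, use lengths() (note the trailing s):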

    subset(test, lengths(test$neighb) == 0)
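
To make the difference concrete, here is a minimal base R sketch using the same toy data frame as above (not the asker's sf object, which is only shown in a screenshot). The names `empty_rows` and `empty_rows_alt` are made up for illustration.

```r
# Toy data from the answer: a list column with one empty element (row 3)
test <- data.frame(
  POP20 = 1:4,
  neighb = I(list("O'Hare", c("Garfield", "Cleaning"), character(0), "O'hare"))
)

# length() treats the list column as one object: a single number
length(test$neighb)    # 4

# lengths() returns one length per list element, i.e. one value per row
lengths(test$neighb)   # 1 2 0 1

# identical() also returns a single value, so it cannot select rows;
# here it is FALSE because the whole column is not character(0)
identical(character(0), test$neighb)

# One logical value per row is what row subsetting needs
empty_rows <- test[lengths(test$neighb) == 0, ]

# Equivalent element-by-element check, spelled out with vapply()
empty_rows_alt <- test[vapply(test$neighb, function(x) length(x) == 0, logical(1)), ]
```

Assuming the sf object from the question is called dt and its list column neighborhood, the same condition should carry over as subset(dt, lengths(neighborhood) == 0), or with dplyr as filter(dt, lengths(neighborhood) == 0).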