
Define coordinates using sp: error message about missing values, but no NA data. Why?


I have data with coordinates, without missing values. I would like to define them as coordinates using sp, but for a subset of the data. When I use

subset_of_data <- data[data$variable == x, ]
coordinates_from_data = subset(subset_of_data, select=c("S_X", "S_Y"))
coordinates(coordinates_from_data) <- c("S_X", "S_Y")

I get:

Error in `coordinates<-`(`*tmp*`, value = c("S_X", "S_Y")) : 
coordinates are not allowed to contain missing values

But when I use subset, there is no problem:

subset_of_data <- subset(data, data$variable == x)
coordinates_from_data = subset(subset_of_data, select=c("S_X", "S_Y"))
coordinates(coordinates_from_data) <- c("S_X", "S_Y")

I don't get the error.

Any idea why this happens?


Solution

  • It has nothing to do with sp; it is just how subsetting works in R. Let's take an example:

    df <- data.frame(city = c("Paris", "Berlin", NA),
                     x_coordinate = c(48.8589507, 52.5069312, 50.8550625), 
                     y_coordinate = c(2.27702, 13.1445501, 4.3053501))
    df
        city x_coordinate y_coordinate
    1  Paris     48.85895      2.27702
    2 Berlin     52.50693     13.14455
    3   <NA>     50.85506      4.30535
    

    Converting this data frame to spatial points works, because the coordinate columns contain no NA (the NA in city does not matter):

    coordinates(df) <- c("x_coordinate", "y_coordinate")
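
    A quick way to confirm this before converting is to test only the coordinate columns for NA. This is a minimal base-R sketch (sp is not needed to run it); it rebuilds the example data frame so it can run standalone, and coord_cols and has_missing_coords are illustrative names, not part of sp:

    ```r
    # Rebuild the example as a plain data.frame (before any sp conversion)
    df <- data.frame(city = c("Paris", "Berlin", NA),
                     x_coordinate = c(48.8589507, 52.5069312, 50.8550625),
                     y_coordinate = c(2.27702, 13.1445501, 4.3053501))

    coord_cols <- c("x_coordinate", "y_coordinate")  # illustrative name

    # TRUE if any value in the coordinate columns is missing
    has_missing_coords <- any(is.na(df[, coord_cols]))
    has_missing_coords

    # The city column does contain an NA, but it is not a coordinate column
    any(is.na(df$city))
    ```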
    

    Now suppose we want to convert only a subset of df to coordinates, e.g., only the Paris row. If we do:

    sub_df = df[df$city == "Paris", ]
    

    We get:

        city x_coordinate y_coordinate
    1  Paris     48.85895      2.27702
    NA  <NA>           NA           NA
    

    This time the conversion fails. The city column contains an NA, so df$city == "Paris" evaluates to NA for that row, and [ returns a row filled entirely with NA for each NA in a logical index. The coordinate columns of sub_df therefore contain NA:

    coordinates(sub_df) <- c("x_coordinate", "y_coordinate")
    Error in `coordinates<-`(`*tmp*`, value = c("x_coordinate", "y_coordinate")) : 
      coordinates are not allowed to contain missing values
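
    The mechanism can be seen without sp at all. The sketch below uses only base R on a fresh copy of the example data frame; keep is an illustrative name:

    ```r
    # Fresh copy of the example data frame
    df <- data.frame(city = c("Paris", "Berlin", NA),
                     x_coordinate = c(48.8589507, 52.5069312, 50.8550625),
                     y_coordinate = c(2.27702, 13.1445501, 4.3053501))

    # Comparing NA with anything gives NA, not FALSE
    keep <- df$city == "Paris"
    keep  # TRUE FALSE NA

    # An NA in a logical row index produces a row filled with NA
    sub_df <- df[keep, ]
    nrow(sub_df)                 # 2: the Paris row plus one all-NA row
    is.na(sub_df$x_coordinate)   # FALSE TRUE
    ```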
    

    subset() behaves differently: rows where the condition evaluates to NA are treated as FALSE and dropped:

    sub_df_2 = subset(df, df$city == "Paris")
    sub_df_2
              coordinates  city
    1 (48.85895, 2.27702) Paris
    

    Another option is to exclude the NA rows explicitly when using [:

    sub_df_3 = df[df$city == "Paris" & !is.na(df$city), ]
    sub_df_3
              coordinates  city
    1 (48.85895, 2.27702) Paris
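
    Two other base-R idioms should give the same single row on a plain data frame, shown here as a sketch rather than as the recommended approach: which() returns only the positions where the condition is TRUE (so NA is dropped), and %in% never returns NA. The names sub_which and sub_in are illustrative:

    ```r
    # Fresh copy of the example data frame
    df <- data.frame(city = c("Paris", "Berlin", NA),
                     x_coordinate = c(48.8589507, 52.5069312, 50.8550625),
                     y_coordinate = c(2.27702, 13.1445501, 4.3053501))

    # which() keeps only the indices where the comparison is TRUE
    sub_which <- df[which(df$city == "Paris"), ]

    # %in% returns FALSE (not NA) for the NA element
    sub_in <- df[df$city %in% "Paris", ]

    nrow(sub_which)  # 1
    nrow(sub_in)     # 1
    ```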
    

    For Python users

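    pandas behaves differently with the [ operator: in a boolean mask, comparing NaN with 'Paris' gives False, so that row is simply dropped and no all-NaN row appears: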

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'city': ['Paris', 'Berlin', np.nan],  # np.NaN was removed in NumPy 2.0
                       'x_coordinate': [48.8589507, 52.5069312, 50.8550625],
                       'y_coordinate': [2.27702, 13.1445501, 4.3053501]})
    
    print(df[df["city"] == 'Paris'])
    
        city  x_coordinate  y_coordinate
    0  Paris     48.858951       2.27702