I have a data frame with coordinate columns that contain no missing values. I would like to define them as coordinates using sp, but only for a subset of the data. When I use
subset_of_data <- data[data$variable == x, ]
coordinates_from_data = subset(subset_of_data, select=c("S_X", "S_Y"))
coordinates(coordinates_from_data) <- c("S_X", "S_Y")
I get:
Error in `coordinates<-`(`*tmp*`, value = c("S_X", "S_Y")) :
coordinates are not allowed to contain missing values
But when I use subset, there is no problem:
subset_of_data <- subset(data, data$variable == x)
coordinates_from_data = subset(subset_of_data, select=c("S_X", "S_Y"))
coordinates(coordinates_from_data) <- c("S_X", "S_Y")
I don't get the error.
Any idea why this happens?
It has nothing to do with sp; it is just how subsetting works in R. Let's take an example:
df <- data.frame(city = c("Paris", "Berlin", NA),
x_coordinate = c(48.8589507, 52.5069312, 50.8550625),
y_coordinate = c(2.27702, 13.1445501, 4.3053501))
df
city x_coordinate y_coordinate
1 Paris 48.85895 2.27702
2 Berlin 52.50693 13.14455
3 <NA> 50.85506 4.30535
If we turn this data frame into coordinates, it works, since the coordinate columns contain no NA:
coordinates(df) <- c("x_coordinate", "y_coordinate")
Now let's imagine that we want to convert only a subset of df into coordinates, e.g., only Paris. If we do:
sub_df = df[df$city == "Paris", ]
We get:
city x_coordinate y_coordinate
1 Paris 48.85895 2.27702
NA <NA> NA NA
In this case, converting to coordinates no longer works. The subsetting variable contains an NA, so the comparison df$city == "Paris" evaluates to NA for that row, and [ returns a row filled with NA for every NA in a logical index, including in the coordinate columns.
coordinates(sub_df) <- c("x_coordinate", "y_coordinate")
Error in `coordinates<-`(`*tmp*`, value = c("x_coordinate", "y_coordinate" :
coordinates are not allowed to contain missing values
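The behaviour can be reproduced without sp at all. The following is a minimal sketch in base R; the data frame toy and its column x are made-up names used only for illustration:

```r
# Hypothetical toy data mirroring the example above (no sp needed).
toy <- data.frame(city = c("Paris", "Berlin", NA),
                  x = c(1, 2, 3))

# The comparison propagates NA: the third element is NA, not FALSE.
keep <- toy$city == "Paris"
print(keep)

# `[` returns an all-NA row for every NA in a logical index,
# so the result has two rows and the numeric column gains an NA.
by_bracket <- toy[keep, ]
print(nrow(by_bracket))
print(anyNA(by_bracket$x))
```

The NA in the numeric column is what coordinates<- later complains about, even though the original column had no missing values.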
subset works differently: it treats NA in the condition as FALSE, so those rows are dropped:
sub_df_2 = subset(df, df$city == "Paris")
sub_df_2
coordinates city
1 (48.85895, 2.27702) Paris
Another option is to be more specific when using [:
sub_df_3 = df[df$city == "Paris" & !is.na(df$city), ]
sub_df_3
coordinates city
1 (48.85895, 2.27702) Paris
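A third variant, sketched here on a plain data frame, wraps the condition in which(), which returns only the positions where the condition is TRUE and therefore skips NA. The names toy and x are hypothetical and chosen only for this illustration:

```r
# Hypothetical toy data (plain data frame, no sp needed).
toy <- data.frame(city = c("Paris", "Berlin", NA),
                  x = c(1, 2, 3))

# which() keeps only the positions of TRUE values, so NA is dropped.
idx <- which(toy$city == "Paris")
by_which <- toy[idx, ]

# For comparison: the explicit NA check and subset().
by_isna   <- toy[toy$city == "Paris" & !is.na(toy$city), ]
by_subset <- subset(toy, city == "Paris")

print(by_which)
print(all.equal(by_which, by_isna))
print(all.equal(by_which, by_subset))
```

All three approaches should select the same single row. Which one to use is mostly a matter of taste; subset() is convenient interactively, while the explicit forms make the NA handling visible in the code.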
This is quite different from pandas' [ operator, where comparing a missing value yields False, so the row is simply dropped:
import pandas as pd
import numpy as np
df = pd.DataFrame({'city': ['Paris', 'Berlin', np.nan],
'x_coordinate': [48.8589507, 52.5069312, 50.8550625],
'y_coordinate': [2.27702, 13.1445501, 4.3053501]})
print(df[df["city"] == 'Paris'])
city x_coordinate y_coordinate
0 Paris 48.858951 2.27702