
How do I take a sample of features from a shapefile in R?


I have a shapefile with thousands of features (called x). I only want, say, 10% of those features. How can I take a random sample of those features? I've tried:

    final_x <- sample(x,nrow(x)/10) 

but this returns an error.

I've also tried:

    x_samp <- sample(x$OBJECTID,nrow(x)/10)
    x_samp <- as.data.frame(x_samp)
    final_x <- x[x$OBJECTID==x_samp$x_samp]
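A likely reason this second attempt does not work: == compares the two vectors position by position (recycling the shorter one) instead of testing membership, and for a data frame, x[cond] without a trailing comma selects columns, not rows. A minimal base-R sketch on a hypothetical toy table with an OBJECTID column (the data are made up for illustration); the %in% subset is also a one-line replacement for a loop over sampled IDs:

```r
set.seed(1)
# Hypothetical toy table standing in for the shapefile's attribute data
df <- data.frame(OBJECTID = 1:100, area = runif(100))
ids <- sample(df$OBJECTID, nrow(df) / 10)

# `==` recycles the 10 ids across the 100 rows and compares position by
# position, so it keeps only rows whose id happens to line up (few, if any)
by_eq <- df[df$OBJECTID == ids, ]

# `%in%` tests membership: keep every row whose OBJECTID is among the ids.
# The trailing comma makes the logical vector select rows, not columns.
by_in <- df[df$OBJECTID %in% ids, ]

nrow(by_eq)
nrow(by_in)
```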

I've managed to get what I want by:

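    x_samp <- sample(x$OBJECTID, nrow(x)/10)
    final_x <- NULL  # start empty and append one sampled feature per iteration
    for (i in x_samp) {
      x1 <- x[x$OBJECTID == i, ]
      final_x <- if (is.null(final_x)) x1 else rbind(final_x, x1)
    }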

The above just feels a little clunky and not very elegant. Is there a better solution?

Many thanks


Solution

  • Read your shapefile as an sf object; you can then use ordinary bracket subsetting to pick rows (features), just as with a data frame.

    library(sf)
    
    set.seed(432)
    # reading example nc data as sf
    my_sf <- read_sf(system.file('shape/nc.shp', package = 'sf'))
    nrow(my_sf)
    #> [1] 100
    
    my_sf_sampled <- my_sf[sample(nrow(my_sf), size = nrow(my_sf)/10), ]
    nrow(my_sf_sampled)
    #> [1] 10
    

    Created on 2021-02-26 by the reprex package (v1.0.0)
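The first attempt in the question, sample(x, nrow(x)/10), most likely fails because sample() treats a data frame (and objects that subset like one) as a collection of columns: length() counts columns, so sample() tries to draw columns rather than features. A minimal base-R sketch of the difference, using a hypothetical toy data frame in place of the shapefile's attribute table:

```r
set.seed(432)
# Hypothetical toy table standing in for a shapefile's attribute data
df <- data.frame(OBJECTID = 1:100, area = runif(100))

# length() of a data frame is its number of columns (2 here), so sample(df, k)
# tries to draw k columns; asking for 10 of 2 columns raises an error
length(df)
msg <- tryCatch(sample(df, nrow(df) / 10), error = function(e) conditionMessage(e))
msg

# Sampling row numbers and placing them before the comma selects rows instead
df_sampled <- df[sample(nrow(df), size = nrow(df) / 10), ]
nrow(df_sampled)
```

An sf object inherits from data.frame, so the same row-index pattern is what the answer applies to my_sf.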

Or, use dplyr::sample_frac():

    library(dplyr)
    library(sf)
    
    my_sf <- read_sf(system.file('shape/nc.shp', package = 'sf'))
    sample_frac(my_sf, .1)
    
    #> Simple feature collection with 10 features and 14 fields
    #> geometry type:  MULTIPOLYGON
    #> dimension:      XY
    #> bbox:           xmin: -81.9856 ymin: 34.82742 xmax: -76.02605 ymax: 36.54606
    #> geographic CRS: NAD27
    #> # A tibble: 10 x 15
    #>     AREA PERIMETER CNTY_ CNTY_ID NAME  FIPS  FIPSNO CRESS_ID BIR74 SID74 NWBIR74
    #>  * <dbl>     <dbl> <dbl>   <dbl> <chr> <chr>  <dbl>    <int> <dbl> <dbl>   <dbl>
    #>  1 0.104      1.55  2065    2065 Leno… 37107  37107       54  3589    10    1826
    #>  2 0.154      1.68  2030    2030 Harn… 37085  37085       43  3776     6    1051
    #>  3 0.172      1.84  2090    2090 Cumb… 37051  37051       26 20366    38    7043
    #>  4 0.098      1.26  2097    2097 Hoke  37093  37093       47  1494     7     987
    #>  5 0.078      1.38  2034    2034 Linc… 37109  37109       55  2216     8     302
    #>  6 0.109      1.32  1841    1841 Pers… 37145  37145       73  1556     4     613
    #>  7 0.18       2.14  1973    1973 Chat… 37037  37037       19  1646     2     591
    #>  8 0.044      1.16  1887    1887 Chow… 37041  37041       21   751     1     368
    #>  9 0.134      1.76  1958    1958 Burke 37023  37023       12  3573     5     326
    #> 10 0.099      1.41  1963    1963 Tyrr… 37177  37177       89   248     0     116
    #> # … with 4 more variables: BIR79 <dbl>, SID79 <dbl>, NWBIR79 <dbl>,
    #> #   geometry <MULTIPOLYGON [°]>
    

    Created on 2021-02-26 by the reprex package (v1.0.0)
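In dplyr 1.0.0 and later, sample_frac() is superseded by slice_sample(), which takes the fraction through its prop argument. A sketch with the same example data; it assumes dplyr >= 1.0.0 and that sf's dplyr support keeps the result an sf object (the last line checks this on your installation):

```r
library(dplyr)
library(sf)

# Example data shipped with sf, as in the answer above
my_sf <- read_sf(system.file("shape/nc.shp", package = "sf"))

set.seed(432)
# Keep a random 10% of the rows (features); prop = 0.1 of 100 rows gives 10
my_sf_sampled <- slice_sample(my_sf, prop = 0.1)

nrow(my_sf_sampled)
# Should be TRUE if the geometry column and sf class are carried along
inherits(my_sf_sampled, "sf")
```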