r geonames

Downloading Geonames


I want to download the lake (feature code LK) GeoNames records for Canada. At most 1000 rows can be downloaded per request. When I run the code below, a few records are missed and some overlap between calls. Is there a way to get the total number of lake records available and download each record exactly once, without any overlap?

library(geonames)

GN_lake <- GNsearch(featureCode = 'LK', country = 'CA', startRow = 1, maxRows = 1000)

GN_lake <- GNsearch(featureCode = 'LK', country = 'CA', startRow = 1000, maxRows = 1000)
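
The symptoms described here are consistent with how the GeoNames search service documents paging: startRow is a zero-based offset (default 0). A first call with startRow=1 therefore skips the first record, and a second call with startRow=1000 fetches the last row of the first page again. A minimal sketch of non-overlapping offsets follows; `page_offsets()` is a hypothetical helper, not part of the geonames package, and the total of 2500 is made up for illustration.

```r
# Hypothetical helper (not part of the geonames package): zero-based start
# offsets for fetching `total` records in pages of `page_size` rows.
page_offsets <- function(total, page_size = 1000) {
  if (total <= 0) return(numeric(0))
  seq(from = 0, to = total - 1, by = page_size)
}

offsets <- page_offsets(total = 2500, page_size = 1000)
print(offsets)  # 0 1000 2000

# Each page covers offsets start .. start + page_size - 1, so no row is
# skipped or fetched twice. With the geonames package this might look like
# the following (not run here; needs a GeoNames username and network access):
# pages <- lapply(offsets, function(s)
#   geonames::GNsearch(featureCode = "LK", country = "CA",
#                      startRow = s, maxRows = 1000))
# GN_lake <- dplyr::bind_rows(pages)
```

The GeoNames documentation also lists a maximum allowed startRow for the free service, so paging alone may not reach every record for a large country.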


Solution

  • Why not just download the full CA dump and work with it locally? That sidesteps the API's paging limits entirely.

    library(httr)
    library(tidyverse)
    
    # Get CA database
    httr::GET(
      url = "http://download.geonames.org/export/dump/CA.zip",
      httr::write_disk("CA.zip"),
      httr::progress()
    ) -> res
    
    # stop early if the download failed
    httr::stop_for_status(res)
    
    # unzip it
    unzip("CA.zip")
    
    read.csv( # readr::read_tsv didn't like this file, at least when I tried it
      file = "CA.txt",
      header = FALSE,
      sep = "\t",
      col.names = c(
        "geonameid", "name", "asciiname", "alternatenames", "latitude",
        "longitude", "feature_class", "feature_code", "country", "cc2",
        "admin1_code1", "admin2_code", "admin3_code", "admin4_code",
        "population", "elevation", "dem", "timezone", "modification_date"
      ),
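      quote = "", # keep any quote characters in names as literal text
      stringsAsFactors = FALSE
    ) %>% as_tibble() -> ca_geo # tbl_df() is deprecated in current dplyr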
    
    filter(ca_geo, feature_code == "LK")
    ## # A tibble: 104,663 x 19
    ##    geonameid name          asciiname     alternatenames latitude longitude
    ##        <int> <chr>         <chr>         <chr>             <dbl>     <dbl>
    ##  1   5881640 101 Mile Lake 101 Mile Lake ""                 51.7    -121. 
    ##  2   5881642 103 Mile Lake 103 Mile Lake ""                 51.7    -121. 
    ##  3   5881644 105 Mile Lake 105 Mile Lake ""                 51.7    -121. 
    ##  4   5881647 108 Mile Lake 108 Mile Lake ""                 51.7    -121. 
    ##  5   5881660 130 Mile Lake 130 Mile Lake ""                 51.9    -122. 
    ##  6   5881666 16 1/2 Mile … 16 1/2 Mile … ""                 52.7    -118. 
    ##  7   5881668 180 Lake      180 Lake      ""                 57.4    -130. 
    ##  8   5881673 {1}útsaw Lake {1}utsaw Lake ""                 62.7    -137. 
    ##  9   5881680 24 Mile Lake  24 Mile Lake  ""                 46.5     -82.0
    ## 10   5881683 28 Mile Lake  28 Mile Lake  ""                 54.8    -124. 
    ## # ... with 104,653 more rows, and 13 more variables: feature_class <chr>,
    ## #   feature_code <chr>, country <chr>, cc2 <chr>, admin1_code1 <int>,
    ## #   admin2_code <chr>, admin3_code <int>, admin4_code <chr>,
    ## #   population <int>, elevation <int>, dem <int>, timezone <chr>,
    ## #   modification_date <chr>
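
With the dump loaded, the two things the question asks for (how many lake records exist, and each record exactly once) reduce to ordinary data-frame operations. Below is a minimal base-R sketch that uses a small made-up stand-in for `ca_geo` so it runs without the download. The toy rows and the names `ca_geo_demo` and `lakes` are illustrative only, and it assumes `geonameid` uniquely identifies a record.

```r
# Toy stand-in for ca_geo (values are made up for illustration).
ca_geo_demo <- data.frame(
  geonameid    = c(1L, 2L, 2L, 3L, 4L),
  name         = c("Alpha Lake", "Beta Lake", "Beta Lake",
                   "Gamma Creek", "Delta Lake"),
  feature_code = c("LK", "LK", "LK", "STM", "LK"),
  stringsAsFactors = FALSE
)

# Keep only lakes, then drop any repeated geonameid.
lakes <- ca_geo_demo[ca_geo_demo$feature_code == "LK", ]
lakes <- lakes[!duplicated(lakes$geonameid), ]

n_lakes <- nrow(lakes)                    # total number of lake records
n_dupes <- anyDuplicated(lakes$geonameid) # 0 means no record appears twice

print(n_lakes)  # 3
print(n_dupes)  # 0
```

On the real data the same steps would apply to `ca_geo` directly; in dplyr terms this is `filter()` followed by `distinct(geonameid, .keep_all = TRUE)`.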