I have built up a somewhat large PostgreSQL database that contains primarily geospatial data. I am currently trying to export some of the data into a GeoJSON format so that I can tile it and use it with some maps (Mapbox).
For the first data set that I am working with, I wrote up a quick script that uses the following ogr2ogr command to export the data into the GeoJSON format.
ogr2ogr -f GeoJSON \
-progress \
nhd_$2.json \
"PG:dbname=$PG_DB host=$PG_HOST port=$PG_PORT user=$PG_USERNAME password=$PG_PASSWORD" \
-sql "select resolution, geom from nhd_hr_$2"
Due to the large size of the exported GeoJSON file, I then used geojsplit to break it up into smaller subfiles; I was then able to use the Mapbox tiling tool to create my tiles and build a layer for my map.
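For reference, my geojsplit call looked roughly like this; the -a (suffix length) and -l (geometries per output file) flags are from memory of its split(1)-style interface, so double-check against geojsplit --help:

# split the exported file into chunks of ~20000 geometries each,
# with a 4-character suffix on each output filename (flags assumed, see above)
geojsplit -a 4 -l 20000 nhd_$2.json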
However, now that I've moved on to the larger databases, I keep running into issues where my connection to the database will time out after downloading only 12-15 GB of the data.
My initial thought was to split up my query into sub-queries, but I'm not entirely sure how I could do that. Are there any ways that I could change my approach to exporting the data? Or is there a way for me to break up this query into more manageable chunks?
Use seq to generate numbers, then change your query to select a range of id values. I've changed $2 to $suffix in this example, and IRL you'll drop the echo once you're happy with the output, to actually run the commands (or you could just copy/paste the output into another terminal). You'll want to change 2000 to something safely above the max ogc_fid (select ogc_fid from whatever order by 1 desc limit 1 to find it), and you might need to change 1000 to some other chunk size depending on how large each row is. Don't forget to also change the 1000 in the WHERE clause if you do, though. Note that each chunk gets its own output file (nhd_${suffix}_$each.json); writing every chunk to the same file would make every ogr2ogr run after the first fail, since the output file would already exist.
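For example, a quick sketch of that max-ogc_fid lookup with psql, reusing the connection variables from the question (nhd_hr_foo stands in for your table name):

# print just the highest ogc_fid; -t suppresses headers and the row-count footer
psql "dbname=$PG_DB host=$PG_HOST port=$PG_PORT user=$PG_USERNAME password=$PG_PASSWORD" \
  -t -c "select ogc_fid from nhd_hr_foo order by 1 desc limit 1"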
$ seq 0 500 1500
0
500
1000
1500
$ for suffix in foo bar; do
> for each in $(seq 0 1000 2000); do
> echo ogr2ogr -f GeoJSON -progress nhd_${suffix}_$each.json "PG:dbname=$PG_DB host=$PG_HOST port=$PG_PORT user=$PG_USERNAME password=$PG_PASSWORD" \
> -sql "select resolution, geom from nhd_hr_$suffix where ogc_fid>=$each and ogc_fid < ($each + 1000)"
> done
> done
ogr2ogr -f GeoJSON -progress nhd_foo_0.json PG:dbname= host= port= user= password= -sql select resolution, geom from nhd_hr_foo where ogc_fid>=0 and ogc_fid < (0 + 1000)
ogr2ogr -f GeoJSON -progress nhd_foo_1000.json PG:dbname= host= port= user= password= -sql select resolution, geom from nhd_hr_foo where ogc_fid>=1000 and ogc_fid < (1000 + 1000)
ogr2ogr -f GeoJSON -progress nhd_foo_2000.json PG:dbname= host= port= user= password= -sql select resolution, geom from nhd_hr_foo where ogc_fid>=2000 and ogc_fid < (2000 + 1000)
ogr2ogr -f GeoJSON -progress nhd_bar_0.json PG:dbname= host= port= user= password= -sql select resolution, geom from nhd_hr_bar where ogc_fid>=0 and ogc_fid < (0 + 1000)
ogr2ogr -f GeoJSON -progress nhd_bar_1000.json PG:dbname= host= port= user= password= -sql select resolution, geom from nhd_hr_bar where ogc_fid>=1000 and ogc_fid < (1000 + 1000)
ogr2ogr -f GeoJSON -progress nhd_bar_2000.json PG:dbname= host= port= user= password= -sql select resolution, geom from nhd_hr_bar where ogc_fid>=2000 and ogc_fid < (2000 + 1000)
Here's the command in one line for easier copy/paste into your shell:
for suffix in foo bar; do for each in $(seq 0 1000 2000); do echo ogr2ogr -f GeoJSON -progress nhd_${suffix}_$each.json "PG:dbname=$PG_DB host=$PG_HOST port=$PG_PORT user=$PG_USERNAME password=$PG_PASSWORD" -sql "select resolution, geom from nhd_hr_$suffix where ogc_fid>=$each and ogc_fid < ($each + 1000)"; done; done
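As an aside, since the chunk filter is a plain attribute predicate, the same export can be written with ogr2ogr's -where and -select options instead of -sql, passing the table as the layer name; a sketch of a single chunk under the same assumptions as above:

# export one 1000-row chunk of nhd_hr_foo; the geometry column is included automatically
ogr2ogr -f GeoJSON -progress nhd_foo_0.json \
  "PG:dbname=$PG_DB host=$PG_HOST port=$PG_PORT user=$PG_USERNAME password=$PG_PASSWORD" \
  nhd_hr_foo \
  -select resolution \
  -where "ogc_fid >= 0 and ogc_fid < 1000"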