Search code examples
pythonvectorrastergdalqgis

Quickest way to mask raster using polygons from gml file


I am building a headless script to preprocess Sentinel images. For info the session is setup as follows:

import sys
import os
import qgis
from qgis.core import *
#from PyQt4.QtGui import *

QgsApplication.setPrefixPath("/usr", False)
app = QgsApplication([], False)
app.initQgis()
sys.path.append('/usr/share/qgis/python/plugins')
from processing.core.Processing import Processing
Processing.initialize()
import processing as p

One of the steps consists of getting the supplied cloud mask file (.gml), translating it to a raster and use it to mask the Sentinel image. As follows:

translate gml to shapefile

ogr2ogr.main(["","-f", "ESRI Shapefile", "-s_srs", "EPSG:32630", cpv, clMpath])

rasterize cloud mask to obtain raster with only 0 (no cloud) and 1(cloud)

# parameters to rasterize
fName=[field.name() for field in QgsVectorLayer(cpv, "cl", "ogr").pendingFields()][0]
p.runalg("gdalogr:rasterize", cpv, fName, 1, 10,10, extImg, False, 5,0,4,75, 6,1,False,0,"-burn 1 -a_srs 'EPSG:32630'",cloudRast)

mask out cloudy pixels

rMaskPath=wod+"clMaskNDVI.tif"
    p.runalg("gdalogr:rastercalculator", img, "1", cloudRast, "1", None, "1", None, "1", None, "1", None, "1", "A*((-1*B)+1)", "", 5, "", rMaskPath)

This operation is rather quick done interactively (~1 minute) but when I launch it through a script it takes around 40 minutes (mostly in rasterizing the polygon layer). I have also tried other ways: using directly the GML file, or grass7 modules, but this does not speed up the operation.

WOuld you know why there is such difference in time? Can you suggest a way to speed up rasterizing the shapefile?

Thanks


Solution

  • So the solution was actually to let Gdal handle the rastertization directly: so first I build a string folowing standard gdal syntax:

    cmd="gdal_rasterize -burn 0 -a_nodata 1000 -a_srs %s -te %s -tr %s %s %s %s" %(imgCrs.authid(), extImg.replace(","," "), cloudRes, cloudRes, cpv, cloudRast)
    

    Then I let the system run it with:

      os.system(cmd)
    

    This has proven much quicker than all other solutions found. It might be less elegant or "pytonic" but so far I couldn't find any drawback to this.