Tags: python, django, import, shapefile, geodjango

Django LayerMapping: How to filter shape-file before saving to database


I have a shape-file that I want to import into a Django database using GeoDjango's LayerMapping utility, because it transforms the spatial data into GeoDjango models. According to the tutorial, the standard way to import a shape-file into the db is as follows:

from django.contrib.gis.utils import LayerMapping

lm = LayerMapping(table, path, mapping, transform=True, encoding='utf-8')  # load shape-file
lm.save(verbose=True)  # save entire file to db

But in my case, the table into which I want to import the shape-file data is not empty. I only want to add those rows (or features, in shape-file lingo) that are not already present in the db. However, LayerMapping only provides a method to save an entire shape-file to the db, not single entries, so saving it would create duplicates of the rows that already exist.

Thus, my question is: how can I filter the entries of a LayerMapping object before saving it?

So far, I have thought of two possible solutions:

  1. Filter the entries in the LayerMapping object and save the entire object with the provided .save() method. But I don't know how to delete single entries from a LayerMapping object.

  2. Iterate through all entries in the LayerMapping object, check whether each one is already present in the database, and save it only if it is not. However, I didn't find a LayerMapping method that saves single entries to the db. I could read the attributes and create the objects myself, but then I would lose the coordinate transformation, which was my reason for using LayerMapping in the first place.

So the question remains the same: how can I filter the LayerMapping object before saving it?


Solution

  • One option worth trying is LayerMapping's unique argument, which the documentation describes as follows:

    Setting this to the name, or a tuple of names, from the given model, will create models unique only to the given name(s). Geometries from each feature will be added into the collection associated with the unique model. Forces the transaction mode to be 'autocommit'.

    Looking at the code that runs when a record with the given unique name(s) already exists, we can see that the feature is not skipped; instead, its geometry is appended to the existing record:

    if self.unique:
        # If we want unique models on a particular field, handle the
        # geometry appropriately.
        try:
            # Getting the keyword arguments and retrieving
            # the unique model.
            u_kwargs = self.unique_kwargs(kwargs)
            m = self.model.objects.using(self.using).get(**u_kwargs)
            is_update = True
    
            # Getting the geometry (in OGR form), creating
            # one from the kwargs WKT, adding in additional
            # geometries, and update the attribute with the
            # just-updated geometry WKT.
            geom_value = getattr(m, self.geom_field)
            if geom_value is None:
                geom = OGRGeometry(kwargs[self.geom_field])
            else:
                geom = geom_value.ogr
                new = OGRGeometry(kwargs[self.geom_field])
                for g in new:
                    geom.add(g)
            setattr(m, self.geom_field, geom.wkt)
        except ObjectDoesNotExist:
            # No unique model exists yet, create.
            m = self.model(**kwargs)
    

    If this behaviour fits your needs, you can pass the unique option as follows:

    lm = LayerMapping(
        table,
        path,
        mapping,
        transform=True,
        unique=('field_name_1', 'field_name_2', ...),
        encoding='utf-8'
    )
    lm.save(verbose=True)  # existing unique records are updated, new ones created
    

    If the above does not fit the needs of your project, then either of the two options you mention should work.
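The second option from the question (check each feature and save only the new ones) could be sketched roughly as below. This is a minimal sketch, not a tested recipe: it assumes that `lm.layer`, `lm.model` and `lm.feature_kwargs()`, which LayerMapping uses internally in its own `save()`, are available in your Django version. They are not part of the documented public API, so check the source first. The function name `save_new_features` and the `unique_fields` parameter are hypothetical names introduced for illustration.

```python
def save_new_features(lm, unique_fields):
    """Save only the features whose unique_fields values are not yet stored.

    Assumption: `lm` is a django.contrib.gis.utils.LayerMapping instance and
    exposes `layer`, `model` and `feature_kwargs()` as its own save() does.
    Returns the number of newly created rows.
    """
    created = 0
    for feat in lm.layer:
        # Build the model keyword arguments for this feature. LayerMapping
        # verifies the geometry here and, with transform=True, reprojects it,
        # so the coordinate transformation is not lost.
        kwargs = lm.feature_kwargs(feat)

        # Look for an existing row with the same identifying values.
        lookup = {name: kwargs[name] for name in unique_fields}
        if lm.model.objects.filter(**lookup).exists():
            continue  # already in the database, skip this feature

        lm.model(**kwargs).save()
        created += 1
    return created
```

It would be called as `save_new_features(lm, ['field_name_1', 'field_name_2'])` in place of `lm.save()`. Unlike `lm.save()`, this sketch has no transaction handling, progress output or per-feature error handling, so a feature that fails verification will raise and stop the loop.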