python django gis geojson geodjango

How can I optimize performance when serializing many GeoDjango geometry fields?


I'm developing a GeoDjango app that uses the WorldBorder model provided in the tutorial. I also created my own Region model, which is tied to WorldBorder, so a WorldBorder (country) can have multiple Regions, each of which also has borders (a MultiPolygon field).

I built the API for it with DRF, but it is slow: it takes 16 seconds to load all WorldBorders and Regions in GeoJSON format. The returned JSON is 10 MB, though. Is that reasonable?

I even switched the serializer to serpy, which is much faster than the DRF GIS serializer, but that only gave a 10% performance improvement.

After profiling, it turns out that most of the time is spent in the GIS functions that convert the database data type to a list of coordinates rather than WKT. If I use WKT, serialization is much faster (1.7s compared to 11.7s; WKT is used only for the WorldBorder MultiPolygon, everything else is still GeoJSON).
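One way to keep GeoJSON output while skipping the coordinate-list conversion is to receive the geometry already serialized as GeoJSON text (PostGIS can produce this with ST_AsGeoJSON) and splice that text into the response without parsing it. The following is a minimal sketch under that assumption; `build_feature` and `build_collection` are hypothetical helper names, and wiring this into DRF is not shown.

```python
import json


def build_feature(geometry_json, properties):
    """Build one GeoJSON Feature string around an already-serialized geometry.

    geometry_json is assumed to be valid GeoJSON geometry text; it is inserted
    as-is, so no coordinate lists are ever built in Python.
    """
    return '{{"type":"Feature","geometry":{},"properties":{}}}'.format(
        geometry_json, json.dumps(properties))


def build_collection(rows):
    """Build a FeatureCollection string from (geometry_json, properties) pairs."""
    features = ",".join(build_feature(geom, props) for geom, props in rows)
    return '{"type":"FeatureCollection","features":[' + features + ']}'
```

Only the small properties dictionaries go through `json.dumps`; the large geometry text is copied unchanged.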

I also tried to shrink the MultiPolygon using ST_SimplifyVW with a low tolerance (0.005) to preserve accuracy, which brings the JSON size down to 1.7 MB and the total load time to 3.5s. Of course, I can still look for the tolerance that best balances accuracy and speed.
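For context on what the tolerance means: ST_SimplifyVW uses the Visvalingam-Whyatt algorithm, where the tolerance is an area threshold rather than a distance. The sketch below is a simplified pure-Python illustration of that idea on an open line, not the PostGIS implementation; `triangle_area` and `simplify_vw` are hypothetical names, and details such as ring handling are left out.

```python
def triangle_area(a, b, c):
    """Area of the triangle formed by three (x, y) points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2.0


def simplify_vw(points, tolerance):
    """Repeatedly drop the interior point with the smallest triangle area.

    Stops once every remaining interior point forms a triangle with its
    neighbours whose area is at least `tolerance`. Endpoints are always kept.
    """
    pts = list(points)
    while len(pts) > 2:
        # Effective area of each interior point with its current neighbours.
        areas = [triangle_area(pts[i - 1], pts[i], pts[i + 1])
                 for i in range(1, len(pts) - 1)]
        smallest = min(areas)
        if smallest >= tolerance:
            break
        # areas[0] belongs to pts[1], hence the +1 offset.
        del pts[areas.index(smallest) + 1]
    return pts
```

A larger tolerance removes more vertices, which is why the payload shrinks as the tolerance grows.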

Below is the profiling data (the sudden increase in queries for the simplified MultiPolygon is due to my bad usage of the Django QuerySet API to call ST_SimplifyVW).

[Image: profiling data]

EDIT: I fixed the DB query so the number of queries stays the same at 75. As expected, this does not improve performance significantly.

EDIT: I continued to improve my DB queries and reduced them to just 8. As expected, this does not improve performance much.

[Image: profiling data with 8 queries]

Below is the profile of the function calls, with the part that took the most time highlighted. This one uses the vanilla DRF GIS implementation.

[Image: function-call profile, vanilla DRF GIS]

Below is the profile when I use WKT for one of the MultiPolygon fields, without ST_SimplifyVW.

[Image: function-call profile, WKT for one MultiPolygon field]

Here are the models, as requested by @Udi:

class WorldBorderQueryset(models.query.QuerySet):
    def simplified(self, tolerance):
        sql = "SELECT ST_SimplifyVW(mpoly, %s) AS mpoly"
        return self.extra(
            select={'mpoly': sql},
            select_params=(tolerance,)
        )


class WorldBorderManager(models.Manager):
    def get_by_natural_key(self, name, iso2):
        return self.get(name=name, iso2=iso2)

    def get_queryset(self, *args, **kwargs):
        qs = WorldBorderQueryset(self.model, using=self._db)
        qs = qs.prefetch_related('regions',)
        return qs

    def simplified(self, level):
        return self.get_queryset().simplified(level)


class WorldBorder(TimeStampedModel):
    name = models.CharField(max_length=50)
    area = models.IntegerField(null=True, blank=True)
    pop2005 = models.IntegerField('Population 2005', default=0)
    fips = models.CharField('FIPS Code', max_length=2, null=True, blank=True)
    iso2 = models.CharField('2 Digit ISO', max_length=2, null=True, blank=True)
    iso3 = models.CharField('3 Digit ISO', max_length=3, null=True, blank=True)
    un = models.IntegerField('United Nations Code', null=True, blank=True)
    region = models.IntegerField('Region Code', null=True, blank=True)
    subregion = models.IntegerField('Sub-Region Code', null=True, blank=True)
    lon = models.FloatField(null=True, blank=True)
    lat = models.FloatField(null=True, blank=True)

    # generated from lon lat to be one field so that it can be easily
    # edited in admin
    center_coordinates = models.PointField(blank=True, null=True)

    mpoly = models.MultiPolygonField(help_text='Borders')

    objects = WorldBorderManager()

    def save(self, *args, **kwargs):
        if not self.center_coordinates:
            self.center_coordinates = Point(x=self.lon, y=self.lat)
        super().save(*args, **kwargs)

    def natural_key(self):
        return self.name, self.iso2

    def __str__(self):
        return self.name

    class Meta:
        verbose_name = 'Country'
        verbose_name_plural = 'Countries'
        ordering = ('name',)


class Region(TimeStampedModel):
    name = models.CharField(max_length=100, unique=True)
    country = models.ForeignKey(WorldBorder, related_name='regions')
    mpoly = models.MultiPolygonField(help_text='Areas')
    center_coordinates = models.PointField()

    moment_category = models.ForeignKey('moment.MomentCategory',
                                        blank=True, null=True)

    objects = RegionManager()
    no_joins = models.Manager()

    def natural_key(self):
        return (self.name,)

    def __str__(self):
        return self.name


# TODO might want to have separate table for ActiveCity for performance
# improvement since we have like 50k cities
class City(TimeStampedModel):
    country = models.ForeignKey(WorldBorder, on_delete=models.PROTECT,
                                related_name='cities')
    region = models.ForeignKey(Region, blank=True, null=True,
                               related_name='cities',
                               on_delete=models.SET_NULL)

    name = models.CharField(max_length=255)
    accent_city = models.CharField(max_length=255)
    population = models.IntegerField(blank=True, null=True)
    is_capital = models.BooleanField(default=False)

    center_coordinates = models.PointField()

    # is active marks that this city is a destination
    # only cities with is_active True will be put up to the frontend
    is_active = models.BooleanField(default=False)

    objects = DefaultSelectOrPrefetchManager(
        prefetch_related=(
            'yes_moment_beacons__activity__verb',
            'social_beacons',
            'video_beacons'
        ),
        select_related=('region', 'country')
    )
    no_joins = models.Manager()

    def natural_key(self):
        return (self.name,)

    def __str__(self):
        return self.name

    class Meta:
        verbose_name_plural = 'Cities'

class Beacon(TimeStampedModel):
    # if null defaults to city center coordinates
    coordinates = models.PointField(blank=True, null=True)
    is_fake = models.BooleanField(default=False)

    # can use city here, but the %(class)s gives no space between words
    # and it looks ugly

    def validate_activity(self):
        # activities in the region
        activities = self.city.region.moment_category.activities.all()
        if self.activity not in activities:
            raise ValidationError('Activity is not in the Region')

    def clean(self):
        self.validate_activity()

    def save(self, *args, **kwargs):
        # a full clean is needed here to ensure code correctness
        # (not user), because if someone uses objects.create,
        # clean() will never get called. The downside is that
        # validation runs twice if the object is created
        # e.g. from the admin.
        self.full_clean()

        if not self.coordinates:
            self.coordinates = self.city.center_coordinates
        super().save(*args, **kwargs)

    class Meta:
        abstract = True


class YesMomentBeacon(Beacon):
    activity = models.ForeignKey('moment.Activity',
                                 on_delete=models.CASCADE,
                                 related_name='yes_moment_beacons')
    # ..........
    # other fields

    city = models.ForeignKey('world.City', related_name='yes_moment_beacons')

    objects = DefaultSelectOrPrefetchManager(
        select_related=('activity__verb',)
    )

    def __str__(self):
        return '{} - {}'.format(self.activity, self.coordinates)

# other beacon types.......

Here's my serializer, as requested by @Udi:

class RegionInWorldSerializer(GeoFeatureModelSerializer):
    yes_moment_beacons = serializers.SerializerMethodField()
    social_beacons = serializers.SerializerMethodField()
    video_beacons = serializers.SerializerMethodField()

    center_coordinates = GeometrySerializerMethodField()

    def get_center_coordinates(self, obj):
        return obj.center_coordinates

    def get_yes_moment_beacons(self, obj):
        count = 0

        # don't worry, it's already prefetched in the manager
        # (including the below methods) so len is used instead of count
        cities = obj.cities.all()

        for city in cities:
            beacons = city.yes_moment_beacons.all()
            count += len(beacons)
        return count

    def get_social_beacons(self, obj):
        count = 0

        cities = obj.cities.all()

        for city in cities:
            beacons = city.social_beacons.all()
            count += len(beacons)
        return count

    def get_video_beacons(self, obj):
        count = 0

        cities = obj.cities.all()

        for city in cities:
            beacons = city.video_beacons.all()
            count += len(beacons)
        return count

    class Meta:
        model = Region
        geo_field = 'center_coordinates'
        fields = ('name', 'yes_moment_beacons', 'video_beacons',
                  'social_beacons')


class WorldSerializer(GeoFeatureModelSerializer):
    center_coordinates = GeometrySerializerMethodField()

    regions = RegionInWorldSerializer(many=True, read_only=True)

    def get_center_coordinates(self, obj):
        return obj.center_coordinates

    class Meta:
        model = WorldBorder
        geo_field = 'mpoly'

        fields = ('name', 'iso2', 'center_coordinates', 'regions')

This is the main query

def get_queryset(self):
    tolerance = self.request.GET.get('tolerance', None)
    if tolerance is not None:
        tolerance = float(tolerance)
        return WorldBorder.objects.simplified(tolerance)
    else:
        return WorldBorder.objects.all()

Here's a slice of the API response (1 of 236 objects) using ST_SimplifyVW with a high tolerance. Without it, Firefox hangs, I think because it can't handle 10 MB of JSON. This particular country's border data is small compared to other countries'. The JSON returned here shrinks from 10 MB to 750 KB thanks to ST_SimplifyVW. Even with only 750 KB of JSON, it took 4.5s on my local machine.

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "coordinates": [
          [
            [
              [
                74.915741,
                37.237328
              ],
              [
                74.400543,
                37.138962
              ],
              [
                74.038315,
                36.814682
              ],
              [
                73.668304,
                36.909637
              ],
              [
                72.556641,
                36.821266
              ],
              [
                71.581131,
                36.346443
              ],
              [
                71.18779,
                36.039444
              ],
              [
                71.647766,
                35.419991
              ],
              [
                71.496094,
                34.959435
              ],
              [
                70.978592,
                34.504997
              ],
              [
                71.077209,
                34.052216
              ],
              [
                70.472214,
                33.944153
              ],
              [
                70.002777,
                34.052773
              ],
              [
                70.323318,
                33.327774
              ],
              [
                69.561096,
                33.08194
              ],
              [
                69.287491,
                32.526382
              ],
              [
                69.328247,
                31.940365
              ],
              [
                69.013885,
                31.648884
              ],
              [
                68.161102,
                31.830276
              ],
              [
                67.575546,
                31.53194
              ],
              [
                67.778046,
                31.332218
              ],
              [
                66.727768,
                31.214996
              ],
              [
                66.395538,
                30.94083
              ],
              [
                66.256653,
                29.85194
              ],
              [
                65.034149,
                29.541107
              ],
              [
                64.059143,
                29.41444
              ],
              [
                63.587212,
                29.503887
              ],
              [
                62.484436,
                29.406105
              ],
              [
                60.868599,
                29.863884
              ],
              [
                61.758331,
                30.790276
              ],
              [
                61.713608,
                31.383331
              ],
              [
                60.85305,
                31.494995
              ],
              [
                60.858887,
                32.217209
              ],
              [
                60.582497,
                33.066101
              ],
              [
                60.886383,
                33.557213
              ],
              [
                60.533882,
                33.635826
              ],
              [
                60.508331,
                34.140274
              ],
              [
                60.878876,
                34.319717
              ],
              [
                61.289162,
                35.626381
              ],
              [
                62.029716,
                35.448601
              ],
              [
                62.309158,
                35.141663
              ],
              [
                63.091934,
                35.432495
              ],
              [
                63.131378,
                35.865273
              ],
              [
                63.986107,
                36.038048
              ],
              [
                64.473877,
                36.255554
              ],
              [
                64.823044,
                37.138603
              ],
              [
                65.517487,
                37.247215
              ],
              [
                65.771927,
                37.537498
              ],
              [
                66.302765,
                37.323608
              ],
              [
                67.004166,
                37.38221
              ],
              [
                67.229431,
                37.191933
              ],
              [
                67.765823,
                37.215546
              ],
              [
                68.001389,
                36.936104
              ],
              [
                68.664154,
                37.274994
              ],
              [
                69.246643,
                37.094154
              ],
              [
                69.515823,
                37.580826
              ],
              [
                70.134995,
                37.529045
              ],
              [
                70.165543,
                37.871719
              ],
              [
                70.71138,
                38.409866
              ],
              [
                70.97998,
                38.470459
              ],
              [
                71.591934,
                37.902618
              ],
              [
                71.429428,
                37.075829
              ],
              [
                71.842758,
                36.692101
              ],
              [
                72.658508,
                37.021202
              ],
              [
                73.307205,
                37.462753
              ],
              [
                73.819717,
                37.228058
              ],
              [
                74.247208,
                37.409546
              ],
              [
                74.915741,
                37.237328
              ]
            ]
          ]
        ],
        "type": "MultiPolygon"
      },
      "properties": {
        "name": "Afghanistan",
        "iso2": "AF",
        "center_coordinates": {
          "coordinates": [
            65.216,
            33.677
          ],
          "type": "Point"
        },
        "regions": {
          "type": "FeatureCollection",
          "features": [
            {
              "type": "Feature",
              "geometry": {
                "coordinates": [
                  66.75292967820785,
                  34.52466146754814
                ],
                "type": "Point"
              },
              "properties": {
                "name": "Central Afghanistan",
                "yes_moment_beacons": 0,
                "video_beacons": 0,
                "social_beacons": 0
              }
            },
            {
              "type": "Feature",
              "geometry": {
                "coordinates": [
                  69.69726561529792,
                  35.96022296494905
                ],
                "type": "Point"
              },
              "properties": {
                "name": "Northern Highlands",
                "yes_moment_beacons": 0,
                "video_beacons": 0,
                "social_beacons": 0
              }
            },
            {
              "type": "Feature",
              "geometry": {
                "coordinates": [
                  63.89541422401191,
                  32.27442932956255
                ],
                "type": "Point"
              },
              "properties": {
                "name": "Southwestern Afghanistan",
                "yes_moment_beacons": 0,
                "video_beacons": 0,
                "social_beacons": 0
              }
            }
          ]
        }
      }
    },
    ........
}

So the question is: is GeoDjango not as fast as I expected, or are these performance numbers to be expected? What can I do to improve performance while still outputting GeoJSON, i.e. not WKT? Is fine-tuning the tolerance the only way? I might also split getting the regions into a separate endpoint.


Solution

  • Since your geographic data does not change frequently, try caching all region/country polygons as pre-calculated GeoJSON. For example, create a /country/123.geojson API call or static file with the geo data for all regions in that country, probably simplified in advance.

    Your other API calls should return only the numeric data, without geographic polygons, leaving the combining task to the client.
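A minimal sketch of this idea, assuming the features for a country are already available as plain GeoJSON dictionaries: `write_country_geojson` and `attach_stats` are hypothetical helper names, and the merge step is shown in Python only for illustration, since in practice it would run in the client.

```python
import json
from pathlib import Path


def write_country_geojson(out_dir, country_id, features):
    """Write one pre-calculated FeatureCollection file and return its path."""
    path = Path(out_dir) / "{}.geojson".format(country_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    collection = {"type": "FeatureCollection", "features": features}
    # Compact separators avoid extra whitespace in the cached file.
    path.write_text(json.dumps(collection, separators=(",", ":")),
                    encoding="utf-8")
    return path


def attach_stats(collection, stats_by_name):
    """Merge numeric data from a separate API call into cached features.

    Features are matched on properties["name"]; unmatched features are
    left unchanged.
    """
    for feature in collection["features"]:
        name = feature["properties"]["name"]
        feature["properties"].update(stats_by_name.get(name, {}))
    return collection
```

The files only need to be regenerated when the borders change, so the expensive geometry serialization happens outside the request cycle.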