Search code examples
pythondjangopostgresqlmany-to-manypostgis

Querying objects using attribute of member of many-to-many


I have the following models:

class Member(models.Model):
    ref = models.CharField(max_length=200)
    # some other stuff
    def __str__(self):
        return self.ref

class Feature(models.Model):
    feature_id = models.BigIntegerField(default=0)
    members = models.ManyToManyField(Member)
    # some other stuff

A Member is basically just a pointer to a Feature. So let's say I have Features:

  • feature_id = 2, members = 1, 2
  • feature_id = 4
  • feature_id = 3

Then the members would be:

  • id = 1, ref = 4
  • id = 2, ref = 3

I want to find all of the Features which contain one or more Members from a list of "ok members." Currently my query looks like this:

# ndtmp is a query set of member-less Features which Members can point to
sids = [str(i) for i in list(ndtmp.values('feature_id'))]
# now make a query set that contains all rels and ways with at least one member with an id in sids
okmems = Member.objects.filter(ref__in=sids)
relsways = Feature.geoobjects.filter(members__in=okmems)
# now combine with nodes
op = relsways | ndtmp

This is enormously slow, and I'm not even sure if it's working. I've tried using print statements to debug, just to make sure anything is actually being parsed, and I get the following:

print(ndtmp.count())
>>> 12747
print(len(sids))
>>> 12747
print(okmems.count())

... and then the code just hangs for minutes, and eventually I quit it. I think that I just overcomplicated the query, but I'm not sure how best to simplify it. Should I:

  1. Migrate Feature to use a CharField instead of a BigIntegerField? There is no real reason for me to use a BigIntegerField, I just did so because I was following a tutorial when I began this project. I tried a simple migration by just changing it in models.py and I got a "numeric" value in the column in PostgreSQL with format 'Decimal:( the id )', but there's probably some way around that that would force it to just shove the id into a string.

  2. Use some feature of Many-To-Many Fields which I don't know abut to more efficiently check for matches

  3. Calculate the bounding box of each Feature and store it in another column so that I don't have to do this calculation every time I query the database (so just the single fixed cost of calculation upon Migration + the cost of calculating whenever I add a new Feature or modify an existing one)?

Or something else? In case it helps, this is for a server-side script for an ongoing OpenStreetMap related project of mine, and you can see the work in progress here.

EDIT - I think a much faster way to get ndids is like this:

ndids = ndtmp.values_list('feature_id', flat=True)

This works, producing a non-empty set of ids. Unfortunately, I am still at a loss as to how to get okmems. I tried:

okmems = Member.objects.filter(ref__in=str(ndids))

But it returns an empty query set. And I can confirm that the ref points are correct, via the following test:

Member.objects.values('ref')[:1]
>>> [{'ref': '2286047272'}]
Feature.objects.filter(feature_id='2286047272').values('feature_id')[:1]
>>> [{'feature_id': '2286047272'}]

Solution

  • Ultimately, I was wrong to set up the database using a numeric id in one table and a text-type id in the other. I am not very familiar with migrations yet, but as some point I'll have to take a deep dive into that world and figure out how to migrate my database to use numerics on both. For now, this works:

    # ndtmp is a query set of member-less Features which Members can point to
    # get the unique ids from ndtmp as strings
    strids = ndtmp.extra({'feature_id_str':"CAST( \
        feature_id AS VARCHAR)"}).order_by( \
        '-feature_id_str').values_list('feature_id_str',flat=True).distinct()
    # find all members whose ref values can be found in stride
    okmems = Member.objects.filter(ref__in=strids)
    # find all features containing one or more members in the accepted members list
    relsways = Feature.geoobjects.filter(members__in=okmems)
    # combine that with my existing list of allowed member-less features
    op = relsways | ndtmp
    # prove that this set is not empty
    op.count()
    # takes about 10 seconds
    >>> 8997148 # looks like it worked!
    

    Basically, I am making a query set of feature_ids (numerics) and casting it to be a query set of text-type (varchar) field values. I am then using values_list to make it only contain these string id values, and then I am finding all of the members whose ref ids are in that list of allowed Features. Now I know which members are allowed, so I can filter out all the Features which contain one or more members in that allowed list. Finally, I combine this query set of allowed Features which contain members with ndtmp, my original query set of allowed Features which do not contain members.