
Complex query using Django QuerySets


I am working on a personal project and I am trying to write a complex query that:

  1. Gets every device that belongs to a certain user

  2. Gets every sensor belonging to every one of the user's devices

  3. Gets the last recorded value and timestamp for each sensor on the user's devices.

I am using SQLite and managed to write the query as plain SQL; however, for the life of me I cannot figure out how to do it in Django. I looked at other questions and went through the documentation, but to no avail.

My models:

class User(AbstractBaseUser):
    email = models.EmailField()

class Device(models.Model):
    user = models.ForeignKey(User)
    name = models.CharField()

class Unit(models.Model):
    name = models.CharField()

class SensorType(models.Model):
    name = models.CharField()
    unit = models.ForeignKey(Unit)

class Sensor(models.Model):
    gpio_port = models.IntegerField()
    device = models.ForeignKey(Device)
    sensor_type = models.ForeignKey(SensorType)

class SensorData(models.Model):
    sensor = models.ForeignKey(Sensor)
    value = models.FloatField()
    timestamp = models.DateTimeField()

And here is the SQL query:

SELECT acc.email, 
           dev.name as device_name, 
           stype.name as sensor_type,
           sen.gpio_port as sensor_port,
           sdata.value as sensor_latest_value, 
           unit.name as sensor_units, 
           sdata.latest as value_received_on
FROM devices_device as dev
INNER JOIN accounts_user  as acc on dev.user_id = acc.id
INNER JOIN devices_sensor  as sen on sen.device_id = dev.id
INNER JOIN devices_sensortype as stype on stype.id = sen.sensor_type_id
INNER JOIN devices_unit as unit on unit.id = stype.unit_id
LEFT JOIN (
            SELECT MAX(sd.timestamp) latest, sd.value, sensor_id
            FROM devices_sensordata as sd
            INNER JOIN devices_sensor as s ON s.id = sd.sensor_id
        GROUP BY sd.sensor_id) as sdata on sdata.sensor_id= sen.id
WHERE acc.id = 1
ORDER BY dev.id

I have been playing with the Django shell to find a way to implement this query with the QuerySet API, but I cannot figure it out.

The closest I managed to get is with this:

>>> sub = SensorData.objects.values('sensor_id', 'value').filter(sensor_id=OuterRef('pk')).order_by('-timestamp')[:1]
>>> Sensor.objects.annotate(data_id=Subquery(sub.values('sensor_id'))).filter(id=F('data_id')).values(...)

However it has two problems:

  1. It does not include the sensors that do not yet have any values in SensorData
  2. If I include the SensorData.value field in .values(), I start to get previously recorded values of the sensors

If someone could please show me how to do it, or at least tell me what I am doing wrong I will be very grateful!

EDIT: Based on the answers, I should clarify: I only want the latest sensor value for each sensor. For example, suppose sensordata contains:

id |   sensor_id    |   value  |  timestamp |
1  |  1             |  2       |  <today>   |
2  |  1             |  5       | <yesterday>|
3  |  2             |  3       | <yesterday>|

Only the latest should be returned for each sensor_id:

id |   sensor_id    |   value  |  timestamp |
1  |  1             |  2       |  <today>   |
3  |  2             |  3       | <yesterday>|

Or, if the sensor does not yet have any data in this table, I want the query to return a record for it with "null" for value and timestamp (basically the left join in my SQL query).
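
To make the expected result concrete, here is a minimal sketch using Python's built-in sqlite3 module and the sample rows above. The simplified table names (sensor, sensordata), the extra sensor 3 with no readings, and the literal dates standing in for <today> and <yesterday> are assumptions made for illustration. The query is a reduced form of the LEFT JOIN from the SQL query earlier in the question, and it relies on SQLite's handling of bare columns alongside MAX(), so it should not be assumed to work on other databases.

```python
import sqlite3


def latest_per_sensor_left_join(conn):
    """Return one (sensor_id, value, latest) row per sensor, NULLs if no data."""
    return conn.execute(
        """
        SELECT sen.id, sdata.value, sdata.latest
        FROM sensor AS sen
        LEFT JOIN (
            -- SQLite-specific: with MAX(), the bare column `value` is taken
            -- from the row that holds the maximum timestamp.
            SELECT sensor_id, value, MAX(timestamp) AS latest
            FROM sensordata
            GROUP BY sensor_id
        ) AS sdata ON sdata.sensor_id = sen.id
        ORDER BY sen.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sensor (id INTEGER PRIMARY KEY);
    CREATE TABLE sensordata (
        id INTEGER PRIMARY KEY,
        sensor_id INTEGER REFERENCES sensor(id),
        value REAL,
        timestamp TEXT
    );
    INSERT INTO sensor (id) VALUES (1), (2), (3);  -- sensor 3 has no data yet
    INSERT INTO sensordata (id, sensor_id, value, timestamp) VALUES
        (1, 1, 2, '2024-01-02'),  -- stands in for <today>
        (2, 1, 5, '2024-01-01'),  -- stands in for <yesterday>
        (3, 2, 3, '2024-01-01');
    """
)

rows = latest_per_sensor_left_join(conn)
for row in rows:
    print(row)
```

Sensors 1 and 2 each come back once with their newest reading, and sensor 3 comes back with None for both value and timestamp, which is the shape the ORM query needs to reproduce.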

EDIT2:

Based on @ivissani's answer, I managed to produce this:

>>> latest_sensor_data = Sensor.objects.annotate(is_latest=~Exists(SensorData.objects.filter(sensor=OuterRef('id'),timestamp__gt=OuterRef('sensordata__timestamp')))).filter(is_latest=True)
>>> user_devices = latest_sensor_data.filter(device__user=1)
>>> for x in user_devices.values_list('device__name','sensor_type__name', 'gpio_port','sensordata__value', 'sensor_type__unit__name', 'sensordata__timestamp').order_by('device__name'):
...     print(x)

Which seems to do the job.

This is the SQL it produces:

SELECT
  "devices_device"."name",
  "devices_sensortype"."name",
  "devices_sensor"."gpio_port",
  "devices_sensordata"."value",
  "devices_unit"."name",
  "devices_sensordata"."timestamp"
FROM
  "devices_sensor"
  LEFT OUTER JOIN "devices_sensordata" ON (
    "devices_sensor"."id" = "devices_sensordata"."sensor_id"
  )
  INNER JOIN "devices_device" ON (
    "devices_sensor"."device_id" = "devices_device"."id"
  )
  INNER JOIN "devices_sensortype" ON (
    "devices_sensor"."sensor_type_id" = "devices_sensortype"."id"
  )
  INNER JOIN "devices_unit" ON (
    "devices_sensortype"."unit_id" = "devices_unit"."id"
  )
WHERE
  (
    NOT EXISTS(
      SELECT
        U0."id",
        U0."sensor_id",
        U0."value",
        U0."timestamp"
      FROM
        "devices_sensordata" U0
      WHERE
        (
          U0."sensor_id" = ("devices_sensor"."id")
          AND U0."timestamp" > ("devices_sensordata"."timestamp")
        )
    ) = True
    AND "devices_device"."user_id" = 1
  )
ORDER BY
  "devices_device"."name" ASC
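
Why this LEFT OUTER JOIN plus NOT EXISTS shape keeps sensors without any readings can be checked with a small sketch. It uses Python's built-in sqlite3 module; the simplified table names, the sample rows, and sensor 3 (which has no data) are assumptions for illustration, and the query is a reduced version of the generated SQL rather than the exact statement Django emits.

```python
import sqlite3


def latest_per_sensor_not_exists(conn):
    """Keep a joined row only if no newer reading exists for the same sensor."""
    return conn.execute(
        """
        SELECT sen.id, sd.value, sd.timestamp
        FROM sensor AS sen
        LEFT OUTER JOIN sensordata AS sd ON sd.sensor_id = sen.id
        WHERE NOT EXISTS (
            SELECT 1
            FROM sensordata AS u0
            WHERE u0.sensor_id = sen.id
              AND u0.timestamp > sd.timestamp
        )
        ORDER BY sen.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sensor (id INTEGER PRIMARY KEY);
    CREATE TABLE sensordata (
        id INTEGER PRIMARY KEY,
        sensor_id INTEGER REFERENCES sensor(id),
        value REAL,
        timestamp TEXT
    );
    INSERT INTO sensor (id) VALUES (1), (2), (3);  -- sensor 3 has no data
    INSERT INTO sensordata (id, sensor_id, value, timestamp) VALUES
        (1, 1, 2, '2024-01-02'),
        (2, 1, 5, '2024-01-01'),
        (3, 2, 3, '2024-01-01');
    """
)

not_exists_rows = latest_per_sensor_not_exists(conn)
for row in not_exists_rows:
    print(row)
```

For a sensor with no readings the inner SELECT finds no rows at all, so NOT EXISTS is true and the NULL-padded row from the outer join survives. For sensor 1, the older reading is dropped because a newer one exists.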

Solution

  • Your query is actually rather simple; the only complex part is establishing which SensorData is the latest for each Sensor. I would use an annotation with an Exists subquery, as follows:

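    from django.db.models import Exists, OuterRef

    # A reading is the latest one if no newer reading exists for the same sensor
    latest_data = SensorData.objects.annotate(
        is_latest=~Exists(
            SensorData.objects.filter(sensor=OuterRef('sensor'),
                                      timestamp__gt=OuterRef('timestamp'))
        )
    ).filter(is_latest=True)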
    

    Then it's just a matter of filtering this queryset by user in the following way:

    certain_user_latest_data = latest_data.filter(sensor__device__user=certain_user)
    

    Since you also want sensors that have no data yet, this query will not suffice: it only retrieves SensorData instances, and the Sensor and Device have to be reached through their fields. Unfortunately, Django does not allow explicit joins through its ORM, so I suggest the following (which, let me say, is far from ideal from a performance perspective).

    The idea is to annotate the Sensors queryset with the specific values of the latest SensorData (value and timestamp) if any exists in the following way:

    from django.db.models import Exists, OuterRef, Subquery

    latest_data = SensorData.objects.annotate(
        is_latest=~Exists(
            SensorData.objects.filter(sensor=OuterRef('sensor'),
                                      timestamp__gt=OuterRef('timestamp'))
        )
    ).filter(is_latest=True, sensor=OuterRef('pk'))
    
    sensors_with_value = Sensor.objects.annotate(
        # [:1] keeps each subquery to a single row even if two readings
        # share the same latest timestamp
        latest_value=Subquery(latest_data.values('value')[:1]),
        latest_value_timestamp=Subquery(latest_data.values('timestamp')[:1])
    )  # This will generate two subqueries...
    
    certain_user_sensors = sensors_with_value.filter(device__user=certain_user).select_related('device__user')
    
    

    If there aren't any instances of SensorData for a certain Sensor then the annotated fields latest_value and latest_value_timestamp will simply be set to None.
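
As a rough illustration of the "two subqueries" idea, the sketch below runs two correlated scalar subqueries in plain SQL through Python's built-in sqlite3 module. It is not the SQL Django generates for the code above: it swaps the Exists filter for ORDER BY timestamp DESC LIMIT 1, which corresponds to the .order_by('-timestamp')[:1] pattern from the question's own attempt. The simplified table names, the sample rows, and sensor 3 without data are assumptions for illustration.

```python
import sqlite3


def sensors_with_latest_value(conn):
    """Annotate every sensor with its newest value and timestamp (or None)."""
    conn.row_factory = sqlite3.Row
    cursor = conn.execute(
        """
        SELECT
            sen.id AS sensor_id,
            (SELECT sd.value
             FROM sensordata AS sd
             WHERE sd.sensor_id = sen.id
             ORDER BY sd.timestamp DESC
             LIMIT 1) AS latest_value,
            (SELECT sd.timestamp
             FROM sensordata AS sd
             WHERE sd.sensor_id = sen.id
             ORDER BY sd.timestamp DESC
             LIMIT 1) AS latest_value_timestamp
        FROM sensor AS sen
        ORDER BY sen.id
        """
    )
    return [dict(row) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sensor (id INTEGER PRIMARY KEY);
    CREATE TABLE sensordata (
        id INTEGER PRIMARY KEY,
        sensor_id INTEGER REFERENCES sensor(id),
        value REAL,
        timestamp TEXT
    );
    INSERT INTO sensor (id) VALUES (1), (2), (3);  -- sensor 3 has no data
    INSERT INTO sensordata (id, sensor_id, value, timestamp) VALUES
        (1, 1, 2, '2024-01-02'),
        (2, 1, 5, '2024-01-01'),
        (3, 2, 3, '2024-01-01');
    """
)

annotated = sensors_with_latest_value(conn)
for row in annotated:
    print(row)
```

A scalar subquery that matches no rows evaluates to NULL, which is why a sensor without readings still appears, with None in both annotated columns.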