Tags: mysql, django, django-orm, django-aggregation

Django 1.6 + MySQL : Type Cast MySQL variable to search for Max, Avg


My model looks roughly like this:

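from django.db import models

class ServiceUtilization(models.Model):
    # CharField requires max_length; the values here are placeholders
    device_name = models.CharField(max_length=100)
    service_name = models.CharField(max_length=100)
    data_source = models.CharField(max_length=100)
    current_value = models.CharField(max_length=100)  # float stored as text
    sys_timestamp = models.IntegerField()  # Unix time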

Here current_value holds a float stored as a VARCHAR, and sys_timestamp holds the corresponding time as a Unix timestamp.

When I try to get the Max and Avg of current_value I get unexpected results: for Max, MySQL does a string-based comparison, in which '100' < '9.99'. That is incorrect when the values are treated as floats.
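The same ordering problem can be reproduced in plain Python, which also compares strings character by character. This is only an illustration with made-up sample values, not the MySQL behaviour itself (MySQL's result also depends on the column collation):

```python
# Sample values are hypothetical; they stand in for current_value strings.
values = ["100", "9.99", "25.5"]

# Comparing as strings: '9' sorts after '1' and '2', so "9.99" wins.
string_max = max(values)

# Comparing as numbers: convert each string to float before comparing.
float_max = max(values, key=float)

print(string_max)  # 9.99
print(float_max)   # 100
```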

I tried:

perf = ServiceUtilization.objects.filter(
        device_name__in=devices,
        service_name__in=services,
        data_source__in=data_sources,
        sys_timestamp__gte=start_date,
        sys_timestamp__lte=end_date
    ).values(
        'device_name',
        'service_name',
        'data_source'
    ).annotate(
        max_val=Max('current_value'),
        avg_val=Avg('current_value')
    )

This gives incorrect results.

Then, after looking at the question "HOW select min from cast varchar to int in mysql",

I thought about using extra() on the queryset:

perf = ServiceUtilization.objects.extra(
        select={
            'max_val': "MAX(CAST(current_value AS SIGNED))",
            'avg_val': "AVG(CAST(current_value AS SIGNED))"
        }
    ).filter(
        device_name__in=devices,
        service_name__in=services,
        data_source__in=data_sources,
        sys_timestamp__gte=start_date,
        sys_timestamp__lte=end_date
    ).values(
        'device_name',
        'service_name',
        'data_source',
        'max_val',
        'avg_val'
    )

But this returns only a single row instead of one row per group. It translates to the following SQL:

SELECT (MAX(CAST(current_value AS SIGNED))) AS `max_val`,
       (AVG(CAST(current_value AS SIGNED))) AS `avg_val`,
       `performance_utilizationstatus`.`device_name`,
       `performance_utilizationstatus`.`service_name`,
       `performance_utilizationstatus`.`data_source`
FROM performance_utilizationstatus
ORDER BY performance_utilizationstatus.sys_timestamp DESC;

A working query would need a GROUP BY on (device_name, service_name, data_source):

SELECT (MAX(CAST(current_value AS SIGNED))) AS `max_val`,
       (AVG(CAST(current_value AS SIGNED))) AS `avg_val`,
       `performance_utilizationstatus`.`device_name`,
       `performance_utilizationstatus`.`service_name`,
       `performance_utilizationstatus`.`data_source`
FROM `performance_utilizationstatus`
GROUP BY performance_utilizationstatus.device_name,
         performance_utilizationstatus.service_name,
         performance_utilizationstatus.data_source
ORDER BY performance_utilizationstatus.sys_timestamp DESC;
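The effect of combining a numeric cast with GROUP BY can be checked quickly outside MySQL. The sketch below uses Python's built-in sqlite3 module as a stand-in; the table `util`, its two columns, and the sample rows are hypothetical, and SQLite's `CAST(... AS REAL)` is used where MySQL would need its own cast type:

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE util (device_name TEXT, current_value TEXT)")
conn.executemany(
    "INSERT INTO util VALUES (?, ?)",
    [("d1", "100"), ("d1", "9.99"), ("d2", "3.5"), ("d2", "12")],
)

# For each device, compare the string-based MAX with the numeric MAX.
rows = conn.execute(
    "SELECT device_name, "
    "       MAX(current_value) AS text_max, "
    "       MAX(CAST(current_value AS REAL)) AS num_max "
    "FROM util "
    "GROUP BY device_name "
    "ORDER BY device_name"
).fetchall()

for row in rows:
    print(row)
# ('d1', '9.99', 100.0)
# ('d2', '3.5', 12.0)
```

One row comes back per group, and only the cast column gives the numerically largest value.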

How can I add the GROUP BY clause?

Using annotate does not work here; it fails with:

1111, 'Invalid use of group function'

or

ERROR 1056 (42000): Can't group on 'max_val'

Would raw SQL be the last resort here?


Solution

  • I think you have to go with .raw, because .extra won't work here.

    The problem is that Django has no .group_by; the only way to group by something is to use .values followed by .annotate (as you did in your first attempt).

    So why can't you use .extra? Because:

    Any extra() call made after a values() call will have its extra selected fields ignored.

    and

    If you use a values() clause after an extra() call, any fields defined by a select argument in the extra() must be explicitly included in the values() call.

    So the only way to get the .extra fields is to add them to .values, but that makes the query group by those fields as well, which is not the behaviour you want.
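A minimal sketch of what the raw SQL route could look like, under some assumptions. The helper name `build_utilization_query` is hypothetical, and the table name is taken from the SQL in the question. The cast uses DECIMAL(20, 4) instead of SIGNED, because SIGNED yields an integer and would drop the fractional part of values such as 9.99; the precision is a guess. Since Manager.raw() expects the primary key in the result, a plain database cursor is probably the easier fit for aggregated rows:

```python
def build_utilization_query(devices, services, data_sources,
                            start_date, end_date):
    """Return (sql, params) for a grouped MAX/AVG over the cast column.

    Assumes every list is non-empty; an empty list would give "IN ()",
    which is not valid SQL.
    """
    def placeholders(values):
        # One %s per value; the values themselves are passed as params,
        # so nothing is interpolated into the SQL text.
        return ", ".join(["%s"] * len(values))

    sql = (
        "SELECT device_name, service_name, data_source, "
        "MAX(CAST(current_value AS DECIMAL(20, 4))) AS max_val, "
        "AVG(CAST(current_value AS DECIMAL(20, 4))) AS avg_val "
        "FROM performance_utilizationstatus "
        "WHERE device_name IN ({d}) "
        "AND service_name IN ({s}) "
        "AND data_source IN ({ds}) "
        "AND sys_timestamp BETWEEN %s AND %s "
        "GROUP BY device_name, service_name, data_source"
    ).format(
        d=placeholders(devices),
        s=placeholders(services),
        ds=placeholders(data_sources),
    )
    params = (list(devices) + list(services) + list(data_sources)
              + [start_date, end_date])
    return sql, params


# Usage inside a Django project (not run here):
# from django.db import connection
# sql, params = build_utilization_query(devices, services, data_sources,
#                                       start_date, end_date)
# cursor = connection.cursor()
# cursor.execute(sql, params)
# rows = cursor.fetchall()  # one tuple per (device, service, data_source)
```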