Python: Get DISTINCT column values from inside CASE function in pandasql

I am trying to write a query in Python using pandasql. My code is as below,

import pandas as pd
from pandasql import *

data = pd.read_csv('registerlog.csv')

q = """
SELECT
    a.RegistrationMonth, COUNT(DISTINCT a.UserID) AS UserSize,
    COUNT(
        CASE a.MonthDifference
            WHEN 0.0 THEN DISTINCT a.UserID ELSE NULL
        END
    ) AS MonthZero
FROM
    data) AS a
GROUP BY
    a.RegistrationMonth
"""

print sqldf(q, locals())

But this gives the following error,

print sqldf(q, locals()) File "C:\Python27\lib\site-packages\pandasql\sqldf.py", line 156, in sqldf return PandaSQL(db_uri)(query, env) File "C:\Python27\lib\site-packages\pandasql\sqldf.py", line 63, in call raise PandaSQLException(ex) PandaSQLException: (sqlite3.OperationalError) near "DISTINCT": syntax error

But if I use WHEN 0.0 THEN a.user_id ELSE NULL then it works. Also the normal way of COUNT(DISTINCT a.user_id) also works fine.

But I want to get only the DISTINCT values inside the CASE. Is there a way to achieve this to get the DISTINCT count value inside the CASE?

Solution

In the SQL grammer, DISTINCT does not belong to any values (expressions), but to the SELECT or the aggregate function (here: COUNT). So you have to write it directly after the SELECT or the opening parenthesis:

SELECT ..., COUNT(DISTINCT CASE ... END) ...