Search code examples
pythonsqlsqlitecountdistinct

Counting distinct values across columns using sqlite3 in python


I'm trying to count distinct values across columns using sqlite3 in python but cant seem to get the correct results. Am only able to get the count of distinct values in 1 column.

I've create a database and imported the csv file as a table

conn = sqlite3.connect('test.db')
curr = conn.cursor()

curr.execute('DROP TABLE IF EXISTS test')
curr.execute('CREATE TABLE social_table (from_id INTEGER, to_id INTEGER)')
conn.commit()

with open ('test.csv') as f:
    reader = csv.reader(f)
    next(reader, None)

    for row in reader:
        from_id = row[0]
        to_id = row[1]

        curr.execute('INSERT INTO test (from_id, to_id) VALUES (?, ?)',
                    (from_id, to_id))
    conn.commit()
from_id to_id
0 1
0 2
0 3
0 4
0 5
0 6
0 7
0 8
0 10
0 11

The unique users in this table across columns should be 11 but my code

curr.execute("Select Count(*) from (Select DISTINCT from_id, to_id from test)")

gives me 10 instead.

Anyone able to help me with this?


Solution

  • It seems that your query is not correct, instead try this:

    curr.execute("""
        SELECT COUNT(DISTINCT user_id) 
        FROM (
            SELECT from_id as user_id FROM social_table 
            UNION 
            SELECT to_id as user_id FROM social_table
        )
    """)
    

    Just select all distinct values of from_id and then unioning them with all distinct values of to_id, then count the number of distinct values in this unioned set, which should give you the total number of unique users across both columns.