I'm trying to count distinct values across columns using sqlite3 in python but cant seem to get the correct results. Am only able to get the count of distinct values in 1 column.
I've create a database and imported the csv file as a table
conn = sqlite3.connect('test.db')
curr = conn.cursor()
curr.execute('DROP TABLE IF EXISTS test')
curr.execute('CREATE TABLE social_table (from_id INTEGER, to_id INTEGER)')
conn.commit()
with open ('test.csv') as f:
reader = csv.reader(f)
next(reader, None)
for row in reader:
from_id = row[0]
to_id = row[1]
curr.execute('INSERT INTO test (from_id, to_id) VALUES (?, ?)',
(from_id, to_id))
conn.commit()
from_id | to_id |
---|---|
0 | 1 |
0 | 2 |
0 | 3 |
0 | 4 |
0 | 5 |
0 | 6 |
0 | 7 |
0 | 8 |
0 | 10 |
0 | 11 |
The unique users in this table across columns should be 11 but my code
curr.execute("Select Count(*) from (Select DISTINCT from_id, to_id from test)")
gives me 10 instead.
Anyone able to help me with this?
It seems that your query is not correct, instead try this:
curr.execute("""
SELECT COUNT(DISTINCT user_id)
FROM (
SELECT from_id as user_id FROM social_table
UNION
SELECT to_id as user_id FROM social_table
)
""")
Just select all distinct values of from_id
and then unioning them with all distinct values of to_id
, then count the number of distinct values in this unioned set, which should give you the total number of unique users across both columns.