I have a question related to the SQL query. The data looks like this:
Name1 | Name2 | Category | Total |
---|---|---|---|
ABC | NULL | Category1 | 100 |
DEF | ABC | Category1 | 20 |
GHI | ABC | Category3 | 300 |
XYZ | DEF | Category2 | 60 |
XYZ | GHI | Category3 | 80 |
NULL | NULL | Category2 | 90 |
I am trying to merge the Name1
and Name2
and grouping them based on Category
with the Total
having the sum if same name have same categories.
Thanks in advance
The method I tried were this (kinda similar to these two queries asked in stackoverflow): Match two columns and add the third column in SQL SQL: GROUP BY column B for groups in column a
pysqldf = lambda q1: sqldf(q1, globals())
q1 = """SELECT Name, SUM(Total) AS Total
FROM
(
SELECT Name1 AS Name, SUM(Total) AS Total
FROM df
WHERE Name1 IS NOT NULL
GROUP BY Name1
UNION ALL
SELECT Name2, SUM(Total)
FROM df
WHERE Name2 IS NOT NULL
GROUP BY Name2
) t
GROUP BY Name
ORDER BY Name"""
df2 = pysqldf(q1)
Tried in pandas also but it gives different results with the following command:
df2 = df.groupby(['investigator','second_investigator','requested_test']).agg({'Count': 'sum'})
The output required is like this:
Name | Category | Total |
---|---|---|
ABC | Category1 | 120 |
ABC | Category3 | 300 |
DEF | Category1 | 20 |
DEF | Category2 | 60 |
GHI | Category3 | 380 |
XYZ | Category2 | 60 |
XYZ | Category3 | 80 |
SELECT Name, Category, SUM(Total) AS Total
FROM
(
SELECT Name1 AS Name, Category, SUM(Total) AS Total
FROM df
WHERE Name1 IS NOT NULL
GROUP BY 1, 2
UNION ALL
SELECT Name2, Category, SUM(Total)
FROM df
WHERE Name2 IS NOT NULL
GROUP BY 1, 2
) t
GROUP BY 1, 2
ORDER BY 1