I have been trying to work on this dataset that includes quantities for two types of sales (0,1) for different counties across different dates. Some dates, however, include both type 1 and type 0 sales. How can I merge type 1 and 0 sales for the same date and same id? The dataset has over 40k rows and I have no idea where to start. I was thinking about creating an if loop but I have no idea how to write it. It can be in python or R.
Essentially, I have a table that looks like this:
Date | City | Type | Quantity |
---|---|---|---|
2020-01-01 | Rio | 1 | 10 |
2020-01-01 | Rio | 0 | 16 |
2020-03-01 | Rio | 0 | 23 |
2020-03-01 | Rio | 1 | 27 |
2020-05-01 | Rio | 1 | 29 |
2020-08-01 | Rio | 0 | 36 |
2020-01-01 | Sao Paulo | 0 | 50 |
2020-01-01 | Sao Paulo | 1 | 62 |
2020-03-01 | Sao Paulo | 0 | 30 |
2020-04-01 | Sao Paulo | 1 | 32 |
2020-05-01 | Sao Paulo | 0 | 65 |
2020-05-01 | Sao Paulo | 1 | 155 |
I want to combine, for example, Rio's quantities for both type 1 and 0 on 2020-01-01, as well as 2020-03-01, and the same thing for Sao Paulo and all subsequent counties. I want to aggregate types 1 and 0 quantities but still preserve the date and city columns.
Try something like this:
import pandas as pd
df = pd.read_csv('your_file_name.csv')
df.pivot_table(values='Sales', index=['Date', 'City'], aggfunc='sum')