Search code examples
sqlsql-serveraggregateperiodgaps-in-data

Fill month gaps based on multiple columns and aggregate data


I need to aggregate data on a monthly basis and also fill the gaps of months for the whole year with 0 Amounts. The problem is that the aggregation must happen on multiple columns and this makes it tricky (i.e. using date table and left join does not work for me as other data in dataset has matches within different periods that this dataset is missing)

The dataset I have is:

Period Account CompanyType Amount
20220101 11111 Internal 100
20220201 11111 Internal 200
20220601 11111 Internal 300
20221001 11111 Internal 800
20221001 11111 External 100
20221001 11111 External 300
20221201 11111 Internal 100
20220101 22222 External 20
20220601 22222 External 50
20221201 22222 External 30

What i'm looking for is:

Period Account Type Amount
20220101 11111 Internal 100
20220201 11111 Internal 200
20220301 11111 Internal 0
20220401 11111 Internal 0
20220501 11111 Internal 0
20220601 11111 Internal 300
20220701 11111 Internal 0
20220801 11111 Internal 0
20220901 11111 Internal 0
20221001 11111 Internal 800
20221101 11111 Internal 0
20221201 11111 Internal 100
20220101 11111 External 0
20220201 11111 External 0
20220301 11111 External 0
20220401 11111 External 0
20220501 11111 External 0
20220601 11111 External 0
20220701 11111 External 0
20220801 11111 External 0
20220901 11111 External 0
20221001 11111 External 400
20221101 11111 External 0
20221201 11111 External 0
20220101 22222 Internal 20
20220201 22222 Internal 0
20220301 22222 Internal 0
20220401 22222 Internal 0
20220501 22222 Internal 0
20220601 22222 External 50
20220701 22222 External 0
20220801 22222 External 0
20220901 22222 External 0
20221001 22222 External 0
20221101 22222 External 0
20221201 22222 External 30

Lines that are bolded should be created in the result set.

I have tried multiple solutions but just cannot seem to get it to work due to the reason that the dataset has multiple columns and should be aggregated based on all of them (except Amount).

Update: I oversimplified the problem in my example. The sample dataset with all the fields that i am looking into is provided here: Dbfiddle I do have Dates (Calendar) table present and can use it by the way. Note that CompanyType field is tied to Company field and is not relevant in this case.

The result that i am looking for is following: For every combination of ActiveMark, Account, Company(CompanyType), Currency there must be 12 records in total (from January to December). Missing periods should have Amount set as 0


Solution

  • Suppose you have what you've shown to us - one table with data as is.

    To get desired result you can use such kind of query:

    WITH months(month) AS (
        SELECT * FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10), (11), (12)) AS m(m)
    ),
    accounts(account) AS (
        SELECT DISTINCT account FROM test
    ),
    companyTypes(companyType) AS (
        SELECT DISTINCT companyType FROM test
    ),
    periods(period) AS (
        SELECT DATEFROMPARTS(2022, month, 01) FROM months WHERE NOT EXISTS (
            SELECT 1 FROM test WHERE MONTH(period) = month
        ) UNION ALL (SELECT DISTINCT period FROM Test)
    )
    SELECT p.period, a.account, ct.companyType, SUM(COALESCE(t.amount, 0)) 
    FROM periods p
    CROSS JOIN accounts as a
    CROSS JOIN companyTypes as ct
    LEFT JOIN test as t ON t.period = p.period AND t.account = a.account AND t.companyType = ct.companyType
    GROUP BY p.period, a.account, ct.companyType
    ORDER BY a.account, ct.companyType, p.period
    

    Please, check working demo