Search code examples
sqlgroupingadvantage-database-server

SQL Grouping by year gives incorrect results


I am trying to summerize sales date, by month, sales region and type. The problem is, the results change when I try to group by year.

My simplified query is as follows:

SELECT
DAB700.DATUM,DAB000.X_REGION,DAB700.BELEG_ART, // the date, sales region, order type
   // calculate the number of orders per month
   COUNT (DISTINCT CASE WHEN MONTH(DAB700.DATUM) =  1 THEN DAB700.BELEG_NR END) as jan,
   COUNT (DISTINCT CASE WHEN MONTH(DAB700.DATUM) =  2 THEN DAB700.BELEG_NR END) as feb,
   COUNT (DISTINCT CASE WHEN MONTH(DAB700.DATUM) =  3 THEN DAB700.BELEG_NR END) as mar
FROM "DAB700.ADT" DAB700
left join "DAB050.ADT" DAB050 on DAB700.BELEG_NR = DAB050.ANUMMER // join to table 050, to pull in order info
left join "DF030000.DBF" DAB000 on DAB050.KDNR = DAB000.KDNR // join table 000 to table 050, to pull in customer info
left join "DAB055.ADT" DAB055 on DAB050.ANUMMER = left (DAB055.APNUMMER,6)// join table 055 to table 050, to pull in product info
WHERE (DAB700.BELEG_ART = 10 OR DAB700.BELEG_ART = 20) AND (DAB700.DATUM>={d '2021-01-01'}) AND (DAB700.DATUM<={d '2021-01-11'}) AND DAB055.ARTNR <> '999999' AND DAB055.ARTNR <> '999996' AND DAB055.TERMIN <> 'KW.22.22' AND DAB055.TERMIN <> 'KW.99.99' AND DAB050.AUF_ART = 0
group by DAB700.DATUM,DAB000.X_REGION,DAB700.BELEG_ART   

This returns the following data, which is correct (manually checked):

| DATUM      | X_REGION | BELEG_ART | jan | feb | mar |
|------------|----------|-----------|-----|-----|-----|
| 04.01.2021 | 1        | 10        | 3   | 0   | 0   |
| 04.01.2021 | 3        | 10        | 2   | 0   | 0   |
| 04.01.2021 | 4        | 10        | 1   | 0   | 0   |
| 04.01.2021 | 4        | 20        | 1   | 0   | 0   |
| 04.01.2021 | 6        | 20        | 2   | 0   | 0   |
| 05.01.2021 | 1        | 10        | 1   | 0   | 0   |
and so on....

The total number of records for Jan is 117 (correct).

Now I now want to summerize the data in one row (for example, data grouped by region and type)..

so I change my code so that I have:

SELECT
YEAR(DAB700.DATUM),

and

group by YEAR(DAB700.DATUM)

the rest of the code stays the same.

Now my results are:

| EXPR | X_REGION | BELEG_ART | jan | feb | mar |
|------|----------|-----------|-----|-----|-----|
| 2021 | 1        | 10        | 16  | 0   | 0   |
| 2021 | 1        | 20        | 16  | 0   | 0   |
| 2021 | 2        | 10        | 19  | 0   | 0   |
| 2021 | 2        | 20        | 22  | 0   | 0   |
| 2021 | 3        | 10        | 12  | 0   | 0   |
| 2021 | 3        | 20        | 6   | 0   | 0   |

Visually it is correct. But, the total count for January is now 116. A difference of 1. What am I doing wrong?

How can I keep the results from the first code - but have it presented as per the 2nd set?


Solution

  • You count distinct BELEG_NR. This is what makes the difference. Let's look at an example. Let's say your table contains four rows:

    DATUM X_REGION BELEG_ART BELEG_NR
    04.01.2021 1 10 100
    04.01.2021 1 10 200
    05.01.2021 1 10 100
    05.01.2021 1 10 300

    That gives you per day, region and belegart:

    DATUM X_REGION BELEG_ART DISTINCT COUNT BELEG_NR
    04.01.2021 1 10 2
    05.01.2021 1 10 2

    and per year, region and belegart

    YEAR X_REGION BELEG_ART DISTINCT COUNT BELEG_NR
    2021 1 10 3

    The BELEG_NR 100 never appears more than once per day, so every instance gets counted. But it appears twice for the year, so it gets counted once instead of twice.