Search code examples
sqlsql-serverssms

Join tables and sum specific data based on a mapping


I'm trying to learn SSMS and I've got this thing I'm trying to do. I've got this table called CompanyGroup:

| SourceCompanyId | TargetCompanyId  |
| ---  | ------ |
| 4626 | 359468 |
| 4626 | 7999   |
| 56167| 11947  |

This table represents the companies that are part of the same group. For example, if I am company 4626 then my company group will be formed of [4626, 359468, 7999] and if I was company 7999 my company group would still be [4626, 359468, 7999]. Therefore, if I were company 56167, my company group would be [56167, 11947]

Then there is this other table called Sales:

| CompanyId| Product| Sales|
|:-----|:-------| ------|
| 4626  | Monitors| 1000|
| 4626  | Locks   | 300|
| 359468| Chairs  | 500|
| 359468| Lights  | 300|
| 7999  | Chairs  | 500|
| 4626  | Locks   | 300|
| 56167 | Locks   | 1000|
| 11947 | Tables  | 300|

This table represents the sales of a company.

I'm working on this query to get the total amount of sales per company group when you have "selected" either company in that group, for example: for the group 4626 is in the total sales would be 2300.

Here is my query so far:

SELECT
    s.CompanyId,
    SUM(s.Sales) AS 'Sales'
    FROM Sales AS s
    JOIN CompanyGroup AS cg ON s.SourceCompanyId = cg.CompanyId
    WHERE s.CompanyId = 4626
    GROUP BY s.CompanyId

which has this result

|CompanyId | Sales  |
| -------- | ------ |
| 4626     | 1600   |

But I would like to get just the total sum of sales for the entire group company, for example:

| Sales  |
| ------ |
| 2900   |

I'm just not sure how to add make it sum the data from the other companies in the current group and add them together.


Solution

  • Although you are new to SQL, your data structure as it is will be nothing but trouble. Can the query be done? Yes, but harder. I would first like to suggest an alternative to what you have to identify "groups". Create a second table of groups, then have all companies associated with said group. You could even have some clear-text content of the group such as

    CompanyGroups
    CompanyGroupID  CompanyGroupName
    1               Eastern Group
    2               Northern Group
    3               Technical Group
    4               Furniture Group
    
    Then the companies
    SourceCompanyId  CompanyGroupID
    4626             3
    359468           3
    7999             3
    56167            4
    11947            4
    

    So, there is one record per company and the known group associated.

    If a company can possibly be associated with multiple groups, you could have additional records per company and alternate group as well.

    Now, back to the regularly scheduled program and your query. You need to have one "common" group so all targets are associated, including the underlying source company in the group, such as your 4626 was the source, and the other two of 359468, 7999 are in the same. It expands on the other answer, but forces the left-most ID into a primary position.

    select distinct
          SourceCompanyID as GrpParent,
          SourceCompanyID as IncludedCompany
       from
          CompanyGroup cg
    UNION
    select
          cgParent.SourceCompanyID as GrpParent,
          cgTarget.TargetCompanyId as IncludedCompany
       from
          CompanyGroup cgParent
             JOIN CompanyGroup cgTarget
                on cgParent.SourceCompanyID = cgTarget.SourceCompanyID
    

    Notice the first part of the query is getting the source once even if they are associated with five other targets. We don’t want to duplicate counts because of duplicate sources. It holds its own ID as both the parent and the company to be included as part-of the group.

    The second starts again with the same parent, but gets the target as the included company. So, based on your data

    SourceCompanyId  TargetCompanyId
    4626             359468
    4626             7999
    56167            11947
    

    Would result as

    GrpParent   IncludedCompany
    -- first the distinct portion before union
    4626        4626
    56167       56167
    -- now the union portion
    4626        359468
    4626        7999
    56167       11947
    

    And you can see the five total records and the 4626 "Group" shows all three company IDs including itself on the right-side, similarly for 56167 having two entries with each respective on the included companies right-side.

    Now with this, you should be able to join the summation of data by the GROUP and not cause duplicated aggregations.

    select
          CompGrps.GrpParent,
          sum( CompSales.Sales ) as GroupTotalSales
       from
          ( select distinct
                  SourceCompanyID as GrpParent,
                  SourceCompanyID as IncludedCompany
               from
                  CompanyGroup cg
            UNION
            select
                  cgParent.SourceCompanyID as GrpParent,
                  cgTarget.TargetCompanyId as IncludedCompany
               from
                  CompanyGroup cgParent
                     JOIN CompanyGroup cgTarget
                        on cgParent.SourceCompanyID = cgTarget.SourceCompanyID
           ) as CompGrps
             JOIN
             ( SELECT
                     s.CompanyId,
                     SUM(s.Sales) AS Sales
                  FROM
                     Sales s
                 group by
                     s.CompanyId ) CompSales
                on CompGrps.IncludedCompany = CompSales.CompanyID
       group by
          CompGrps.GrpParent
       order by
          sum( CompSales.Sales ) desc
    

    So notice the first query getting distinct group companies, and the secondary querying from its own per-company sales can be joined on the company ID of itself, but summed based on the common group parent, thus giving totals the the outer level per GROUP.

    I also tacked on a simple order by to get largest sales sorted at the top. As you can see, it's a bit messier with existing structure, but can be done.

    The output should look something like

    GrpParent  GroupTotalSales
    4626       2900       (4626 had 1600, 359468 had 800, and 7999 had 500)
    56167      1300       (56167 had 1000, 11947 had 300)