I am pretty new to databases and SQL in general and have a pretty important task that I do not know how to solve.
Given that I have a Company (Company A) with three different subsidiaries (Sub A, Sub B, Sub C). All three subsidiaries produce the same products which can be divided into three product categories:
My task is to create a star scheme that allows me to get the following information: Total Revenues of Sub A and Sub B for every Product from Product Category B.
The needed format of the result is a table containing three columns "Subsidiary Name", "Product Name" and "Revenue".
What I have been trying to do so far, is to create the following Fact Table, Dimension and Keys. The Primary Keys of the Dimensions are connected to the Foreign Keys of my Fact Table.
So my questions are:
Am I even on the right track or is this approach completely wrong?
How would I actually "fill" my tables with example data to be able to write a query that answers the given question above?
How would such a query look like?
For now, I am using a local MySQL server created with XAMPP and am connecting to this using IntelliJ IDEA Ultimate 2016. The diagramm above has also bee created using this software.
I hope that someone is able and willing to help me.
Also, please forgive any mistakes (or please tell me what I did wrong) as I am both new to SQL and especially new to Stackoverflow.
Kind regards
Edit:
I have used a different tool to create and visualize my tables. Also, as fenix mentioned, I have added several different Dimensions such as a Time/Date Table. However, in order to solve my task, I suppose I only need the two Dimensions dimCompany (Which represents the different subsidiaries) and dimProduct.
RANDBETWEEN
function.SELECT dp.ProductName,sum(fs.revenue) as total_revenue
FROM fact_sales fs
JOIN dim_product dp ON fs.FK_Product = dp.PK_Product
JOIN dim_company dc ON fs.FK_Company = dc.PK_Company
WHERE dp.Product_Type = 'Product Category B'
AND dc.CompanyName IN ('Sub A','Sub B')
GROUP BY dp.PK_Product;
Unfortunately, some of Data Marts are not well structured so it doesn't contain referential integrity (anti-pattern), so you would have to go with LEFT JOIN
instead of INNER JOIN
. Note that INNER JOIN
is same as JOIN
. A bit about JOIN
you can visualise here.
Edit:
Regarding your comment, it's hard to make good balance not to make this answer too broad because this QA site should serve for specific questions and answers.
Since DWH is still a databases, it inherited most of databases principles. One of those are table keys. Primary Key is unique row identifier, and when database design allows (in DWH world it always should be) keys are integers (big integers exactly). See this for more info.
Purpose? You can have two companies with same names but different states, so you need some key to distinct them.
Why it should be integer? It's more technical question, and one of the answers is of query optimisation. CPU will match two integers (4 or 8 bytes) much faster rather than strings (VARCHAR
in database world) with 100+ bytes (or however you specify).
Why someone uses VARCHAR
as primary key? Well, sometimes, being as native as possible is good approach. So, VARCHAR
primary key for product could be combination of product brand and counter (from supplier), and if you are re-seller, ERP could give Microsoft Office 2016 key like MCR-OFF-123. In DWH it is recommended to store native key (whatever it is, int or varchar) and to assign surrogate primary key.
Sub A has the key '1', Sub B has the key '2
quote from your comment.
Yes, you're right.