I am working on a project in power bi when I making the data modeling, one of the columns has multiple values about country code, I want to separate them and then transform them into a dimension table.
[enter image description here]
but when I merge it with the origin table which considered to be a fact, I have a problem of increasing the size of the fact table as an example from 10000 rows to 84000 rows, also it causes many to many relationship if I add an index for the fact table, is this way will be right?
can you suggest any other way much optimized and memory size saving?
I attached a screenshot for the main column and screenshots for the two dimension tables I created from it.
[enter image description here]
First, we need to map each [codCountry]
to the respective individual [Country-Code]
. This is best performed in PowerQuery because we will be parsing strings.
[CodCountry]
. This will remove all columns but [CodCountry]
, and will remove all duplicate values in that column. Generally the group-by function is better than the distinct function because it will query fold in more data sources. Remove the required aggregation string in step editor.Text.Split()
function.Text.Split( [codCountry], "," )
.
Reference this new table to make a new table (like in step 1). This will be your country dimension table if you don't already have one. If you do, skip this step. After creating the new table, just remove all columns but country-code and remove duplicates from it.
Load these changes to your model and create relationships between the tables. The relationship between the fact table and the country code mapping table should be many-to-many (bi-directional).
This will allow you to filter your fact table by country code while keeping your mapping table as minimal as possible.