Search code examples
powerbidaxpowerbi-desktopmdaxstudio

How do I plot frequency distribution if I know the lower limit and upper limit of the weight of products?


I have a bunch of data which has the weight of different products with and without lid.

Location Product Date of manufacturing Weight without lid Weight with lid Qty
US Food 04/20/2024 3 5 2
US Food 04/14/2024 72 79 1
US Food 04/20/2024 79 84 2
US Food 04/19/2024 82 86 3
US Drinks 04/20/2024 69 76 1
US Drinks 04/14/2024 8 13 5
US Drinks 04/17/2024 51 56 3
US Drinks 04/15/2024 4 6 4
Australia Food 04/19/2024 13 14 2
Australia Food 04/18/2024 8 17 5
Australia Food 04/21/2024 89 97 2
Australia Food 04/19/2024 5 13 4
Australia Stationery 04/20/2024 63 67 4
Australia Stationery 04/19/2024 74 76 6
Australia Stationery 04/20/2024 72 80 5
Australia Stationery 04/12/2024 32 34 3

What I'm trying to plot is the frequency distribution of the weights. On x-axis, I have weight ranging from 0 to 100 and y-axis is my frequency distribution. If, for example, I have a product whose weight with and without product is 72 and 79 respectively, the y-axis should be 1 for all weight on the x-axis between 72 and 79.

I'm expecting that I can plot a graph with data similar to below.

Weight Distribution
80 7
81 2
82 5
83 5
84 5
85 3
86 3
87
88
89 2
90 2

Any help would be greatly appreciated. Thanks.


Solution

  • Update #03: 10-05-2024

    Given the new requirement to add the total quantity instead of counting the products, the new measure should be:

    FrequencyDist_02 = 
    VAR CurrentWeight = SELECTEDVALUE(FP_RANGE[Parameter])
    VAR _filteredTable = FILTER(
        'Table',
        [Weight without lid] <= CurrentWeight &&
        [Weight with lid] >= CurrentWeight
    )
    VAR totalQuantity = SUMX(
        _filteredTable,
        [Quantity]
    )
    
    VAR _result = IF(ISEMPTY(_filteredTable), 0, totalQuantity)
    RETURN _result
    
    • I had to create a new table, so it is now 'Table' instead of SampleTable.

    Final result: enter image description here


    Update #02: 08-05-2024 After hearing the new requirements, you could work with a disconnected table that generates the values between 0 and 100, and an additional measure that iterates over each value between 0 and 100, filters the table to get all rows that have that current value between 'Weight without lid' and 'Weight with lid', and return the count of that.

    Disconnected table:

    FP_RANGE = GENERATESERIES(0, 100, 1)
    

    Measure

    FrequencyDist = 
    VAR CurrentWeight = SELECTEDVALUE(FP_RANGE[Parameter])
    VAR _filteredTable = FILTER(
        SampleTable,
        [Weight without lid] <= CurrentWeight &&
        [Weight with lid] >= CurrentWeight
    )
    VAR counter = IF(ISEMPTY(_filteredTable), 0, COUNTROWS(_filteredTable))
    RETURN counter
    
    • The variable CurrentWeight will select the current selected value of the 0-100 series. In a table visual, it will just go row-by-row, so 0, 1, etc.
    • Afterwards, the table will be filtered whether that particular value is within a particular range of your products without lid and with lid values. I have added a screenshot below to get an idea of the value "13".
      • Note: if you slice later on by 'Product' or 'Location', the original 'SampleTable' will be different, and all calculations will be different.
    • Finally, it returns 0 if there are no rows, otherwise it returns the amount of rows.

    Illustration of filtered table with value "13" enter image description here

    Final result: enter image description here


    Frequency distributions are not that easy in Power BI when you have gaps and the amount of sample data is small. Perhaps you could go for a third-party way using R or Python scripting that plots a density distribution (graph on the left) or histogram (graph on the right) for any specific product filter? When a particular product filter is selected, the graph gets recalculated.

    This is my end result: enter image description here

    Setup: [Image]

    Using the R visual, you need to include the values you would like to do calculations with. In this case, I have added the 'Weight with lid' and 'Weight without lid' columns. Afterwards, you need to create the R code that would output a plot in R (which is then shown in Power BI). I have used the following code:

    # The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script: 
    
    # dataset <- data.frame(Weight with lid, Weight without lid)
    # dataset <- unique(dataset)
    
    # Paste or type your script code here:
    dataset <- unlist(dataset, use.names=FALSE)
    plot(density(dataset)) # OR use histogram(dataset)
    
    • The unlist function flattens the dataframe into a single vector, making it easy to calculate a density graph or a histogram.
    • The plot(density(dataset)) line plots the density distribution of the data according to the R algorithm. The default output is, in my opinion, quite ugly, but it can be customised or you could use third-party packages like ggplot2 (again, read the documentation on how to use it in Power BI).

    At the same time, it also fits your requirement to have a dynamic X-axis, but this is also customisable inside R.