Search code examples
powerbidaxdata-analysisanalysis

Not sure why my answer is not correct when calculating relationships in dataset


I'm trying to find the average number of movies done together(as in per actor/director pair) per year of director's longevity (without modifying the original data or using calculated columns).

Thanks to the helpful people here, I have this code:

AvgMoviesLong = 

VAR T1 =
    SUMMARIZE (
        FILTER (
            'netflix',
            [director] <> BLANK ()
                && [cast] <> BLANK ()
                && [show_id] <> BLANK ()
                && [director] <> [cast]
                && RELATED('Content'[category]) = "Movie"
        ),
        'netflix'[director],
        'netflix'[cast],
        "Longevity",
            CALCULATE (
                MAX ( 'Content'[release_year] ) - MIN ( 'Content'[release_year] ) + 1,
                ALLEXCEPT ( 'netflix', 'Netflix'[cast] )
            ),
        "# of movies", COUNT ( 'netflix'[cast] )
    )
RETURN
    DIVIDE ( SUMX ( T1, [# of movies] ), SUMX ( T1, [Longevity] ) )

The problem is that the values are between 15 and 20x smaller than the correct answer. My original code came to the same answer, but was much messier so I feel like this is the right method, but there's another angle I haven't thought of that is causing. I'm relatively new to DAX/Power BI so there might be something obvious I'm missing.

Possible helpful information:

  1. Current total with this calculation is 0.04, correct answer is 0.67.
  2. Blanks have been filtered out for Director, Cast, and show_id.
  3. We only want movies, so TV Shows are filtered out.

Solution

  • I imagine there must be simpler methods, though this is how I approached the problem.

    AvgMoviesLong =
    VAR T1 =
        SUMMARIZE (
            FILTER (
                'netflix',
                [director] <> BLANK ()
                    && [cast] <> BLANK ()
                    && [show_id] <> BLANK ()
                    && [director] <> [cast]
                    && RELATED ( 'Content'[category] ) = "Movie"
            ),
            'netflix'[director],
            'netflix'[cast],
            "Longevity",
                CALCULATE (
                    MAX ( 'Content'[release_year] ) - MIN ( 'Content'[release_year] ) + 1,
                    ALLEXCEPT ( Netflix, 'Netflix'[director] ),
                    CROSSFILTER ( 'Content'[show_id], Netflix[show_id], BOTH )
                ),
            "# of movies", COUNT ( 'netflix'[cast] )
        )
    RETURN
        DIVIDE ( SUMX ( T1, [# of movies] ), SUMX ( T1, [Longevity] ) )