Search code examples
sqlpercentagedifferencesql-function

lag function doesn't return value to the right direction SQL


I am trying to find the fastest growing countries based on the number of downloads between 2018 - 2020.

I tried to use the lag function to find the highest growth rate based on the total percentage of increase. However, my result returns all the lag values for previous_year_downloads in the wrong order and generated some unknown values, which resulted in the percentage change value being wrong as well.

I have attached my code as shown below:

select country,
        year,
        sum(total_downloads) as totals,
        sum(total_downloads) - (lag(sum(total_downloads),1) over(order by year asc)) as previous_year_downloads,
        round((sum(total_downloads) - lag(sum(total_downloads),1) over(order by year asc)) / (lag(sum(total_downloads),1) over (order by year asc)),2) 
        as percentage_change
  from cte
  group by country, year
# i also tried order by year but doesn't work either

Below is the sample data generated from google bigquery enter image description here

With just one country, all values are in the right order, but with two or more countries, values just appear everywhere and all values except row 1 are not even in the original data frame.

Thank you so much for the help, really appreciate it.


Solution

  • I don't know BigQuery but this is how you would do it in sql server.

    You probably want year and country in the window function (lag) - that is, partition by countries, then order by years. But perhaps also in the results you will want final ordering so you can see the data as you like to see it (by country, year or by year, country)

    -- ----------------------------------------------------------------------
    -- sample data (the "cte")
    declare @cte table (
        country nvarchar(100),
        year int,
        total_downloads int);
    
    insert into @cte values
        ('de', 2018, 10),
        ('fr', 2018, 40),
        ('en', 2020, 20),
        ('fr', 2020, 80),
        ('de', 2020, 25),
        ('en', 2018, 10),
        ('de', 2019, 20),
        ('fr', 2019, 30),
        ('en', 2019, 15);
    
    
    -- ----------------------------------------------------------------------
    -- ugly version
    select country,
            year,
            total_downloads as new,
            lag(total_downloads, 1) over(partition by country order by year asc) as old,
            total_downloads - lag(total_downloads, 1) over(partition by country order by country, year asc) as new_minus_old,
            (cast((total_downloads - lag(total_downloads, 1) over(partition by country order by country, year asc)) as decimal(12,3)) / (lag(total_downloads, 1) over(partition by country order by year asc))) as PctChange
    from @cte;
    
    
    -- ----------------------------------------------------------------------
    -- cleaner version
    select
        T.country,
        T.year,
        T.new,
        T.old,
        T.new - T.old as new_minus_old,
        (T.new - T.old)/cast(T.old as decimal(12,4)) as PctChange
    from 
    (
        select 
            country, 
            year,
            total_downloads as new,
            lag(total_downloads, 1) over(partition by country order by year asc) as old
        from @cte
    ) T;
    

    Result:

    country year new old new_minus_old PctChange
    de 2018 10 NULL NULL NULL
    de 2019 20 10 10 1.00
    de 2020 25 20 5 0.25
    en 2018 10 NULL NULL NULL
    en 2019 15 10 5 0.50
    en 2020 20 15 5 0.33
    fr 2018 40 NULL NULL NULL
    fr 2019 30 40 -10 -0.25
    fr 2020 80 30 50 1.66