I am trying to find the fastest growing countries based on the number of downloads between 2018 - 2020.
I tried to use the lag function to find the highest growth rate based on the total percentage of increase. However, my result returns all the lag values for previous_year_downloads
in the wrong order and generated some unknown values, which resulted in the percentage change value being wrong as well.
I have attached my code as shown below:
select country,
year,
sum(total_downloads) as totals,
sum(total_downloads) - (lag(sum(total_downloads),1) over(order by year asc)) as previous_year_downloads,
round((sum(total_downloads) - lag(sum(total_downloads),1) over(order by year asc)) / (lag(sum(total_downloads),1) over (order by year asc)),2)
as percentage_change
from cte
group by country, year
# i also tried order by year but doesn't work either
Below is the sample data generated from google bigquery
With just one country, all values are in the right order, but with two or more countries, values just appear everywhere and all values except row 1 are not even in the original data frame.
Thank you so much for the help, really appreciate it.
I don't know BigQuery but this is how you would do it in sql server.
You probably want year and country in the window function (lag) - that is, partition by countries, then order by years. But perhaps also in the results you will want final ordering so you can see the data as you like to see it (by country, year or by year, country)
-- ----------------------------------------------------------------------
-- sample data (the "cte")
declare @cte table (
country nvarchar(100),
year int,
total_downloads int);
insert into @cte values
('de', 2018, 10),
('fr', 2018, 40),
('en', 2020, 20),
('fr', 2020, 80),
('de', 2020, 25),
('en', 2018, 10),
('de', 2019, 20),
('fr', 2019, 30),
('en', 2019, 15);
-- ----------------------------------------------------------------------
-- ugly version
select country,
year,
total_downloads as new,
lag(total_downloads, 1) over(partition by country order by year asc) as old,
total_downloads - lag(total_downloads, 1) over(partition by country order by country, year asc) as new_minus_old,
(cast((total_downloads - lag(total_downloads, 1) over(partition by country order by country, year asc)) as decimal(12,3)) / (lag(total_downloads, 1) over(partition by country order by year asc))) as PctChange
from @cte;
-- ----------------------------------------------------------------------
-- cleaner version
select
T.country,
T.year,
T.new,
T.old,
T.new - T.old as new_minus_old,
(T.new - T.old)/cast(T.old as decimal(12,4)) as PctChange
from
(
select
country,
year,
total_downloads as new,
lag(total_downloads, 1) over(partition by country order by year asc) as old
from @cte
) T;
Result:
country | year | new | old | new_minus_old | PctChange |
---|---|---|---|---|---|
de | 2018 | 10 | NULL | NULL | NULL |
de | 2019 | 20 | 10 | 10 | 1.00 |
de | 2020 | 25 | 20 | 5 | 0.25 |
en | 2018 | 10 | NULL | NULL | NULL |
en | 2019 | 15 | 10 | 5 | 0.50 |
en | 2020 | 20 | 15 | 5 | 0.33 |
fr | 2018 | 40 | NULL | NULL | NULL |
fr | 2019 | 30 | 40 | -10 | -0.25 |
fr | 2020 | 80 | 30 | 50 | 1.66 |