I have a question related to my previous one.
What I have is a database that looks like:
category price date
-------------------------
Cat1 37 2019-03
Cat2 65 2019-03
Cat3 34 2019-03
Cat1 45 2019-03
Cat2 100 2019-03
Cat3 60 2019-03
This db has hundred of categories and comes from another one that has different attributes for each observation.
With this code:
WITH table AS
(
SELECT
category, price, date,
substring(date, 1, 4) AS year,
substring(date, 6, 2) as month
FROM
original_table
WHERE
(year = "2019" or year = "2020")
AND (month = "03")
AND product = "XXXXX"
ORDER BY
anno
)
-- I get this from a bigger table, but prefer to make small steps
-- that anyone in the fute can understand where this comes from as
-- the original table is expected to grow fast
SELECT
category,
ROUND(1.0 * next_price/ price - 1, 2) Pct_change,
SUBSTR(Date, 1, 4) || '-' || SUBSTR(next_date, 1, 4) Period,
tipo_establecimiento
FROM
(SELECT
*,
LEAD(Price) OVER (PARTITION BY category ORDER BY year) next_price,
LEAD(year) OVER (PARTITION BY category ORDER BY year) next_date,
CASE
WHEN (category_2>= 35) AND (category_2 <= 61)
THEN 'S'
ELSE 'N'
END 'tipo_establecimiento'
FROM
table)
WHERE
next_date IS NOT NULL AND Pct_change >= 0
ORDER BY
Pct_change DESC
This code gets me a view of the data that looks like:
category Pct_change period
cat1 0.21 2019-2020
cat2 0.53 2019-2020
cat3 0.76 "
This is great! But my next view has to take this one and provide me with a range that shows how many categories are in each range.
It should look like:
range avg num_cat_in
[0.1- 0.4] 0.3 3
This last table is just an example of what I expect
I have been trying with a code that looks like this but i get nothing
WITH table AS (
SELECT category, price, date, substring(date, 1, 4) AS year, substring(date, 6, 2) as month
FROM original_table
WHERE (year= "2019" or year= "2020") and (month= "03") and product = "XXXXX"
order by anno
)
-- I get this from a bigger table, but prefer to make small steps that anyone in the future can understand where this comes from as the original table is expected to grow fast
SELECT category,
ROUND(1.0 * next_price/ price - 1, 2) Pct_change,
SUBSTR(Date, 1, 4) || '-' || SUBSTR(next_date, 1, 4) Period,
tipo_establecimiento
FROM (
SELECT *,
LEAD(Price) OVER (PARTITION BY category ORDER BY year) next_price,
LEAD(year) OVER (PARTITION BY category ORDER BY year) next_date,
CASE
WHEN (category_2>= 35) AND (category_2 <= 61)
THEN 'S'
ELSE 'N'
END 'tipo_establecimiento'
FROM table
)
WHERE next_date IS NOT NULL AND Pct_change>=0
ORDER BY Pct_change DESC
WHERE next_date IS NOT NULL AND Pct_change>=0
)
SELECT
count(CASE WHEN Pct_change> 0.12 AND Pct_change <= 0.22 THEN 1 END) AS [12 - 22],
count(CASE WHEN Pct_change> 0.22 AND Pct_change <= 0.32 THEN 1 END) AS [22 - 32],
count(CASE WHEN Pct_change> 0.32 AND Pct_change <= 0.42 THEN 1 END) AS [32 - 42],
count(CASE WHEN Pct_change> 0.42 AND Pct_change <= 0.52 THEN 1 END) AS [42 - 52],
count(CASE WHEN Pct_change> 0.52 AND Pct_change <= 0.62 THEN 1 END) AS [52 - 62],
count(CASE WHEN Pct_change> 0.62 AND Pct_change <= 0.72 THEN 1 END) AS [62 - 72],
count(CASE WHEN Pct_change> 0.72 AND Pct_change <= 0.82 THEN 1 END) AS [72 - 82]
Thank you!!!
cf. my comment, I'm first assuming that your ranges are not hard-coded and that you wish to evenly split your data across quantiles of Prc_change. What this means is the calculation will figure out the ranges which split your sample as uniformly as possible. In this case, the following would work (where theview is the name of your previous view which calculates percentages):
select
concat('[',min(Pct_change),'-',min(Pct_change),']') as `range`
, avg(Pct_change) as `avg`
, count(*) as num_cat_in
from(
select *
, ntile(5)over(order by Pct_change) as bin
from theview
) t
group by bin
order by bin;
Here is a fiddle.
If on the other hand your ranges are hard-coded, I assume the ranges are in a table such as the one I create:
create table theranges (lower DOUBLE, upper DOUBLE);
insert into theranges values (0,0.2),(0.2,0.4),(0.4,0.6),(0.6,0.8),(0.8,1);
(You have to make sure that the ranges are non-overlapping. By convention I include percentages in the range from the lower bound included to the upper bound excluded, except for the upper bound of 1 which is included.) It is then a matter of left-joining the tables:
select
concat('[',lower,'-',upper,']') as `range`
, avg(Pct_change) as `avg`
, sum(if(Pct_change is null, 0, 1)) as num_cat_in
from theranges left join theview on (Pct_change>=lower and if(upper=1,true,Pct_change<upper))
group by lower, upper
order by lower;
(Note that in the bit that says upper=1
, you must change 1 to whatever your highest hard-coded range is; here I am assuming your percentages are between 0 and 1.)
Here is the second fiddle.