The dataset looks like this:
Project | Company | start year | end year |
---|---|---|---|
Project 1 | Company A | 2015 | 2020 |
Project 1 | Company A | 2015 | 2020 |
Project 1 | Company B | 2015 | 2020 |
Project 1 | Company C | 2015 | 2020 |
Project 2 | Company D | 2016 | 2021 |
Project 2 | Company D | 2016 | 2021 |
Project 2 | Company A | 2016 | 2019 |
Project 3 | Company E | 2019 | 2023 |
Project 3 | Company E | 2019 | 2023 |
Project 3 | Company E | 2019 | 2023 |
I am trying to get a number of unique companies (as they can be part of several projects) for each year, when they are part of a project with collaboration of at least two companies.
And those companies should be counted for each year they are in those projects.
So the result would be:
year | unique companies |
---|---|
2015 | 248 |
2016 | 279 |
2017 | 423 |
etc.
I have this data in Excel, haven't found any way to count it (work with Excel 2016).
I tried SQL Server Management Studio. My knowledge of SQL is not so wide, so I asked ChatGPT to help me with the query, we had quite a conversation and tried over 20 queries (as I explained again and shared the errors or unwanted results).
One of the queries we tried is this one. But the result was I believe only counting the company in the year the project started, as some years have very low numbers in the result.
WITH ProjectCompanyCTE AS
(
SELECT
[ID projektu] AS Project_ID,
[Ičo] AS Company_ID,
[Rok zahájení] AS Year
FROM
dbo.IsvavaiProjects
WHERE
[ID projektu] IN (SELECT [ID projektu]
FROM dbo.IsvavaiProjects
GROUP BY [ID projektu]
HAVING COUNT(DISTINCT [Ičo]) >= 2)
)
SELECT
Year,
COUNT(DISTINCT Company_ID) AS UniqueCompaniesCount
FROM
ProjectCompanyCTE
GROUP BY
Year
ORDER BY
Year;
You first need to start with a table of years (you can use GENERATE_SERIES
or some other method) and join your table to it. That splits out each row into its constituent years.
Then use a window function to count the number of companies per project. And simply group up by year and count distinct CompanyIDs, filtering the window function to 2 or more.
SELECT
ip.Year,
UniqueCompanies = COUNT(DISTINCT ip.CompanyID)
FROM (
SELECT
Year = year.value,
ip.Project_ID,
ip.CompanyID,
CompaniesPerProject = COUNT(*) OVER (PARTITION BY year.value, ip.Project_ID)
FROM GENERATE_SERIES(2015, 2023) year
JOIN dbo.IsvavaiProjects ip
ON year.value BETWEEN ip.StartYear AND ip.EndYear
) ip
WHERE ip.CompaniesPerProject >= 2
GROUP BY
ip.Year;