Search code examples
sqloracle-databasedate-range

SQL to Count Records that Existed by Decade


I have an Oracle a table that has 3 columns. FILE_NUM, START_DATE, END_DATE. FILE_NUM is a unique identifier and so each row is unique.

FILE_NUM START_DATE END_DATE
1111 Apr. 04, 1977 Jun. 30, 1984
1112 Jan. 03, 1987 Sep. 05, 2010
1113 May. 14, 1962 Jan. 30, 1998
1114 Oct. 09, 2001 Aug. 22, 2025
1113 May. 14, 1962 Jan. 30, 1998
1114 Nov. 18, 2003 Aug. 22, 2028
etc

I want to query a count of files that existed per decade. Something like:

Decade Count
1960-1969 10
1970-1979 25
1980-1989 42

The logic is, if the file existed at any point in any decade, count it in each of those decades. So for file_num 1112, it should get counted in the 1980's, 1990's, 2000's, and 2010's.

I don't even know if this is possible. I've tinkered around in my query a few times, but I'm not sure I know enough functions or technique to get this sort of result.

Thanks for any help.

EDIT:

I should clarify how the counts will be made. I need to approach the date ranges in 3 ways in order to capture which file_nums existed within each decade.

  1. If it started between the decade range, count it.
  2. If it ended between the decade range, count it.
  3. If it started before the decade range and ended after the decade range, count it.

I crafted up the following SQL. I am still trying to figure out if it's doing what I want it to do. If anyone has suggestion

with decades as
  (
  select
    concat(concat(floor(extract(year from t.start_date)/10)*10, '-'), floor(extract(year from t.start_date)/10)*10 + 9) DECADE
  from 
    table t
  group by
    concat(concat(floor(extract(year from t.start_date)/10)*10, '-'), floor(extract(year from t.start_date)/10)*10 + 9)
  )

select
  count(a.file_num),
  decades.decade
from
  table a,
  decades
where 
  extract(year from a.start_date) between extract(year from to_date(SUBSTR(decades.decade,0,4), 'YYYY')) and extract(year from to_date(SUBSTR(decades.decade,6,9), 'YYYY'))
  or
  extract(year from a.end_date) between extract(year from to_date(SUBSTR(decades.decade,0,4), 'YYYY')) and extract(year from to_date(SUBSTR(decades.decade,6,9), 'YYYY'))
  or
  (extract(year from a.start_date) < extract(year from to_date(SUBSTR(decades.decade,0,4), 'YYYY'))
   and
   extract(year from a.end_date) > extract(year from to_date(SUBSTR(decades.decade,6,9), 'YYYY'))
  )  
group by
  decades.decade
order by
  decades.decade

Solution

  • You can also take advantage of the connect by clause to do that. I assume you don't have any duplicate rows (these columns FILE_NUM, START_DATE, END_DATE should be unique) in your real data.

    SELECT dec_start, dec_end, COUNT(*) nb
    FROM (
        SELECT t.*, level
          , 10 * trunc( extract ( year from (START_DATE) ) / 10 ) + 10 * LEVEL - 10 dec_start
          , 10 * trunc( extract ( year from (START_DATE) ) / 10 ) + 10 * LEVEL - 1 dec_end
        FROM YourTable T
        CONNECT BY 
            10 * TRUNC( EXTRACT ( YEAR FROM (START_DATE) ) / 10 ) + 10 * LEVEL - 10 
                <   10 * CEIL( EXTRACT ( YEAR FROM (END_DATE) ) / 10 )
        AND PRIOR FILE_NUM = FILE_NUM
        AND PRIOR START_DATE = START_DATE
        AND PRIOR END_DATE = END_DATE
        AND PRIOR SYS_GUID() IS NOT NULL
    )
    group by dec_start, dec_end 
    order by dec_start, dec_end
    ;
    

    demo on db<>fiddle