Search code examples
sqlsqliteloopscountsubquery

Iterate through results of SQLite query as input to subsequent query


I have a single SQLite table that has the following fields representing metadata pulled from individual files stored on disk. There's a record for every file:

__path      denotes the full path and filename (in effect the PK)
__dirpath   denotes the directory path excluding the filename
__dirname   denotes the directory name in which the file is found
refid       denotes an attribute of interest, pulled from the underlying file on disk
  1. Files are grouped and stored by __dirname when created
  2. All files in __dirname should have identical refid, but the refid is sometimes absent
  3. As a starting point I would like to identify every __dirpath that has non-conforming files.

My query for identifying offending folders is as follows:

SELECT __dirpath
  FROM (
           SELECT DISTINCT __dirpath,
                           __dirname,
                           refid
             FROM source
       )
 GROUP BY __dirpath
HAVING count( * ) > 1
 ORDER BY __dirpath, __dirname;

Is it possible to iterate through a query's results and use each result as the input into another query without resorting to using something like Python alongside SQLite? As an example, to see records belonging to failed sets:

SELECT __dirpath, refid
  FROM source
 WHERE __dirpath = <nth result from aforementioned query>;

Solution

  • If you want all offending rows, one option is:

    select t.*
    from (
        select t.*,
            min(refid) over(partition by __dirpath, __dirname) as min_refid,
            max(refid) over(partition by __dirpath, __dirname) as max_refid
        from mytable t
    ) t
    where min_refid <> max_refid
    

    The logic is to compare the minimum and maximum refid over each group of rows having the same dir path and dir name. If they differ, then the row is offending.

    We could also use exists - which would better handle possible null values in refid:

    select t.*
    from mytable t
    where exists (
        select 1
        from mytable t1
        where 
            t1.__dirpath = t.__dirpath 
            and t1.__dirname = t.__dirname
            and t1.ref_id is not t.ref_id
    )