Search code examples
sqlitekotlinandroid-roomblacklist

Create a subquery using fields from a table


So I'm trying to build a system that allows users to tailor the content that appears in a feed by creating exclusionary rules. As a quick overview, here's a simple look at the tables I'm working with :

Table Name Fields
Feed PRIMARY KEY LONG postId
PRIMARY KEY INT feedId
STRING date
Post PRIMARY KEY LONG id
STRING creatorId
STRING title
STRING message
STRING uploadDate
Tag PRIMARY KEY AUTOGENERATE LONG id
FOREIGN KEY LONG parentPostId
STRING contents
BlacklistRule PRIMARY KEY AUTOGENERATE LONG id
STRING tableName
STRING fieldName
STRING contents

The Feed gives you a list of ids to pull from the Posts table. When a Post is rendered, all of its Tags are attached to it.

An earlier version of this system was limited to just blacklisted tags, it was simple to join the Tag table to the BlacklistedTag table and filter the results from there.

@Query("
SELECT DISTINCT Feed.postId FROM Feed
LEFT JOIN (
    SELECT DISTINCT Tag.parentPostId as postId FROM Tag
    INNER JOIN BlacklistedTag ON Tag.contents = BlacklistedTag.contents
    INNER JOIN Feed ON Tag.postId = Feed.postId
    WHERE Feed.feedId = :feedId
    ORDER BY datetime(Feed.date) DESC
) AS BlockedIds ON Feed.postId = BlockedIds.postId
WHERE Feed.feedId = :feedId
AND BlockedIds.postId IS NULL
ORDER BY datetime(date) DESC
LIMIT :pageSize
OFFSET :offset
")
fun getFilteredPostIdsByPage(
    pageSize : Int = 48,
    offset : Int = 0,
    feedId : Int = ContentFeedIds.Home
) : List<Long>

However, I wanted to make the system more generic. The BlacklistRules are supposed to allow the program to filter out successive subsets of queries, like Tags with contents of FOO (I hate seeing FOO in my content feed) or Posts with BAR in the title.

Each row in the BlacklistRule table represents an aggregated subset of ids that will be used to filter the Feed, but I'm not sure how to perform loops in Android's RoomDB library, or SQLite in general.

So my question is : How would I perform a loop where I continually append unique ids to a set, then use that set as the filter for my content feed? If I were to write this in another language it would work like this :

// create the complete list of blacklisted content
var blacklistedIds : List<Long> = emptyList<Long>()
for (rule : BlacklistRule in BlacklistRules) {
    val subquery : String = "SELECT UNIQUE postId FROM $rule.tableName " +
        "WHERE $rule.fieldName CONTAINS $rule.contents"
    // perform the subquery
    // add the ids from the subquery to the blacklistedIds list
}

// fetch and return a page of the feed where the ids are not present in the blacklist

Any help is appreciated. If this is a fundamentally flawed approach, I'd love suggestions on how to improve this process.


Solution

  • The solution could be along the lines of joining every rule with every post/feed/tag permutation which would result in a table from which the rule could be applied.

    For example suppose that the tables along with some data where:-

    CREATE TABLE IF NOT EXISTS feed (feedid INTEGER, postid INTEGER, date TEXT, PRIMARY KEY(feedid,postid));
    CREATE TABLE IF NOT EXISTS post (id INTEGER PRIMARY KEY, creatorId TEXT, title TEXT, message TEXT, uploaddate TEXT);
    CREATE TABLE IF NOT EXISTS tag (id INTEGER PRIMARY KEY, parentpostid REFERENCES post(id), contents TEXT);
    CREATE TABLE IF NOT EXISTS rule (id INTEGER PRIMARY KEY, tablename TEXT, fieldname TEXT, contents);
    
    INSERT OR IGNORE INTO post VALUES 
        (1,'Fred','fred''s first message','hi, how are you horrible',datetime('now','-2 days'))
        ,(2,'Mary','mary''s response to fred','fred, i was fine until you said what you said',datetime('now','-2 days','+30 minutes'))
        ,(3,'Fred','fred''s response to mary''s response','Mary I''m sorry I was kidding',datetime('now','-2 days','+30 minutes'))
        ,(4,'Mary','mary''s 2nd response','That is OK Fred you ugly so and so',datetime('now','-2 days','+30 minutes'))
        ,(5,'Fred','Fred again','All fine and agin sorry',datetime('now','-2 days','+30 minutes'))
    ;
    INSERT OR IGNORE INTO feed VALUES
        (100,1,datetime('now','-10 hours'))
        ,(200,1,datetime('now','-11 hours'))
        ,(202,2,datetime('now','-9 hours'))
        ,(203,2,datetime('now','-8 hours'))
    ;
    INSERT INTO rule (tablename,fieldname,contents) VALUES
        ('post','message','horrible')
        ,('post','message','nasty')
        ,('post','message','awful')
        ,('post','whatever','something')
        ,('tag','contents','ugly')
    ;
    INSERT OR IGNORE INTO tag (parentpostid,contents) VALUES
        (1,'ugly'),(1,'miserable'),(1,'happy'),(2,'happy'),(3,'happy'),(4,'happy'),(5,'ugly')
    ;
    

    Resulting in:-

    enter image description here

    enter image description here

    enter image description here

    and

    enter image description here

    The the following (SQL) could be the basis of ascertaining what is to be excluded SHOWN TO DEMOSTRATE THE PRINCIPLE(S):-

    /* DEMO exclude builder CTE */
    WITH 
        cte_exclude_builder AS (
            SELECT DISTINCT 
                post.*,
                feed.*,
                tag.*,
                rule.tablename, 
                rule.fieldname, 
                rule.contents
                ,CASE 
                    WHEN rule.tablename = 'post' AND fieldname = 'message'  THEN  instr(post.message,rule.contents)
                    WHEN rule.tablename = 'post' AND fieldname = 'whatever' THEN 0 /* approriate rule*/ 
                    ELSE 0
                END AS flg4
                , CASE
                    WHEN rule.tablename = 'tag' AND fieldname = 'contents' THEN instr(tag.contents,rule.contents)
                    ELSE 0
                END AS flg5
            FROM post 
            LEFT JOIN feed ON feed.postid=post.id
            LEFT JOIN tag ON tag.parentpostid = post.id
            JOIN rule
        )
    SELECT DISTINCT * FROM cte_exclude_builder
    

    The above, for the data above produces output as per:-

    enter image description here

    • note not all of the 55 rows output
    • the crux is the highlighted rows (some more) those that are blacklisted.
      • of course the JOIN ALL rules is also an important factor
    • flg4 and flg5 could be a single column (split so that the blacklist rule can be easily ascertained)

    Note that the above uses CTE's (Common Table Expression) see https://www.sqlite.org/lang_with.html

    To use the above to omit blacklisted posts (assumption that is what is required) then the following, which is very similar, can be used:-

    /* NEXT STEP exluding post using  */
    WITH 
        cte_exclude_builder AS (
            SELECT DISTINCT 
                post.*
                ,CASE 
                    WHEN rule.tablename = 'post' AND fieldname = 'message'  THEN  instr(post.message,rule.contents)
                    WHEN rule.tablename = 'post' AND fieldname = 'whatever' THEN 0 /* approriate rule*/
                END AS flg4
                , CASE
                    WHEN rule.tablename = 'tag' AND fieldname = 'contents' THEN instr(tag.contents,rule.contents)
                    ELSE 0
                END AS flg5
            FROM post 
            LEFT JOIN feed ON feed.postid=post.id
            LEFT JOIN tag ON tag.parentpostid = post.id
            JOIN rule
        )
    SELECT * FROM post WHERE id NOT IN (SELECT DISTINCT id FROM cte_exclude_builder WHERE flg4 OR flg5);
    ;
    
    • note for brevity the output is just the post The result being:-

    enter image description here

    i.e.

    • post id 1 was excluded because using horrible in the message column is a blacklist rule violation.
      • It also violated the use of the ugly tag.
    • post 2,3 and 4 are fine
    • post 5 has been excluded due to the violation of the ugly tag
    • note that

    The above is a little awkward as the rules (CASE WHEN THE ELSE END constructs) have to suit what ever tables/fieldnames should be considered.

    • simple fact is you cannot pass component names as parameters. The alternative would be to utilise @RawQuery.

    Note

    The above does not make any reference/indication in regards to incorporating the above into Room. It is simply a matter of cut and past into an @Query and then minor editing to eliminate the line feeds.

    The above also makes many assumptions/interpretations.