Search code examples
haskellarrowsopaleyeprofunctor

Array-aggregation across a link-table in Opaleye


I'm trying to construct an Opaleye query that matches the following SQL:

select * ,
    (select array_agg(tags.tagname)
     from articles_tags
     inner join tags on tags.id = articles_tags.tag_fk
         where articles_tags.article_fk = articles.id
    )
from articles

The tables involved (simplified) are:

articles: (id, title, content)
articles_tags: (article_fk, tag_fk)
tags: (id, tagname)

My goal is to query for articles that have one or more tags attached, and retrieve all attached tags as an array.

So far, I got the following elementary queries:

-- | Query all article-tag relations.
allTaggedArticlesQ :: OE.Select TaggedArticleR
allTaggedArticlesQ = OE.selectTable taggedArticlesTable

-- | Query article-tag relations for the given articles.
taggedArticlesQ :: OE.SelectArr PA.ArticleIdField TaggedArticleR
taggedArticlesQ = proc articleId -> do
  ta <- allTaggedArticlesQ -< ()
  OE.restrict -< articleFk ta .=== articleId
  returnA -< ta

-- | Join article-ids and tag names for the given subset of articles.
articleTagNamesQ :: OE.SelectArr PA.ArticleIdField ArticleTagR
articleTagNamesQ = proc articleIds -> do
  ta <- taggedArticlesQ -< articleIds
  tags <- PT.allTagsQ -< ()
  OE.restrict -< PT.tagKey tags .=== tagFk ta
  returnA -< ArticleTag (articleFk ta) (PT.tagName tags)

However, I cannot get the aggregation to work: The following does not type-check, and I do not understand how to compose this aggregation with the above query:

-- | Aggregate all tag names for all given articles
articleTagsQ :: PA.ArticleIdField -> OE.Select (PA.ArticleIdField, F (OE.SqlArray OE.SqlText))
articleTagsQ = OE.aggregate
          ( pArticleTag
              ArticleTag
                { atArticleFk = OE.groupBy,
                  atTagname = OE.arrayAgg
                }
          ) OE.selectTable articleTagNamesQ

In some blog posts and GitHub issues, I found a remark that the aggregation does not play nice with Product-Profunctors and Arrows and, therefore, cannot be included in an arrow query. Yet, I am relatively new to Haskell and haven't really understood the theory behind these two libraries (there does not seem to be a beginner-friendly documentation); therefore, I cannot come up with the general structure how to combine queries with aggregation. There are some examples by William Yao here, but I don't understand the general concept, so I can't apply these examples to my problem.

I would highly appreciate if someone can provide insight on how to compose aggregation with regular queries in Opaleye, thanks!


Solution

  • After crunching through several examples, here is the solution I finally managed to build and run:

    import           Control.Arrow
    import qualified Opaleye as OE
    import qualified Data.Profunctor.Product as PP
    
    type F field = OE.Field field 
    
    -- | Query all tags.
    allTagsQ :: OE.Select TagR
    allTagsQ = OE.selectTable tagsTable
    
    -- | Query all article-tag relations.
    allTaggedArticlesQ :: OE.Select TaggedArticleR
    allTaggedArticlesQ = OE.selectTable taggedArticlesTable
    
    -- | Join article-ids and tag names for all articles.
    articleTagNamesQ :: OE.Select (F OE.SqlInt8, F OE.SqlText)
    articleTagNamesQ = proc () -> do
      TaggedArticle {articleFk = aId, tagFk = tFk} <- allTaggedArticlesQ -< ()
      Tag {tagKey = tId, tagName = tn} <- allTagsQ -< ()
      OE.restrict -< tFk OE.(.===) tId -- INNER JOIN ON
      returnA -< (aId, tn)
    
    -- | Aggregate all tag names for all articles
    articleTagsQ :: OE.Select (F OE.SqlInt8, F (OE.SqlArray OE.SqlText))
    articleTagsQ =
      OE.aggregate (PP.p2 (OE.groupBy, OE.arrayAgg)) $
        arr (first) <<< articleTagNamesQ
    
    

    A row of the articles_tags table is represented in Haskell by the polymorphic TaggedArticle* Opaleye type, and similarly Tag* for tag rows.

    The key point is to select all rows of the two tables, then perform the join, and then finally do the aggregation. Because the aggregation function in Opaleye is neither an Arrow nor a ProductProfunctor, but the OE.aggregate function expects a Select a, I could not include the aggregation as part of a query written in arrow notation. Instead, I had to write a separate function that takes a Select a as input.

    Note that aggregation cannot be performed on the more general SelectArr. From the pacakge documentation: "By design there is no aggregation function of type Aggregator b b' -> \S.SelectArr a b -> S.SelectArr a b' . Such a function would allow violation of SQL's scoping rules and lead to invalid queries."

    My code above is somewhat simplified. I tried to use polymorphic types for keys. However, I could not figure out how to write all code in terms of these newtype wrappers; instead, I had to unwrap and rewrap the fields several times.

    Another issue I ran into was the definition of the row type that results from the JOIN. Initially, I defined a new polymorphic row type. But then I did not manage to properly unwrap the fields of that type so I could feed them into the OE.Aggregator. Therefore, I settled for the more verbose tuple notation above.