How to use a Pandoc filter within Hakyll?

I am sorry to ask such a question. But I am really new to Haskell. I searched the Internet for a whole day but didn't find any example.

I have a pandoc filter written in Python (tikzcd.py). I want to use that filter to process my blog posts.

I guess I need to use unixFilter or pandocCompilerWithTransform, but my knowledge of Haskell is really not enough to find a solution myself.

So, could someone provide me an example?

Updates

~~@Michael gives a solution using pandocCompilerWithTransformM and unixFilter. It works. But there is a problem.~~

~~When using a filter from the command line, what I do is~~

pandoc -t json -READEROPTIONS input.markdown | ./filter.py | pandoc -f json -WRITEROPTIONS -o output.html

~~or equivalently~~
pandoc --filter ./filter.py -READEROPTIONS -WRITEROPTIONS input.markdown -o output.html
~~This command is shorter but it doesn't show the individual steps.~~

~~But with pandocCompilerWithTransformM, it does something like~~

pandoc -t html -READEROPTIONS -WRITEROPTIONS input.markdown | pandoc -t json | ./filter.py | pandoc -f json -WRITEROPTIONS -o output.html

The difference is in the text that is passed to filter.py: in one case it is produced directly from the Markdown, while in the other it is regenerated from HTML that was itself produced from the Markdown. Converting something back and forth like this tends to introduce unexpected problems, so I think there may be a better solution.

PS. I've started to learn Haskell. I hope I can solve this kind of problem myself someday. Thank you!

Solution

In the end I think you would use both. Using this https://github.com/listx/listx_blog/blob/master/blog.hs as a model, the following has the same shape as the transformer defined there. In that file, transformer is used on lines 69-80 for 'posts', as the third argument to pandocCompilerWithTransformM, which is a (Pandoc -> Compiler Pandoc). Here you would need to supply the absolute path to your Python filter (or just its name if it's in $PATH) along with reader and writer options (e.g. defaultHakyllReaderOptions and defaultHakyllWriterOptions).

import Text.Pandoc
import Hakyll

type Script = String 

transformer
  :: Script         -- e.g. "/absolute/path/filter.py"
  -> ReaderOptions  -- e.g.  defaultHakyllReaderOptions
  -> WriterOptions  -- e.g.  defaultHakyllWriterOptions
  -> (Pandoc -> Compiler Pandoc)
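transformer script reader_opts writer_opts pandoc =
    do let input_json = writeJSON writer_opts pandoc   -- serialize the Pandoc AST to JSON
       output_json <- unixFilter script [] input_json  -- pipe the JSON through the external filter
       return $
          -- Needed when readJSON returns an Either (newer pandoc);
          -- remove this line for older versions where it returns a Pandoc directly.
          either (error . show) id $
          readJSON reader_opts output_json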

Similarly, (transformer "/usr/local/bin/myfilter.py" defaultHakyllReaderOptions defaultHakyllWriterOptions) might be used where (return . pandocTransform) is used, on line 125 of this example gist.

For debugging you might outsource everything to unixFilter:

transform :: Script -> String -> Compiler String
transform script md = do
    json0 <- unixFilter pandoc input_args md   -- markdown -> JSON
    json1 <- unixFilter script [] json0        -- run the filter on the JSON
    unixFilter pandoc output_args json1        -- JSON -> HTML
  where
    pandoc      = "pandoc"
    input_args  = words "-f markdown -t json"  -- add others
    output_args = words "-f json -t html"      -- add others

The three lines of the do block correspond to the stages of the Unix pipeline pandoc -t json | filter.py | pandoc -f json, with whatever additional arguments you need.
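To show where such a function could sit in a site definition, here is a minimal sketch, not a tested setup. It assumes the posts live under posts/*, that the filter is at ./tikzcd.py relative to the directory the site is built from, and that the pandoc executable is on $PATH; the helper pandocArgs is a name made up for this illustration. It reads the raw post body with getResourceBody, so Hakyll's own pandoc pass is skipped and the filter sees JSON produced directly from the Markdown.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Hakyll

type Script = String

-- Hypothetical helper: build the "-f FROM -t TO" arguments for pandoc.
pandocArgs :: String -> String -> [String]
pandocArgs from to = ["-f", from, "-t", to]

-- The same three stages as the shell pipeline:
-- markdown -> JSON, JSON -> filtered JSON, filtered JSON -> HTML.
transform :: Script -> String -> Compiler String
transform script md = do
    json0 <- unixFilter "pandoc" (pandocArgs "markdown" "json") md
    json1 <- unixFilter script [] json0
    unixFilter "pandoc" (pandocArgs "json" "html") json1

main :: IO ()
main = hakyll $
    match "posts/*" $ do                               -- assumed location of the posts
        route $ setExtension "html"
        compile $
            getResourceBody                            -- raw Markdown, metadata header removed
                >>= withItemBody (transform "./tikzcd.py")  -- assumed filter path
                >>= relativizeUrls
```

Templates can be applied after withItemBody in the usual way with loadAndApplyTemplate; they are left out here to keep the sketch short.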

I think you may be right that there is an extra layer of pandoc back and forth here. The pandocCompilerWithTransform(M) functions are for a direct Pandoc -> Pandoc function, which is applied to the Pandoc that Hakyll comes up with. I think we should dispense with this and use the Pandoc library directly. A use of unixFilter might look like this.

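-- Html here is the type from Text.Blaze.Html (import Text.Blaze.Html (Html)).
transformXLVI :: Script -> ReaderOptions -> WriterOptions -> String -> Compiler Html
transformXLVI script ropts wopts = fmap fromJSON . unixFilter script [] . toJSON
  where
    -- Markdown source -> Pandoc -> JSON
    toJSON   = writeJSON wopts
               . either (error . show) id  -- for pandoc > 1.14; remove for older versions
               . readMarkdown ropts
    -- filtered JSON -> Pandoc -> Html
    fromJSON = writeHtml wopts
               . either (error . show) id  -- for pandoc > 1.14; remove for older versions
               . readJSON ropts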

I hope the principles are emerging from these variations! This should be pretty much the same as the preceding transform; we are using the pandoc library in place of calls to the pandoc executable.