
How to use Pandoc filter within Hakyll?


I am sorry to ask such a question. But I am really new to Haskell. I searched the Internet for a whole day but didn't find any example.

I have a pandoc filter written in python (tikzcd.py). I want to use that filter to process my blog posts.

I guess I need to use unixFilter or pandocCompilerWithTransform, but my knowledge of Haskell is not enough to find a solution myself.

So, could someone provide me an example?

-----------U--P--D--A--T--E--S---------------

@Michael gave a solution using pandocCompilerWithTransformM and unixFilter. It works, but there is a problem.

When using a filter from the command line, I would run

pandoc -t json -READEROPTIONS input.markdown | ./filter.py | pandoc -f json -WRITEROPTIONS -o output.html

or equivalently

pandoc --filter ./filter.py -READEROPTIONS -WRITEROPTIONS -o output.html input.markdown

This command is shorter, but it hides the intermediate steps.

But pandocCompilerWithTransformM seems to do something like

pandoc -t html -READEROPTIONS -WRITEROPTIONS input.markdown | pandoc -t json | ./filter.py | pandoc -f json -WRITEROPTIONS -o output.html

The difference is in the text that is passed to filter.py: in the first case it is produced directly from the markdown, while in the second it is produced from HTML that was itself produced from the markdown. Converting back and forth tends to cause unexpected problems, so I think there may be a better solution.

PS. I've started to learn Haskell. I hope I can solve this kind of problem myself someday. Thank you!


Solution

  • In the end I think you would use both. Using https://github.com/listx/listx_blog/blob/master/blog.hs as a model, the following has the same shape as the transformer defined there. In that file, transformer is used on lines 69-80 for 'posts', as the third argument to pandocCompilerWithTransformM, which has type (Pandoc -> Compiler Pandoc). Here you would need to supply the absolute path to your Python filter (or just its name if it is on $PATH), plus reader and writer options (e.g. defaultHakyllReaderOptions and defaultHakyllWriterOptions).

    import Text.Pandoc
    import Hakyll
    
    type Script = String 
    
    transformer
      :: Script         -- e.g. "/absolute/path/filter.py"
      -> ReaderOptions  -- e.g.  defaultHakyllReaderOptions
      -> WriterOptions  -- e.g.  defaultHakyllWriterOptions
      -> (Pandoc -> Compiler Pandoc)
    transformer script reader_opts writer_opts pandoc = 
        do let input_json = writeJSON writer_opts pandoc
           output_json <- unixFilter script [] input_json
           return $ 
              -- either (error . show) id $  -- uncomment with newer pandoc, where readJSON returns an Either
              readJSON reader_opts output_json 
    

    Similarly, (transformer "/usr/local/bin/myfilter.py" defaultHakyllReaderOptions defaultHakyllWriterOptions) might be used where (return . pandocTransform) is used, on line 125 of this example gist.
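
To make the wiring concrete, here is a minimal sketch of a Hakyll rule that passes such a function to pandocCompilerWithTransformM. The "posts/*" pattern, the postRules name, and the relativizeUrls step are assumptions for illustration and are not taken from the linked blog.hs; the transform argument stands for any Pandoc -> Compiler Pandoc function, such as the transformer above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Hakyll
import Text.Pandoc (Pandoc)

-- Hypothetical helper: builds the rule for posts from any
-- Pandoc -> Compiler Pandoc function.
postRules :: (Pandoc -> Compiler Pandoc) -> Rules ()
postRules transform =
    match "posts/*" $ do                      -- assumed location of the posts
        route $ setExtension "html"
        compile $
            pandocCompilerWithTransformM
                defaultHakyllReaderOptions
                defaultHakyllWriterOptions
                transform                     -- applied to the parsed Pandoc document
                >>= relativizeUrls
```

In site.hs this could then be called as main = hakyll (postRules (transformer "/absolute/path/filter.py" defaultHakyllReaderOptions defaultHakyllWriterOptions)).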


    For debugging you might outsource everything to unixFilter:

    transform :: Script -> String -> Compiler String
    transform script md = do json0 <- unixFilter pandoc input_args md
                             json1 <- unixFilter script [] json0
                             unixFilter pandoc output_args json1
     where
       pandoc = "pandoc"
       input_args = words "-f markdown -t json" -- add others
       output_args = words "-f json -t html"    -- add others
    

    The three lines of the do block are the equivalent of the stages of unix piping in pandoc -t json | filter.py | pandoc -f json with whatever additional arguments.


    I think you may be right that there is an extra round trip through pandoc here. The pandocCompilerWithTransform(M) functions expect a direct Pandoc -> Pandoc function, which is applied to the Pandoc that Hakyll comes up with. I think we should dispense with this and use the pandoc library directly. A use of unixFilter might look like this:

    transformXLVI :: Script -> ReaderOptions -> WriterOptions -> String  -> Compiler Html
    transformXLVI script ropts wopts = fmap fromJSON . unixFilter script [] . toJSON 
      where 
        toJSON   = writeJSON wopts 
        --           . either (error . show) id -- for pandoc > 1.14
                   . readMarkdown ropts 
        fromJSON = writeHtml wopts
        --           . either (error . show) id
                   . readJSON ropts 
    

    I hope the principles are emerging from these variations! This should be pretty much the same as the preceding transform; we are using the pandoc library in place of calls to the pandoc executable.
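
Since the commented-out either (error . show) id lines above can be puzzling for someone new to Haskell, here is a small, self-contained illustration of what that idiom does. It uses only base; parseNumber and ParseError are hypothetical stand-ins for a reader such as readJSON and its error type, not pandoc's API.

```haskell
import Data.Char (isDigit)

-- Hypothetical error type, standing in for the error a reader may return.
newtype ParseError = ParseError String deriving Show

-- Hypothetical reader: returns Left on failure instead of throwing.
parseNumber :: String -> Either ParseError Int
parseNumber s
  | not (null s) && all isDigit s = Right (read s)
  | otherwise                     = Left (ParseError ("not a number: " ++ s))

-- Unwrap the Right value, or abort with the shown error.
orDie :: Show e => Either e a -> a
orDie = either (error . show) id

-- With orDie in the chain, the Either-returning reader composes
-- like a plain function again.
doubled :: String -> Int
doubled = (* 2) . orDie . parseNumber
```

In GHCi, doubled "21" evaluates to 42, while doubled "x" aborts with an error that shows the ParseError. Inside a Compiler it may be preferable to report the failure through the monad instead of calling error, but the composition stays the same.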