Tags: lua, pandoc, pandoc-citeproc

Pandoc Lua filter to replace specific English words with Chinese words in citations


When writing a Chinese paper, both Chinese and English papers may be cited. However, the citation styles differ slightly. An example:

Cite an English article (Smith et al. 2022), and cite a Chinese article (张三 等 2018).

In other words, for papers with multiple authors, et al. is used for English papers, while 等 is used for Chinese papers. Since Citation Style Language cannot handle multiple languages in one document, I would like help writing a Lua filter.

Take a Markdown file named test.md as an example:

Cite an English article [@makarchev2022], and cite a Chinese article [@luohongyun2018].

Then run the command below:

pandoc -C -t native test.md

The output for the main body is:

[ Para
    [ Str "Cite"
    , Space
    , Str "an"
    , Space
    , Str "English"
    , Space
    , Str "article"
    , Space
    , Cite
        [ Citation
            { citationId = "makarchev2022"
            , citationPrefix = []
            , citationSuffix = []
            , citationMode = NormalCitation
            , citationNoteNum = 1
            , citationHash = 0
            }
        ]
        [ Str "(Makarchev"
        , Space
        , Str "et"
        , Space
        , Str "al."
        , Space
        , Str "2022)"
        ]
    , Str ","
    , Space
    , Str "and"
    , Space
    , Str "cite"
    , Space
    , Str "a"
    , Space
    , Str "Chinese"
    , Space
    , Str "article"
    , Space
    , Cite
        [ Citation
            { citationId = "luohongyun2018"
            , citationPrefix = []
            , citationSuffix = []
            , citationMode = NormalCitation
            , citationNoteNum = 2
            , citationHash = 0
            }
        ]
        [ Str "(\32599\32418\20113"
        , Space
        , Str "et"
        , Space
        , Str "al."
        , Space
        , Str "2018)"
        ]
    , Str "."
    ]

Because @luohongyun2018 is a Chinese reference, I want to replace the English et al. that follows the author name, i.e.:

, Str "et"
, Space
, Str "al."

with the Chinese word 等:

, Str "\31561"

Is it possible to do this with a Lua filter? I tried adapting the examples on the Lua filters page, but could not get it to work on my own.

Any suggestions would be appreciated. Thanks in advance.


Solution

  • The filter below does two things: it checks whether the citation text contains Chinese characters and, if so, replaces et al. with 等.

    The test for Chinese characters is a bit fragile, because it only checks whether any string is displayed wider than its character count. It could be made more robust by using the utf8.codepoint function from the standard Lua library instead.

    -- Process each citation; the rendered citation text is only present
    -- once citeproc has run.
    function Cite (cite)
      return cite:walk{
        Inlines = function (inlines)
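          local has_cjk = false
          -- CJK characters are double-width, so a string whose display
          -- width exceeds its character count contains wide characters.
          inlines:walk {
            Str = function (s)
              has_cjk = has_cjk or
                pandoc.layout.real_length(s.text) > pandoc.text.len(s.text)
            end
          }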
          -- do nothing if this does not contain wide chars.
          if not has_cjk then
            return nil
          end
    
          -- Copy the inlines, replacing each "et", Space, "al." run with 等.
          local i = 1
          local result = pandoc.Inlines{}
          while i <= #inlines do
            if i + 2 <= #inlines and
              inlines[i].text == 'et' and
              inlines[i+1].t == 'Space' and
              inlines[i+2].text == 'al.' then
              result:insert(pandoc.Str '等')
              i = i + 3
            else
              result:insert(inlines[i])
              i = i + 1
            end
          end
          return result
        end
      }
    end
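
The more robust check mentioned above could look roughly like the sketch below. It is only an illustration, not part of the original answer: the helper name contains_cjk and the choice of Unicode ranges are assumptions, and it uses utf8.codes, the iterator counterpart of utf8.codepoint in the standard Lua library (Lua 5.3 or later). Unlike the width comparison, it should not be triggered by other wide characters such as fullwidth punctuation.

```lua
-- Hypothetical helper (not from the original answer): returns true if
-- `text` contains at least one code point in a CJK ideograph block.
-- The ranges below are an assumption; extend them if your bibliography
-- needs other blocks.
local function contains_cjk(text)
  -- utf8.codes iterates over (byte position, code point) pairs.
  for _, cp in utf8.codes(text) do
    if (cp >= 0x4E00 and cp <= 0x9FFF)        -- CJK Unified Ideographs
      or (cp >= 0x3400 and cp <= 0x4DBF)      -- Extension A
      or (cp >= 0x20000 and cp <= 0x2A6DF)    -- Extension B
    then
      return true
    end
  end
  return false
end
```

To use it, the body of the Str function in the filter would become `has_cjk = has_cjk or contains_cjk(s.text)`, with the helper defined above the Cite function.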