Chapter 5 of Real World Haskell introduces a pretty-printing library, specifically with an abstract Doc
, in the context of pretty-printing JSON:
Instead of rendering straight to a string, our Prettify module will use an abstract type that we'll call Doc. By basing our generic rendering library on an abstract type, we can choose an implementation that is flexible and efficient. If we decide to change the underlying code, our users will not be able to tell.
However, (as several commentors wrote on this otherwise excellent book wrote), it's a bit hard to understand from this chapter why is Doc
needed, or how exactly does it solve the problem. Specifically, in the context of a chapter focusing on modules, it's hard to understand the motivation given by
If we decide to change the underlying code, our users will not be able to tell.
This, on its own, could be achieved by simply exporting a pretty printing function, and not exporting anything related to the implementation. Why is Doc
needed, then, and how does it solve the problem?
I self-answered this question after spending a lot of time reading chapter 5 as well as [Hughes 95] and [Wadler 98], for the following reasons:
Doc
Many documents and data structures are (multiway) tree-like:
JSON, YANG, and basically any document with a hierarchal structure
Therefore, it makes sense to factor out tree pretty-printing, from the actual source of the tree-like data. This factored out library will just contain methods to construct some abstract Doc
from tree-like data, and pretty print this Doc
. The point, therefore, is to service several types of sources at once.
To simplify things, let's focus on a particularly simple source:
data Tree = Tree String [Tree]
deriving (Eq, Show)
which can be constructed like this, for example:
tree =
Tree "a" [
Tree "b" [
Tree "c" []],
Tree "d" [
Tree "e" [],
Tree "f" [],
Tree "g" [],
Tree "h" []
],
Tree "i" []
]
Again, for a specific simple example, the criteria for "prettiness" is folding nested elements as much as possible, as long as the result does not exceed some specified length. So, for example, for the above tree
, if we are given length 30, the prettiest output is defined to be
a[[c] d[e, f, g, h] i]
if we are given 20
a[
b[c]
d[e, f, g, h]
i
]
and if we are given 8
a[
b[c]
d[
e,
f,
g,
h
]
i
]
Doc
The following is a simplification of [Walder 98].
Any tree can be expressed by a combination of two types:
a text node, containing a string
a nest node, containing the indentation level, an opening string, child nodes, and a closing text node
Additionally, any node can be folded or not.
In order to represent this, we can use the following:
data Doc =
Text String Int
| Nest Int String [Doc] String Int
deriving (Eq, Show)
The Text
type contains just a String
of the content
The Nest
type contains
an Int
indicating the indentation
a String
indicating the start element
a [Doc]
indicating child elements
a String
indicating the closing element
an Int
indicating the total length of this node, should it be folded
We can easily find the length a Doc
will have, if folded, using this:
getDocFoldedLength :: Doc -> Int
getDocFoldedLength (Text s) = length s
getDocFoldedLength (Nest _ _ _ _ l) = l
To create a Nest
, we use the following:
nest :: Int -> String -> [Doc] -> String -> Doc
nest indent open chs close =
Nest indent open chs close (length open + length chs - 1 + sum (map getDocFoldedLength chs) + length close)
Note that the folded-version length is calculated once, and then "cached".
Getting the folded-version length of a Doc
in O(1) is easy:
getDocFoldedLength :: Doc -> Int
getDocFoldedLength (Text s) = length s
getDocFoldedLength (Nest _ _ _ _ l) = l
If we will decide to actually fold a Doc
, we will also need the folded version of its content:
getDocFoldedString :: Doc -> String
getDocFoldedString (Nest _ open cs close _) = open ++ intercalate " " (map getDocFoldedString cs) ++ close
getDocFoldedString (Text s) = s
Constructing a Doc
from a tree can be done like this:
showTree :: Tree -> Doc
showTree (Tree s ts) = if null chs then Text s else nest (1 + length s) (s ++ "[") chs "]" where
chs = intercalateDocs "," $ map showTree ts
where intercalateDocs
is a utility function, intercalating commas between non-Nest
Docs
:
intercalateDocs :: String -> [Doc] -> [Doc]
intercalateDocs _ l | length l < 2 = l
intercalateDocs delim (hd:tl) = case hd of
(Text s) -> (Text (s ++ delim)):intercalateDocs delim tl
otherwise -> hd:intercalateDocs delim tl
E.g., for tree
above showTree tree
gives
Nest 2 "a[" [Nest 2 "b[" [Text "c"] "]" 4,Nest 2 "d[" [Text "e,",Text "f,",Text "g,",Text "h"] "]" 13,Text "i"] "]" 23
Now for the heart of the matter, a pretty
function, deciding which nested elements to fold. Since each getDocElement
gives us the length of the folded-version of a Doc
, we can efficiently decide whether to fold or not:
pretty :: Int -> Doc -> String
pretty w doc = pretty' 0 w doc where
pretty' i _ (Text s) = replicate i ' ' ++ s
pretty' i w (Nest j open cs close l) | i + j + l <= w =
replicate i ' ' ++ open ++ intercalate " " (map getDocFoldedString cs) ++ close
pretty' i w (Nest j open cs close l) =
replicate i ' ' ++ open ++ "\n" ++ intercalate "\n" (map (pretty' (i + j) w) cs) ++ "\n" ++ replicate i ' ' ++ close
The function pretty' i w doc
transforms doc
into a pretty form, assuming the current indentation is i
, and the width is w
. Specifically,
it transforms any Text
to its string
it folds any Nest
if it fits; if not, it calls itself recursively on the children.
(See full version here.)
The papers use solutions that are more elegant and Haskell-Specific. The Doc
's algebraic data type includes also a "horizontal concatenation", that generates a sequence of documents depending on whether it (and the descendants) will be folded or not. A careful search doesn't generate all possible documents (whose number is exponential), but rather discards generating large numbers of layouts that cannot possibly be part of the optimal solution. The solution here achieves the same complexity by caching the folded length within each node, which is simpler.
The chapter uses a slightly different API for compatibility with existing Haskell Pretty-Printing libraries. It organizes the code into modules. It also deals with practical JSON-specific problems such as escaping (which is unrelated to pretty printing).