
Repeated characters when redirecting man pages to plain text files


I tried converting some man pages to plain text files, but when I open the resulting file, many of the words contain unnecessary repeated characters.

For example, running man awk > awk.txt changes the section headings in awk.txt from:

  • NAME to NNAAMMEE
  • SYNOPSIS to SSYYNNOOPPSSIISS
  • DESCRIPTION to DDEESSCCRRIIPPTTIIOONN

I thought this would be a simple task. Why does this happen?


Solution

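  • Man pages contain formatting information (for instance, to indicate that some words should be bold). On a terminal, bold is traditionally produced by overstriking: the character, a backspace, then the same character again. A pager interprets these sequences, but when the output is redirected to a file they are stored literally, so the characters appear repeated.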

    You may want to try:

    man awk | col -b > awk.txt
    

    What col does, according to its man page:

    col — filter reverse line feeds from input

    SYNOPSIS

    col [-bfhpx] [-l num]

    DESCRIPTION

    The col utility filters out reverse (and half reverse) line feeds so that the output is in the correct order with only forward and half forward line feeds, and replaces white-space characters with tabs where possible. This can be useful in processing the output of nroff(1) and tbl(1).

    The col utility reads from the standard input and writes to the standard output.

    The options are as follows:

    -b Do not output any backspaces, printing only the last character written to each column position.
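
To see the effect of dropping backspaces in isolation, here is a minimal sketch that simulates overstruck text with printf and strips it with sed instead of col. The function name strip_overstrike and the sample text are made up for illustration, and the filter only handles simple "character + backspace" pairs in ASCII text; it does not handle the reverse line feeds that col also filters.

```shell
# Hypothetical helper, not part of man or col: remove overstrike sequences.
strip_overstrike() {
    bs=$(printf '\b')    # a literal backspace character (0x08)
    # Delete every "character + backspace" pair, so only the character
    # written last at each position survives.
    sed "s/.${bs}//g"
}

# Simulated formatter output: bold "NAME" (N, backspace, N, ...) and
# underlined "file" (underscore, backspace, f, ...).
sample=$(printf 'N\bNA\bAM\bME\bE\n_\bf_\bi_\bl_\be\n')

printf '%s\n' "$sample" | strip_overstrike
```

If col is installed, piping the same sample through col -b should give the same two lines, which is why man awk | col -b > awk.txt produces a clean file.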