
Redirecting man page output to file results in double letters in words


I redirected the output of man djpeg into a text file so that I can reference it as I learn to use it. The command I ran was man djpeg > textfile.txt. However, the output looks something like this:

LS(1)                     BSD General Commands Manual                    LS(1)

NNAAMMEE
     llss -- list directory contents

SSYYNNOOPPSSIISS
     llss [--AABBCCFFGGHHLLOOPPRRSSTTUUWW@@aabbccddeeffgghhiikkllmmnnooppqqrrssttuuwwxx11] [_f_i_l_e _._._.]

DDEESSCCRRIIPPTTIIOONN
     For each operand that names a _f_i_l_e of a type other than directory, llss
     displays its name as well as any requested, associated information.  For
     each operand that names a _f_i_l_e of type directory, llss displays the names
     of files contained within that directory, as well as any requested, asso-
     ciated information.

     If no operands are given, the contents of the current directory are dis-
     played.  If more than one operand is given, non-directory operands are
     displayed first; directory and non-directory operands are sorted sepa-
     rately and in lexicographical order.

     The following options are available:

     --@@      Display extended attribute keys and sizes in long (--ll) output.

     --11      (The numeric digit ``one''.)  Force output to be one entry per
             line.  This is the default when output is not to a terminal.

     --AA      List all entries except for _. and _._..  Always set for the super-
             user.

     --aa      Include directory entries whose names begin with a dot (_.).

     --BB      Force printing of non-printable characters (as defined by
             ctype(3) and current locale settings) in file names as \_x_x_x,
             where _x_x_x is the numeric value of the character in octal.

     --bb      As --BB, but use C escape codes whenever possible.

[...continues]

There's more but you get the point. Why is it repeating some of the characters? Also, why doesn't it repeat all of them if there's some function executing twice or a cache flushing incorrectly?


Solution

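  • The man program was originally designed to print its output on teletypes, and it uses overprinting to produce bold and underline effects. What you are actually seeing is the file containing sequences of the form X^HX, where ^H is a backspace character: the letter is printed, the carriage backs up one position, and the same letter is printed again to make it darker. Underlining works the same way with sequences like _^HX (hence the _f_i_l_e). Only bold and underlined text is encoded like this, which is why the plain text is not doubled.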

    These can be stripped easily using a text editor like vi, which will display the backspaces as ^H.

    :%s/_^H//g

    will remove the underlines, and

    :%s/.^H//g

    will remove the bolding by deleting the first copy of each doubled letter along with its backspace. (The ^H above is a literal ctrl-H, not a caret followed by H. Type ctrl-V ctrl-H to enter it in vi.)
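The same substitution can be applied in a pipeline, so the file never contains the backspaces in the first place. The following is a minimal sketch, assuming a POSIX sh and sed; the function name strip_overstrike is made up for this example, and textfile.txt is simply the file name from the question.

```shell
#!/bin/sh
# strip_overstrike is a hypothetical helper name, not part of man or any package.
# It reads text on stdin and removes teletype-style overstrikes:
#   X^HX (bold)      becomes X
#   _^HX (underline) becomes X
strip_overstrike() {
    bs=$(printf '\b')      # a single literal backspace character (ctrl-H)
    sed "s/.${bs}//g"      # delete every character that is followed by a backspace
}

# Intended usage, with the command from the question:
#   man djpeg | strip_overstrike > textfile.txt
```

Because the pattern deletes any character that precedes a backspace, one pass handles both the bold and the underline sequences. On systems where the col utility is installed, man djpeg | col -b > textfile.txt is a common shortcut for the same cleanup.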