ansi-escape, escpos, escp

Safely ignoring unknown ANSI, ESC/P, and ESC/POS sequences: how to know their length


Some context first:
I'm making a device which transforms an electronic typewriter into a serial printer/terminal (don't ask why; I know it doesn't make much practical sense).
The device is inserted between the typewriter's controller and its keyboard.
It can:

  • let the keyboard through, transparently,
  • capture key presses, with or without blocking the typewriter from seeing them,
  • insert additional key presses.

With this I can make the typewriter work in different modes:

  • normal typewriter,
  • typewriter with each typed character logged through the serial port,
  • serial printer,
  • serial terminal.

For the serial printer/terminal modes I want to accept and understand some of the ANSI (for the terminal) and ESC/P or ESC/POS (for the printer) escape sequences, depending on the mode.

And here comes the problem. Because the device is limited, it can accept only a very small subset of the escape sequences: those which the typewriter is actually able to perform. I want to simply ignore any unsupported sequences.
The problem is that the sequences have different lengths.
When a sequence arrives that the device does not recognise, is there a general way to determine how many bytes long it will be, so that I know how many characters to ignore (some simple rules based on the first character or characters, for example)?
Or am I forced to prepare a long lookup table (which takes precious flash space) covering all possible sequences, so that I always know how many bytes to ignore?

I want to avoid:

  • ignoring (and so not printing) valid data which comes after the sequence,
  • printing parts of the escape sequences on paper,
  • interpreting parts of unknown sequences as the start of a new sequence.

Of course, I could define my own sequences, but then I would need a custom driver for my device. I prefer to use an existing standard.

Edited to add: as @Raymond Chen shows in the comment below, for ANSI sequences it is possible to detect where they terminate, so there is no problem there. For the ESC/P sequences (in printer mode), however, I haven't found a similar way to tell.
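
To illustrate the ANSI case, here is a minimal sketch of a skipper, assuming the ECMA-48 byte ranges: after ESC [ come parameter and intermediate bytes, and the sequence ends at the first byte in the range 0x40 to 0x7E. The names ansi_state_t, ansi_skip_feed and ansi_filter are made up for this example, and string-type sequences (such as those started by ESC ] or ESC P) are deliberately not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* States of a minimal ANSI (ECMA-48 style) escape-sequence skipper. */
typedef enum { ANSI_TEXT, ANSI_ESC, ANSI_CSI } ansi_state_t;

/* Feed one byte. Returns true if the byte is plain data to print,
   false if it belongs to an escape sequence and must be swallowed.
   Simplification (assumption): string sequences such as OSC/DCS,
   which end with ST or BEL, are not recognised here. */
static bool ansi_skip_feed(ansi_state_t *st, uint8_t b)
{
    switch (*st) {
    case ANSI_TEXT:
        if (b == 0x1B) { *st = ANSI_ESC; return false; }
        return true;
    case ANSI_ESC:
        if (b == '[') { *st = ANSI_CSI; return false; }  /* CSI introducer */
        if (b == 0x1B) return false;                     /* ESC again: restart */
        if (b >= 0x20 && b <= 0x2F) return false;        /* intermediate byte */
        *st = ANSI_TEXT;                                 /* final byte of a short sequence */
        return false;
    case ANSI_CSI:
        if (b == 0x1B) *st = ANSI_ESC;                   /* aborted, new sequence starts */
        else if (b >= 0x40 && b <= 0x7E) *st = ANSI_TEXT; /* final byte ends the CSI */
        return false;                                    /* parameters are swallowed */
    }
    return true;
}

/* Copy only the printable bytes of in[0..n) to out; returns how many were kept. */
static size_t ansi_filter(const uint8_t *in, size_t n, uint8_t *out)
{
    ansi_state_t st = ANSI_TEXT;
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (ansi_skip_feed(&st, in[i]))
            out[kept++] = in[i];
    return kept;
}
```

Known sequences would be matched before falling back to this skipper; the state is a single byte, so no lookup table is needed for the terminal mode.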


Solution

  • ESC/P and ESC/POS have specifications published by EPSON, but they are de facto standards, not formally standardized ones.
    Other vendors that adopt them do not necessarily comply with the specifications and frequently add their own extensions.

    EPSON itself has made various extensions, and there are related specifications such as ESC/P2, ESC/Page, and ESC/Label (Zebra ZPL II compatible?).

    For example, ESC/POS is here.
    ESC/POS Command Reference for TM Printers

    And here is ESC/P.
    EPSON ESC/P Reference Manual

    If you look elsewhere, you will also find these.
    ESC/P - Wikipedia
    ESC/P 2 and FX Commands
    ESC/Label Command Reference Guide - Epson
    ESC/Page Command Reference, 4th Edition (membership registration is required)


    There are loose conventions that can serve as heuristics for interpreting these commands, but there are no strictly standardized rules that apply to all of them.

    Whether you interpret every documented command rigorously, or support the commands only to a certain extent and give up on the details, is up to you; there are many options.

    Here is one tool of this kind that is probably well known.
    ESC/POS command-line tools

    Included utilities
    esc2text
    esc2text extracts text and line breaks from binary ESC/POS files.

    I am also writing such a tool myself, although it is not finished yet.
    EscPosUtils

    If you search, you will find other similar tools.
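
As an illustration of the "support them to a certain extent" option, here is a rough sketch of a compact ESC/P skipper that groups commands by how their length is delimited instead of storing one table entry per command. The groupings below are a partial list taken from my reading of the EPSON manuals and must be checked against the reference for your target; the names (escp_skip_t, escp_skip_feed, escp_filter and the three sets) are invented for this example. Any command not listed is assumed to take no parameter bytes, which is a guess that will misfire on commands such as ESC * (bit image), whose data length depends on its parameters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Command bytes grouped by parameter layout (partial, to be verified). */
static const char ONE_PARAM[] = "-WJ3!xk"; /* ESC c n                     */
static const char TWO_PARAM[] = "$\\";     /* ESC c nL nH                 */
static const char NUL_TERM[]  = "DB";      /* ESC c n1 ... nk NUL (tabs)  */

typedef enum { P_TEXT, P_ESC, P_SKIP, P_NUL, P_HDR } escp_mode_t;

typedef struct {
    escp_mode_t mode;
    uint16_t    left;  /* bytes still to swallow in P_SKIP / P_HDR */
    uint8_t     n_lo;  /* low byte of a 16-bit data count          */
} escp_skip_t;

static bool in_set(const char *set, uint8_t b)
{
    return b != 0 && strchr(set, b) != NULL;
}

/* Feed one byte. Returns true if it is printable data, false if swallowed. */
static bool escp_skip_feed(escp_skip_t *p, uint8_t b)
{
    switch (p->mode) {
    case P_TEXT:
        if (b == 0x1B) { p->mode = P_ESC; return false; }
        return true;
    case P_ESC:
        if (b == '(')                  { p->mode = P_HDR;  p->left = 3; } /* c, nL, nH follow */
        else if (in_set(NUL_TERM, b))  { p->mode = P_NUL; }
        else if (in_set(TWO_PARAM, b)) { p->mode = P_SKIP; p->left = 2; }
        else if (in_set(ONE_PARAM, b)) { p->mode = P_SKIP; p->left = 1; }
        else                           { p->mode = P_TEXT; } /* assumed: no parameters */
        return false;
    case P_SKIP:
        if (--p->left == 0) p->mode = P_TEXT;
        return false;
    case P_NUL:
        if (b == 0) p->mode = P_TEXT;
        return false;
    case P_HDR:                        /* ESC ( c nL nH, then nL + 256*nH data bytes */
        p->left--;
        if (p->left == 1) {
            p->n_lo = b;               /* this byte is nL */
        } else if (p->left == 0) {     /* this byte is nH */
            p->left = (uint16_t)(p->n_lo | (b << 8));
            p->mode = p->left ? P_SKIP : P_TEXT;
        }
        return false;
    }
    return true;
}

/* Copy only the printable bytes of in[0..n) to out; returns how many were kept. */
static size_t escp_filter(const uint8_t *in, size_t n, uint8_t *out)
{
    escp_skip_t p = { P_TEXT, 0, 0 };
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (escp_skip_feed(&p, in[i]))
            out[kept++] = in[i];
    return kept;
}
```

The three short strings cost only a handful of bytes of flash, and the commands of the ESC ( family carry their own length, so they can be skipped without knowing what they do. ESC/POS uses different command bytes and layouts, so it would need its own sets.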