python, parsing, pyserial, uart

Best approach to parse a buffer string in Python


I'm working on an embedded system that sends commands over a UART running at 115200 baud.

On the PC side, I want to read these commands, parse them, and execute the related action.

I chose Python as the language for the script.

This is a typical command received from the embedded system:

S;SEND;40;{"ID":"asg01","T":1,"P":{"T":180}};E

Each message starts with S and ends with E. The command associated with the message is "SEND" and the payload length is 40.

My idea is to read the bytes coming from the UART and:

  • check if the message starts with S
  • check if the message ends with E
  • if both checks pass, split the message to extract the command and the payload.
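
As a minimal sketch, the three checks could look like this for one complete message. The function name `parse_frame` is hypothetical, and the sketch assumes the fields are separated by `;` exactly as in the sample. It returns the declared length without checking it against the payload, since it is not stated exactly what the length counts.

```python
import json


def parse_frame(frame: bytes):
    """Return (command, declared_length, payload), or None if the frame is malformed."""
    # Checks 1 and 2: the frame must start with "S;" and end with ";E".
    if not (frame.startswith(b"S;") and frame.endswith(b";E")):
        return None

    # Check 3: split the body into command, length and payload.
    # maxsplit=2 keeps any ';' inside the JSON payload intact.
    parts = frame[2:-2].split(b";", 2)
    if len(parts) != 3:
        return None
    command, length, payload = parts

    try:
        # int() and json.loads() both accept bytes directly.
        return command.decode("ascii"), int(length), json.loads(payload)
    except ValueError:
        # Covers a non-numeric length, invalid JSON and undecodable bytes.
        return None


result = parse_frame(b'S;SEND;40;{"ID":"asg01","T":1,"P":{"T":180}};E')
```

This only handles a message that has already been isolated; it does not deal with partial or corrupted reads.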

What is the best way to parse all the bytes coming from an asynchronous UART?

My concern is losing messages because of incorrect (or slow) parsing.

Thanks for the help!

BR, Federico


Solution

  • In my day job, I wrote the software for an embedded system and a PC that communicate with each other over a USB cable, using the UART protocol at 115,200 baud.

    I see that you tagged your post with PySerial, so you already know about Python's most popular package for serial port communication. I will add that if you are using PyQt, there's a serial module included in that package as well.

    115,200 baud is not fast for a modern desktop PC. I doubt that any parsing you do on the PC side will fail to keep up. I parse data streams and plot graphs of my data in real time using PyQt.
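
    A rough calculation gives a sense of scale. It assumes the common 8N1 framing (one start bit, eight data bits, one stop bit), which the question does not state, and uses the sample message from the question:

```python
BAUD = 115_200
BITS_PER_BYTE = 10  # assumption: 8N1 framing (1 start + 8 data + 1 stop bit)

# Upper bound on the byte rate of the link.
bytes_per_second = BAUD / BITS_PER_BYTE

message = b'S;SEND;40;{"ID":"asg01","T":1,"P":{"T":180}};E'
seconds_per_message = len(message) / bytes_per_second

print(f"{bytes_per_second:.0f} bytes/s")
print(f"{seconds_per_message * 1000:.2f} ms per {len(message)}-byte message")
```

    Under that assumption the link carries at most 11,520 bytes per second, so a message of this size takes about 4 ms just to arrive.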

    What I have noticed in my work with communication between an embedded system and a PC over a UART is that some data gets corrupted occasionally. A byte can be garbled, repeated, or dropped. Also, even if no bytes are added or dropped, you can occasionally perform a read while only part of a packet is in the buffer, and the read will terminate early. If you use a fixed read length of 40 bytes and trust that each read will always line up exactly with a data packet as you show above, you will frequently be wrong.

    To solve these kinds of problems, I wrote a FIFO class in Python which consumes serial port data at the head of the FIFO, yields valid data packets at the tail, and discards invalid data. My FIFO holds three times as many bytes as one data packet, so if I am looking for packet boundaries using specific sequences, I have plenty of signposts.
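
    The idea can be sketched roughly as follows. This is not my production class, just an illustration: the name `PacketFifo`, the `max_packet_len` parameter, and the `S;` / `;E` markers (taken from the sample message in the question) are all assumptions. It also assumes `;E` never appears inside a payload.

```python
import json

START = b"S;"  # assumed start marker, from the question's sample
END = b";E"    # assumed end marker, from the question's sample


def _is_valid(frame: bytes) -> bool:
    """Accept S;<command>;<digits>;<JSON>;E and nothing else."""
    parts = frame[len(START):-len(END)].split(b";", 2)
    if len(parts) != 3 or not parts[1].isdigit():
        return False
    try:
        json.loads(parts[2])
    except ValueError:
        return False
    return True


class PacketFifo:
    def __init__(self, max_packet_len: int = 64):
        # Hold roughly three packets' worth of bytes (hypothetical sizing).
        self.capacity = 3 * max_packet_len
        self._buf = bytearray()

    def feed(self, data: bytes) -> list:
        """Append freshly read bytes and return every complete, valid packet."""
        self._buf.extend(data)
        packets = []
        while True:
            start = self._buf.find(START)
            if start < 0:
                # No start marker yet: keep only the last byte, which could
                # be the first half of a marker split across two reads.
                del self._buf[:-1]
                break
            del self._buf[:start]  # discard garbage before the marker
            end = self._buf.find(END, len(START))
            if end < 0:
                break  # packet still incomplete, wait for more bytes
            candidate = bytes(self._buf[:end + len(END)])
            if _is_valid(candidate):
                packets.append(candidate)
                del self._buf[:end + len(END)]
            else:
                # Corrupted packet: drop this marker and resynchronize.
                del self._buf[:len(START)]
        # Never let leftovers grow past the capacity; drop the oldest bytes.
        if len(self._buf) > self.capacity:
            del self._buf[:len(self._buf) - self.capacity]
        return packets


fifo = PacketFifo()
msg = b'S;SEND;40;{"ID":"asg01","T":1,"P":{"T":180}};E'
first = fifo.feed(b"\x00garbage" + msg[:20])     # read ends mid-packet
second = fifo.feed(msg[20:] + b"S;SEND;40;{bro")  # rest, then a damaged packet
third = fifo.feed(b"ken;E" + msg)                # damaged packet ends, good one follows
```

    With PySerial, each read would simply be passed to `feed`, for example `fifo.feed(port.read(port.in_waiting or 1))`, where `port` is an open `serial.Serial` object (a hypothetical name here).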

    A few more recommendations: work in Python 3 if you have the choice; it's cleaner. Use bytes and bytearray objects rather than str, because with str you will find yourself converting back and forth between Unicode and ASCII.
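
    As a small illustration of staying in bytes, the sketch below accumulates two reads in a bytearray and extracts the fields without any intermediate str. The variable names are hypothetical, and the two chunks are just the question's sample message split in two.

```python
import json

buffer = bytearray()                         # mutable, grows as reads arrive
buffer.extend(b'S;SEND;40;{"ID":"asg01",')   # first read
buffer.extend(b'"T":1,"P":{"T":180}};E')     # second read

# Strip the "S;" and ";E" markers, then split on the first two ';' only.
fields = bytes(buffer[2:-2]).split(b";", 2)

command = fields[0]              # stays bytes: compare against b"SEND"
length = int(fields[1])          # int() accepts ASCII digits in a bytes object
payload = json.loads(fields[2])  # json.loads() accepts UTF-8 encoded bytes
```

    Nothing is decoded until a value is actually needed as text or as a number.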