performance algorithm testing arduino attiny

How to test algorithm performance on devices with low resources?

I am interested in using atmel avr controllers to read data from LIN bus. Unfortunately, messages on such bus have no beginning or end indicator and only reasonable solution seems to be brute force parsing. Available data from bus is loaded into circular buffer, and brute force method finds valid messages in buffer.

Working with 64 byte buffer and 20MHZ attiny, how can I test the performance of my code in order to see if buffer overflow is likely to occur? Added: My concern is that algorith will be running slow, thus buffering even more data.

A bit about brute force algorithm. Second element in a buffer is assumed to be message size. For example, if assumed length is 22, first 21 bytes are XORed and tested against 22nd byte in buffer. If checksum passes, code checks if first (SRC) and third (DST) byte are what they are supposed to be.

Solution

AVR is one of the easiest microcontrollers for performance analysis, because it is a RISC machine with a simple intruction set and well-known instruction execution time for each instruction.

So, the beasic procedure is that you take the assembly coude and start calculating different scenarios. Basic register operations take one clock cycle, branches usually two cycles, and memory accesses three cycles. A XORing cycle would take maybe 5-10 cycles per byte, so it is relatively cheap. How you get your hands on the assembly code depends on the compiler, but all compilers tend to give you the end result in a reasonable legible form.

Usually without seeing the algorithm and knowing anything about the timing requirements it is quite impossible to give a definite answer to this kind of questions. However, as the LIN bus speed is limited to 20 kbit/s, you will have around 10 000 clock cycles for each byte. That is enough for almost anything.

A more difficult question is what to do with the LIN framing which is dependent on timing. It is not a very nice habit, as it really requires some time extra effort from the microcontroller. (What on earth is wrong with using the 9th bit?)

The LIN frame consists of a

break (at least 13 bit times)
synch delimiter (0x55)
message id (8 bits)
message (0..8 x 8 bits)
checksum (8 bits)

There are at least four possible approaches with their ups and downs:

(Your apporach.) Start at all possible starting positions and try to figure out where the checksummed message is. Once you are in sync, this is not needed. (Easy but returns ghost messages with a probability 1/256. Remember to discard the synch field.)
Use the internal UART and look for the synch field; try to figure out whether the data after the delimiter makes any sense. (This has lower probability of errors than the above, but requires the synch delimiter to come through without glitches and may thus miss messages.)
Look for the break. Easiest way to do this to timestamp all arriving bytes. It is quite probably not required to buffer the incoming data in any way, as the data rate is very low (max. 2000 bytes/s). Nominally, the distance between the end of the last character of a frame and the start of the first character of the next frame is at least 13 bits. As receiving a character takes 10 bits, the delay between receiving the end of the last character in the previous message and end of the first character of the next message is nominally at least 23 bits. In order to allow some tolerance for the bit timing, the limit could be set to, e.g. 17 bits. If the distance in time between "character received" interrupts exceeds this limit, the characters belong to different frame. Once you have detected the break, you may start collecting a new message. (This works almost according to the official spec.)
Do-it-yourself bit-by-bit. If you do not have a good synchronization between the slave and the master, you will have to determine the master clock using this method. The implementation is not very straightforward, but one example is: http://www.atmel.com/images/doc1637.pdf (I do not claim that one to be foolproof, it is rather simplistic.)

I would go with #3. Create an interrupt for incoming data and whenever data comes you compare the current timestamp (for which you need a counter) to the timestamp of the previous interrupt. If the inter-character time is too long, you start a new message, otherwise append to the old message. Then you may need double buffering for the messages (one you are collecting, another you are analyzing) to avoid very long interrupt routines.

The actual implementation depends on the other structure of your code. This shouldn't take much time.

And if you cannot make sure your clock is well enough synchronized (+- 4%) to the moster clock, then you'll have to look at #4, which is probably much more instructive but quite tedious.