Why is cached memory alignment required?

I am working on MIPS32, using the GCC compiler with microMIPS optimization. Each core has its own cache memory. Core A runs ThreadX and Core B runs an RTOS. I wish to pass a pointer and a size from Core A to Core B. Let's take a look at the following flow:

1. Core A(ThreadX): pass pointer and size to Core B
2. Core B(RTOS): write size bytes to pointer
3. Core B(RTOS): flush(pointer)
4. Core A(ThreadX): cacheInvalidate(pointer)
5. Core A(ThreadX): copy from pointer to buffer

I worked with an unaligned address, and it seemed to cause some unexpected issues. After passing an aligned address, I could no longer reproduce them. Do I have to work with an aligned address? Why? What behavior should I expect if I pass an unaligned address?

Solution

Don't reason at such a low level, if you care about portable C. Look into the assembler code generated by your compiler if you don't (e.g. using gcc -O -fverbose-asm -S with GCC).

If you care about a particular implementation, you should mention it (compiler and version, optimization flags, operating system, processor and brand). But beware of undefined behavior, be scared of UB.

If coding in C:

- With C99, you need operating system support, e.g. pthreads(7) (which uses futex(7)), so read a pthread tutorial.
- With a C11 conforming implementation (see n1570), you could use atomic operations and <threads.h>.

Do I have to work with an aligned address?

Probably yes. Your hardware accesses aligned data in a different (quicker and more "atomic") way than unaligned data. The details of cache coherence are specific to each particular processor.
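One common reason alignment matters here: on most cores, flush and invalidate operations work on whole cache lines, not on individual bytes. If the buffer starts or ends in the middle of a line, that line also holds unrelated data, so an invalidate on Core A can discard Core A's own pending writes to its neighbours, and a write-back from Core B can overwrite neighbours in memory with stale copies. The following is a minimal sketch of how you could check and enforce cache-line alignment in C11. The 32-byte line size, the helper names, and shared_buf are assumptions for illustration only; take the real line size from your core's documentation.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte cache lines. Check your core's documentation. */
#define CACHE_LINE_SIZE 32u

/* Round an address down to the start of the cache line containing it. */
static uintptr_t cache_line_floor(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
}

/* Round an address up to the next cache-line boundary (no-op if aligned). */
static uintptr_t cache_line_ceil(uintptr_t addr)
{
    return (addr + (CACHE_LINE_SIZE - 1u)) & ~(uintptr_t)(CACHE_LINE_SIZE - 1u);
}

/* Return 1 if [addr, addr + size) starts and ends on cache-line boundaries,
 * i.e. the range shares no cache line with unrelated data. */
static int cache_range_is_exclusive(uintptr_t addr, size_t size)
{
    return (addr % CACHE_LINE_SIZE == 0u) && (size % CACHE_LINE_SIZE == 0u);
}

/* Hypothetical shared buffer: aligned start, size a multiple of the line
 * size, so flushing or invalidating it touches nothing else. */
#define SHARED_BUF_SIZE 256u
static _Alignas(CACHE_LINE_SIZE) unsigned char shared_buf[SHARED_BUF_SIZE];
```

With helpers like these, Core A could verify the pointer and size before handing them to Core B, or round the range outwards with cache_line_floor and cache_line_ceil to see which neighbouring bytes a cache operation would also affect.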